Flapping Network NICs on Ceph Public Network VLAN

psionic

May 23, 2019
The same port is flapping on all 4 nodes; the full report is way too long to paste here. This port is used for the Ceph Public Network VLAN...
lsmod | grep -i i40e
i40e 385024 0
root@pve14:~# cat /var/log/messages | grep -i i40e

Jan 2 06:25:54 pve14 kernel: [560724.602777] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 06:25:59 pve14 kernel: [560729.883032] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 06:28:48 pve14 kernel: [560899.428753] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 06:28:54 pve14 kernel: [560904.845987] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 09:21:27 pve14 kernel: [571258.196615] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 09:21:33 pve14 kernel: [571263.697309] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 09:29:25 pve14 kernel: [571735.622712] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 09:29:30 pve14 kernel: [571740.822120] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 10:37:45 pve14 kernel: [575836.036942] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 10:37:50 pve14 kernel: [575840.670088] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 11:10:36 pve14 kernel: [577807.551073] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 11:10:41 pve14 kernel: [577812.509775] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 12:54:53 pve14 kernel: [584064.355090] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 12:54:58 pve14 kernel: [584069.024122] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 12:57:12 pve14 kernel: [584203.393785] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 12:57:23 pve14 kernel: [584214.394146] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 12:57:54 pve14 kernel: [584245.063220] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 12:57:58 pve14 kernel: [584249.594366] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 13:25:59 pve14 kernel: [585929.820363] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 13:26:03 pve14 kernel: [585934.431970] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 13:38:52 pve14 kernel: [586702.826831] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 13:38:56 pve14 kernel: [586707.384546] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 13:46:40 pve14 kernel: [587170.755550] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 13:46:44 pve14 kernel: [587175.267977] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 14:05:10 pve14 kernel: [588280.996974] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 14:05:14 pve14 kernel: [588285.582592] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 14:14:20 pve14 kernel: [588831.248639] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 14:14:25 pve14 kernel: [588835.911664] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 14:23:59 pve14 kernel: [589409.922366] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 14:24:03 pve14 kernel: [589414.691526] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 14:41:28 pve14 kernel: [590459.276575] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 14:41:33 pve14 kernel: [590464.573921] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 14:55:20 pve14 kernel: [591291.399835] i40e 0000:81:00.3 ens1f3: NIC Link is Down
Jan 2 14:55:25 pve14 kernel: [591296.608964] i40e 0000:81:00.3 ens1f3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 15:04:57 pve14 kernel: [591867.726384] i40e 0000:81:00.3 ens1f3: changing MTU from 1500 to 9000
Jan 2 15:04:57 pve14 kernel: [591867.991374] i40e 0000:81:00.2 ens1f2: changing MTU from 1500 to 9000
Jan 2 15:05:37 pve14 kernel: [591908.002967] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 15:05:42 pve14 kernel: [591912.769570] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 15:28:23 pve14 kernel: [593274.712102] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 15:28:28 pve14 kernel: [593279.142737] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 15:32:27 pve14 kernel: [593518.580157] i40e 0000:81:00.1 ens1f1: changing MTU from 1500 to 9000
Jan 2 15:32:28 pve14 kernel: [593518.848346] i40e 0000:81:00.0 ens1f0: changing MTU from 1500 to 9000
Jan 2 16:00:04 pve14 kernel: [595174.911908] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 16:00:14 pve14 kernel: [595185.149774] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 16:01:06 pve14 kernel: [595237.689978] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 16:01:11 pve14 kernel: [595242.363197] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 16:02:39 pve14 kernel: [595330.168774] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 16:02:49 pve14 kernel: [595340.072508] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 16:12:52 pve14 kernel: [595943.057658] i40e 0000:81:00.3 ens1f3: NIC Link is Down
Jan 2 16:12:57 pve14 kernel: [595948.445440] i40e 0000:81:00.3 ens1f3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 16:55:00 pve14 kernel: [598470.780156] i40e 0000:81:00.3 ens1f3: NIC Link is Down
Jan 2 16:55:05 pve14 kernel: [598475.908633] i40e 0000:81:00.3 ens1f3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 17:40:05 pve14 kernel: [601175.958033] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 17:40:09 pve14 kernel: [601180.307483] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 17:40:20 pve14 kernel: [601191.463912] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 17:40:25 pve14 kernel: [601196.090469] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 17:40:36 pve14 kernel: [601207.789022] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 17:40:41 pve14 kernel: [601212.288786] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 17:41:12 pve14 kernel: [601242.820083] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 17:41:16 pve14 kernel: [601247.177514] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 17:41:58 pve14 kernel: [601288.959934] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 17:42:02 pve14 kernel: [601293.384336] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 17:42:48 pve14 kernel: [601338.911959] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 17:42:52 pve14 kernel: [601343.536913] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 17:56:41 pve14 kernel: [602172.565277] i40e 0000:81:00.3 ens1f3: NIC Link is Down
Jan 2 17:56:47 pve14 kernel: [602178.064176] i40e 0000:81:00.3 ens1f3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 19:21:50 pve14 kernel: [607281.528856] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 19:21:55 pve14 kernel: [607285.947416] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 20:36:26 pve14 kernel: [611757.511305] i40e 0000:81:00.3 ens1f3: NIC Link is Down
Jan 2 20:36:31 pve14 kernel: [611762.822557] i40e 0000:81:00.3 ens1f3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 21:11:05 pve14 kernel: [613836.789537] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 21:11:11 pve14 kernel: [613842.027478] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 21:12:20 pve14 kernel: [613911.468602] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 21:12:24 pve14 kernel: [613915.854505] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 21:12:37 pve14 kernel: [613928.234664] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 21:12:41 pve14 kernel: [613932.779733] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 21:12:46 pve14 kernel: [613937.868638] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 21:12:51 pve14 kernel: [613942.436384] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 21:14:51 pve14 kernel: [614062.756078] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 21:14:56 pve14 kernel: [614067.662046] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 21:27:32 pve14 kernel: [614823.508075] i40e 0000:81:00.3 ens1f3: NIC Link is Down
Jan 2 21:27:37 pve14 kernel: [614828.881194] i40e 0000:81:00.3 ens1f3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 21:38:53 pve14 kernel: [615504.341827] i40e 0000:81:00.3 ens1f3: NIC Link is Down
Jan 2 21:38:58 pve14 kernel: [615509.004784] i40e 0000:81:00.3 ens1f3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jan 2 22:26:57 pve14 kernel: [618388.589768] i40e 0000:81:00.2 ens1f2: NIC Link is Down
Jan 2 22:27:02 pve14 kernel: [618393.767598] i40e 0000:81:00.2 ens1f2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None

Any ideas?
 
Hi,

as far as I know the i40e has limited VLAN capacity, which can result in odd behavior.
Please send the network config and the "pveversion -v" report.
 
I'm using untagged VLANs on fully managed Netgear switches. All of my Ethernet ports are 10G: 2 Corosync rings, Ceph Public, Ceph Cluster, and 2 for the LAN network. Wouldn't all ports have an issue if it were an i40e issue?

I looked a little closer at 'cat /var/log/messages | grep -i i40e' and there are entries for both Ceph eth ports and none for either Corosync eth port.
All 4 ports are on the same NIC card. Just wondering if it is a Ceph-specific issue?
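For what it's worth, a quick way to tally the flaps per interface from the same log (a sketch; the field positions assume the exact kernel message format shown in the excerpt above):

```shell
# Count "NIC Link is Down" events per interface in a kernel log.
# The interface name is the 5th-from-last field in messages like:
#   ... i40e 0000:81:00.2 ens1f2: NIC Link is Down
count_flaps() {
  grep 'NIC Link is Down' "$1" \
    | awk '{ iface = $(NF-4); sub(/:$/, "", iface); print iface }' \
    | sort | uniq -c | sort -rn
}

# Usage on a node (assumes rsyslog writes /var/log/messages):
#   count_flaps /var/log/messages
```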

cat /etc/network/interfaces

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual
#Prox-LAN

auto ens1f0
iface ens1f0 inet static
address 10.10.1.11
netmask 24
mtu 9000
#CoroSync-R0

auto ens1f1
iface ens1f1 inet static
address 10.10.2.11
netmask 24
mtu 9000
#CoroSync-R1

auto ens1f2
iface ens1f2 inet static
address 10.10.3.11
netmask 24
mtu 9000
#Ceph-Public1

auto ens1f3
iface ens1f3 inet static
address 10.10.4.11
netmask 24
mtu 9000
#Ceph-Cluster

iface eno2 inet manual
#Spare

auto vmbr0
iface vmbr0 inet static
address 192.168.1.110
netmask 16
gateway 192.168.2.3
bridge-ports eno1
bridge-stp off
bridge-fd 0
#LAN-Bridge
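Since every storage/cluster interface above runs MTU 9000, it may also be worth confirming that jumbo frames actually pass end-to-end on the Ceph public VLAN; a minimal check (the peer address 10.10.3.12 is an assumption for illustration):

```shell
# Largest ICMP payload that fits in a 9000-byte MTU:
mtu=9000
payload=$((mtu - 20 - 8))   # minus 20-byte IP header and 8-byte ICMP header
echo "$payload"             # 8972

# On a node, send non-fragmentable jumbo pings across the Ceph public
# network (interface and peer address are assumptions; failures here
# would point at the switch path rather than the NIC):
#   ping -M do -s "$payload" -c 3 -I ens1f2 10.10.3.12
```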

pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.5-pve1
ceph-fuse: 14.2.5-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 1.2.8-1+pve4
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 
Can you send these entries from the logs?
I guess it is a driver bug.
Your config is not complicated, so it should work without a problem.

Can you send the output of these commands?

Code:
ethtool -i ens1f2
ethtool -k ens1f2
ethtool -S ens1f2
You must install ethtool if it is not installed yet.
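If the `ethtool -S` output turns out to be overwhelming (the i40e exposes hundreds of counters), a small filter like this can narrow it to non-zero error/drop counters (a sketch; counter names vary between driver versions):

```shell
# Keep only non-zero error/discard counters from `ethtool -S` output.
# Reads the statistics text on stdin; lines look like "    rx_errors: 0".
nonzero_errors() {
  awk -F': ' '/err|drop|discard/ && $2+0 > 0 { print $1 ": " $2 }'
}

# Usage on a node:
#   ethtool -S ens1f2 | nonzero_errors
```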
 

see attached file...
 

Attachments