Hi, sorry to ask, I'm a bit clueless. Lately I've been having trouble with a network interface that hangs or suddenly reports "Link detected: no" when there is a large spike in network traffic. Only rebooting the server fixes it; restarting the networking service or running ifdown/ifup on the interface doesn't help.
I double-checked: the switch port is up and the cable is plugged in correctly.
These are the log entries that show up just before it happens (newest first). I use Ceph for backup storage. At first I thought it happened because the backup process uses a lot of bandwidth, but now the problem occurs even when the backup isn't running.
Oct 22 09:24:55 hostname kernel: libceph: mon0 (1)ip-address:6789 socket closed (con state CONNECTING)
Oct 22 09:24:55 hostname kernel: libceph: mon0 (1)ip-address:6789 socket closed (con state CONNECTING)
Oct 22 09:24:51 hostname kernel: libceph: mon0 (1)ip-address:6789 socket closed (con state CONNECTING)
Oct 22 09:24:22 hostname kernel: libceph: mon0 (1)ip-address:6789 session lost, hunting for new mon
Oct 22 09:24:20 hostname kernel: libceph: mon2 (1)ip-address:6789 session lost, hunting for new mon
Oct 22 09:23:53 hostname kernel: i40e 0000:3d:00.1 ens5f2: tx_timeout recovery level 1, hung_queue 5
Oct 22 09:23:53 hostname kernel: i40e 0000:3d:00.1 ens5f2: tx_timeout: VSI_seid: 397, Q 5, NTC: 0x1bd, HWB: 0x1f3, NTU: 0x1f3, TAIL: 0x1f3, INT: 0x0
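In case it helps anyone compare against their own logs, here is a minimal Python sketch that pulls the fields out of the i40e "tx_timeout:" line above. The regex is written only against the one line I pasted, so the exact format is an assumption and other driver versions may print it differently. The function name parse_tx_timeout is just something I made up for illustration.

```python
import re

# Pattern matched against the single "tx_timeout:" line from the log above.
# Assumption: other i40e driver versions may format this line differently.
TX_TIMEOUT_RE = re.compile(
    r"i40e (?P<pci>[0-9a-f:.]+) (?P<iface>\S+): tx_timeout: "
    r"VSI_seid: (?P<vsi>\d+), Q (?P<queue>\d+), "
    r"NTC: (?P<ntc>0x[0-9a-f]+), HWB: (?P<hwb>0x[0-9a-f]+), "
    r"NTU: (?P<ntu>0x[0-9a-f]+), TAIL: (?P<tail>0x[0-9a-f]+), "
    r"INT: (?P<intr>0x[0-9a-f]+)"
)


def parse_tx_timeout(line):
    """Return a dict of fields from a tx_timeout line, or None if no match."""
    m = TX_TIMEOUT_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Hex counters become integers so they can be compared.
    for key in ("ntc", "hwb", "ntu", "tail", "intr"):
        fields[key] = int(fields[key], 16)
    # Decimal fields become integers as well.
    for key in ("vsi", "queue"):
        fields[key] = int(fields[key])
    return fields


sample = (
    "Oct 22 09:23:53 hostname kernel: i40e 0000:3d:00.1 ens5f2: tx_timeout: "
    "VSI_seid: 397, Q 5, NTC: 0x1bd, HWB: 0x1f3, NTU: 0x1f3, TAIL: 0x1f3, INT: 0x0"
)
info = parse_tx_timeout(sample)
# Interface name, hung queue number, and the gap between NTU and NTC.
print(info["iface"], info["queue"], info["ntu"] - info["ntc"])
```

For the line I pasted this reports queue 5 on ens5f2 with NTU 54 entries ahead of NTC. I'm not sure how much that gap says about the cause; it is only meant to make the numbers easier to read.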
Proxmox version:
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-5
...
ceph: 12.2.13-pve1
ceph-fuse: 12.2.13-pve1
...
openvswitch-switch: 2.12.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-22
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
Here are some details of my NIC.
$ ethtool ens5f2
Supported ports: [ TP ]
Supported link modes: 1000baseT/Full
10000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 1000baseT/Full
10000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: no
$ ethtool -i ens5f2
driver: i40e
version: 2.8.20-k
firmware-version: 3.2d 0x80000b4b 1.1767.0
expansion-rom-version:
bus-info: 0000:3d:00.2
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
$ lshw -class network
*-network:2
description: Ethernet interface
product: Ethernet Connection X722 for 10GBASE-T
vendor: Intel Corporation
...
capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical tp 1000bt-fd 10000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=i40e driverversion=2.8.20-k duplex=full firmware=3.2d 0x80000b4b 1.1767.0 latency=0 link=no multicast=yes port=twisted pair speed=10Gbit/s