ixgbe CX4 "adapter removed" with kernel >4.15.18-18

ftrojahn

Dec 21, 2018
Hi all,

About 6 hours after upgrading to kernel 4.15.18-19-pve, we noticed a failing CX4 network adapter. The only trace of the failure is in dmesg:

Aug 13 08:33:37 vm4 kernel: [ 3.297231] ixgbe 0000:81:00.0: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16 XDP Queue count = 0
Aug 13 08:33:37 vm4 kernel: [ 3.297410] ixgbe 0000:81:00.0: PCI Express bandwidth of 16GT/s available
Aug 13 08:33:37 vm4 kernel: [ 3.297412] ixgbe 0000:81:00.0: (Speed:2.5GT/s, Width: x8, Encoding Loss:20%)
Aug 13 08:33:37 vm4 kernel: [ 3.297483] ixgbe 0000:81:00.0: MAC: 1, PHY: 0, PBA No: E37623-004
Aug 13 08:33:37 vm4 kernel: [ 3.297484] ixgbe 0000:81:00.0: 00:1b:21:8d:d8:d3
Aug 13 08:33:37 vm4 kernel: [ 3.309510] ixgbe 0000:81:00.0: Intel(R) 10 Gigabit Network Connection
Aug 13 08:33:37 vm4 kernel: [ 3.409215] ixgbe 0000:81:00.1: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16 XDP Queue count = 0
Aug 13 08:33:37 vm4 kernel: [ 3.409394] ixgbe 0000:81:00.1: PCI Express bandwidth of 16GT/s available
Aug 13 08:33:37 vm4 kernel: [ 3.409396] ixgbe 0000:81:00.1: (Speed:2.5GT/s, Width: x8, Encoding Loss:20%)
Aug 13 08:33:37 vm4 kernel: [ 3.409467] ixgbe 0000:81:00.1: MAC: 1, PHY: 0, PBA No: E37623-004
Aug 13 08:33:37 vm4 kernel: [ 3.409468] ixgbe 0000:81:00.1: 00:1b:21:8d:d8:d2
Aug 13 08:33:37 vm4 kernel: [ 3.421464] ixgbe 0000:81:00.1: Intel(R) 10 Gigabit Network Connection
Aug 13 08:33:37 vm4 kernel: [ 3.544187] ixgbe 0000:81:00.1 ens6f1: renamed from eth4
Aug 13 08:33:37 vm4 kernel: [ 3.572225] ixgbe 0000:81:00.0 ens6f0: renamed from eth3
Aug 13 08:33:37 vm4 kernel: [ 8.470703] ixgbe 0000:81:00.1 ens6f1: changing MTU from 1500 to 9000
Aug 13 08:33:38 vm4 kernel: [ 8.898277] ixgbe 0000:81:00.1 ens6f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

Aug 13 14:31:02 vm4 kernel: [21452.957177] ixgbe 0000:81:00.1: Adapter removed
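To check the "about 6 hours" figure against the log, the bracketed kernel timestamp (seconds since boot) can be converted to hours. The following is a minimal sketch, assuming syslog-style lines like the ones above; the regular expression and the function name `removal_events` are illustrative, not part of any tool.

```python
import re

# Matches syslog-style kernel lines such as
# "Aug 13 14:31:02 vm4 kernel: [21452.957177] ixgbe 0000:81:00.1: Adapter removed"
LINE_RE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] ixgbe "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):? (?P<msg>.*)$"
)


def removal_events(lines):
    """Yield (pci_address, uptime_hours) for every 'Adapter removed' line."""
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("msg").strip() == "Adapter removed":
            # The bracketed value is seconds since boot.
            yield m.group("pci"), float(m.group("uptime")) / 3600.0


sample = [
    "Aug 13 08:33:38 vm4 kernel: [ 8.898277] ixgbe 0000:81:00.1 ens6f1: "
    "NIC Link is Up 10 Gbps, Flow Control: RX/TX",
    "Aug 13 14:31:02 vm4 kernel: [21452.957177] ixgbe 0000:81:00.1: Adapter removed",
]

events = list(removal_events(sample))
for pci, hours in events:
    print(f"{pci}: removed after {hours:.2f} h of uptime")
```

For the two lines above this reports port 0000:81:00.1 as removed after roughly 5.96 hours of uptime, which matches the wall-clock gap between 08:33 and 14:31.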

All other ixgbe network adapters kept working. The adapter only came back after we powered the server off completely (a reboot did not suffice) and booted kernel 4.15.18-18-pve again. Since then it has been working for more than a day.

Unloading and reloading the driver with modprobe did not help either (and it dropped all other ixgbe-based network connections in the process).
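One thing that might be worth checking before reloading the driver is whether the port still answers on the PCI bus at all. As far as I understand the driver, "Adapter removed" is logged when register reads come back as all ones, which usually means the device has stopped responding; that would also fit the observation that only a full power-off brought it back. The sketch below reads the vendor ID from the device's sysfs `config` file. It is an assumption-laden diagnostic, not a fix, and the helper names `pci_vendor_id` and `describe` are made up for illustration.

```python
from pathlib import Path


def pci_vendor_id(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the 16-bit vendor ID from PCI config space, or None if the
    device node cannot be read from sysfs."""
    config = Path(sysfs_root) / pci_addr / "config"
    try:
        raw = config.read_bytes()[:2]
    except OSError:
        return None
    if len(raw) < 2:
        return None
    # The vendor ID sits at offset 0 of config space, little-endian.
    return int.from_bytes(raw, "little")


def describe(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    vendor = pci_vendor_id(pci_addr, sysfs_root)
    if vendor is None:
        return "not present in sysfs"
    if vendor == 0xFFFF:
        # All ones is what a read returns when nothing answers on the bus.
        return "present in sysfs but not answering config reads"
    return f"answering, vendor ID 0x{vendor:04x}"


if __name__ == "__main__":
    # PCI address of the failing port, taken from the log above.
    print(describe("0000:81:00.1"))
```

A healthy Intel port should report vendor ID 0x8086. If the read returns 0xffff, the problem is likely below the driver (link, slot, power management or firmware), and reloading ixgbe alone would not be expected to help.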

Mainboard is Supermicro X10DRI-T.

The card has 2x CX4 ports, only one of which is connected, to an HP6410 switch:
Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)

driver: ixgbe
version: 5.1.0-k
firmware-version: 0xb5050000
expansion-rom-version:
bus-info: 0000:81:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

proxmox-ve: 5.4-2 (running kernel: 4.15.18-18-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-8
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-19-pve: 4.15.18-45
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
pve-kernel-4.15.18-9-pve: 4.15.18-30
ceph: 12.2.12-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-54
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-5
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-40
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

AFAIK there are no ixgbe-related changes in 4.15.18-20, so I have not tested it yet, as the node is in production.

So, any ideas what I could try to find the reason for the drop, or should I just stick with 4.15.18-18 for now?

Thanks in advance
Falko
 
With which kernel?

The strange thing is: our system has now been running for 132 days with this kernel: 4.15.18-18-pve #1 SMP PVE 4.15.18-44, and the adapter has not failed since.
 
Has anybody figured out why this happens?
I have 40 servers that randomly disconnect from the network, and all of them have this error message in the log:
ixgbe 0000:18:00.0: Adapter removed

I would appreciate a tip here.
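With that many machines it may help to see whether the failures cluster on particular hosts or PCI slots. Here is a rough sketch that tallies "Adapter removed" lines per host and PCI address, assuming the kernel logs have been collected in syslog format. The host names `node01` and `node02` and the sample lines are invented purely for illustration.

```python
import re
from collections import Counter

# Host name is the token right before " kernel:" in a syslog-style line.
REMOVED_RE = re.compile(
    r"(?P<host>\S+) kernel: \[[^\]]*\] ixgbe "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): Adapter removed"
)


def tally_removals(lines):
    """Count 'Adapter removed' events per (host, pci_address)."""
    counts = Counter()
    for line in lines:
        m = REMOVED_RE.search(line)
        if m:
            counts[(m.group("host"), m.group("pci"))] += 1
    return counts


# Hypothetical sample data; real input would come from the collected logs.
sample = [
    "Jan 02 03:04:05 node01 kernel: [ 1200.000001] ixgbe 0000:18:00.0: Adapter removed",
    "Jan 02 09:10:11 node02 kernel: [88000.500000] ixgbe 0000:18:00.0: Adapter removed",
    "Jan 05 01:00:00 node01 kernel: [ 4100.250000] ixgbe 0000:18:00.0: Adapter removed",
    "Jan 05 01:00:09 node01 kernel: [ 4109.000000] ixgbe 0000:18:00.1 eth1: NIC Link is Up 10 Gbps, Flow Control: RX/TX",
]

counts = tally_removals(sample)
for (host, pci), n in counts.most_common():
    print(f"{host} {pci}: {n} removal(s)")
```

If the same slot or the same hardware batch dominates the tally, that points more toward hardware, firmware or PCIe power management than toward a particular kernel build.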
 
