igb driver on latest kernel 4.15.17-3-pve - network connection anomalies with jumbo frames

resoli

I recently did a routine update of my 3-node PVE 5.2 cluster, which installed the new pve-kernel-4.15.17-3-pve.

Since then I have had problems synchronizing DRBD resources over a link with jumbo frames enabled:


Jun 14 08:59:26 pve1 kernel: [40906.042440] drbd vm-102-disk-1/0 drbd103 pve3: Began resync as SyncSource (will sync 3476 KB [869 bits set]).
Jun 14 08:59:39 pve1 kernel: [40918.313936] drbd vm-102-disk-1 pve3: [drbd_s_vm-102-d/3075] sending time expired, ko = 6
Jun 14 08:59:45 pve1 kernel: [40924.458069] drbd vm-102-disk-1 pve3: [drbd_s_vm-102-d/3075] sending time expired, ko = 5
Jun 14 08:59:51 pve1 kernel: [40930.602191] drbd vm-102-disk-1 pve3: [drbd_s_vm-102-d/3075] sending time expired, ko = 4
Jun 14 08:59:57 pve1 kernel: [40936.746355] drbd vm-102-disk-1 pve3: [drbd_s_vm-102-d/3075] sending time expired, ko = 3


My configuration uses drbd9 in a dedicated network mesh configuration described here:

https://lists.gt.net/drbd/users/28251#28251

In brief, I put two interfaces in a "drbdbr" bridge on each host, blocking forwarding with ebtables rules.

Each interface has jumbo frames (mtu=9000) enabled.
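
For anyone wanting to double-check a similar setup, here is a minimal Python sketch that reads the MTU of a bridge and its member ports from sysfs. It assumes a Linux host where `/sys/class/net/<bridge>/brif/` lists the bridge ports; the helper names `bridge_port_mtus` and `mismatched` are made up for illustration, and "drbdbr" is simply the bridge name used above.

```python
from pathlib import Path


def bridge_port_mtus(bridge, sysfs_root="/sys/class/net"):
    """Return {interface: mtu} for the bridge itself and each of its ports."""
    root = Path(sysfs_root)
    names = [bridge]
    brif = root / bridge / "brif"  # one entry per bridge port on Linux
    if brif.is_dir():
        names += sorted(p.name for p in brif.iterdir())
    # Each interface exposes its current MTU as a decimal number in <iface>/mtu
    return {name: int((root / name / "mtu").read_text().strip()) for name in names}


def mismatched(mtus, expected=9000):
    """Return only the interfaces whose MTU differs from the expected value."""
    return {name: mtu for name, mtu in mtus.items() if mtu != expected}
```

On a node, `mismatched(bridge_port_mtus("drbdbr"))` should return an empty dict if the bridge and both ports really are at mtu=9000. This only confirms the configured value, not that the driver actually passes frames of that size.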

I suspect the problem is in the updated (out-of-tree) igb driver,

Intel(R) Gigabit Ethernet Linux Driver - version 5.3.5.18

because I saw in the forum that there were problems with jumbo frames in the past.

I reverted back to the previous "pve-kernel-4.15.17-2-pve" kernel with in-tree igb driver:

igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
to restore correct functionality. Unfortunately, removing the latest kernel has the side effect of also removing the proxmox-ve and pve-kernel-4.15 packages.

Any hint?

Thanks,
rob
 
Each interface has jumbo frames (mtu=9000) enabled.
Maybe lowering the MTU helps; some drivers do not account for the whole frame.

I reverted back to the previous "pve-kernel-4.15.17-2-pve" kernel with in-tree igb driver:
You can set in GRUB which kernel to boot by default (grub-set-default).
 
I already tried lowering the MTU to 1500: it solves the connection problems, but the performance penalty is unbearable.

I know that I can boot a previous kernel, thanks for the hint: I wasn't aware of the "grub-set-default" command; very handy.

My opinion is that in many situations a driver that does not work well with jumbo frames makes the kernel in question useless ...

cheers,
rob
 
I already tried lowering the MTU to 1500: it solves the connection problems, but the performance penalty is unbearable.
I meant lowering it only moderately, e.g. to 8800, so that the overhead of the frame fits into the MTU.
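
To put rough numbers on the frame overhead, here is a small Python sketch. It assumes plain Ethernet II framing with an optional 802.1Q VLAN tag and an IPv4 header without options; the constants are the standard header sizes and the function names are only illustrative. Whether the igb driver really miscounts any of these bytes is a guess, not something confirmed in this thread.

```python
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
ETH_FCS = 4       # frame check sequence (CRC) at the end of the frame
VLAN_TAG = 4      # optional 802.1Q tag
IPV4_HEADER = 20  # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo request header


def frame_bytes(mtu, vlan=False):
    """Size of the full Ethernet frame that carries an MTU-sized packet."""
    return mtu + ETH_HEADER + ETH_FCS + (VLAN_TAG if vlan else 0)


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 8800, 9000):
    print(mtu, frame_bytes(mtu), frame_bytes(mtu, vlan=True), max_ping_payload(mtu))
```

With mtu=9000 the full frame is 9018 bytes (9022 with a VLAN tag), while mtu=8800 keeps it at 8818. The payload figure is useful for a quick check with iputils ping, e.g. `ping -M do -s 8972 <peer>` for mtu=9000, which forbids fragmentation so an oversized packet fails instead of being split.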
 
Sorry, but I do not want to spend further time (involving a reboot of all nodes) on what seems to me clearly a driver issue. I will follow the 4.15 thread. Please consider this one closed.

Thanks to all,
rob