InfiniBand driver update (in kernel) causes cluster multicast communication problems

Whatever

Nov 19, 2012
In the new kernel 2.6.32-37, the InfiniBand driver was updated from:

Code:
Mar 10 17:46:37 pve02A kernel: mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
Mar 10 17:46:37 pve02A kernel: mlx4_core: Initializing 0000:05:00.0
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: setting latency timer to 64
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 57 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 58 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 59 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 60 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 61 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 62 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 63 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 64 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 65 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 66 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 67 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 68 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 69 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 70 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 71 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 72 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 73 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 74 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 75 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 76 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 77 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 78 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
Mar 10 17:46:37 pve02A kernel: <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)

to

Code:
Mar 11 13:44:07 pve02A kernel: mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
Mar 11 13:44:07 pve02A kernel: mlx4_core: Initializing 0000:05:00.0
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: setting latency timer to 64
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 57 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 58 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 59 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 60 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 61 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 62 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 63 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 64 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 65 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 66 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 67 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 68 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 69 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 70 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 71 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 72 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 73 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 74 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 75 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 76 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 77 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 78 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 79 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 80 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 81 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 82 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 83 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 84 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 85 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 86 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 87 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 88 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 89 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 90 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 91 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 92 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 93 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 94 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
Mar 11 13:44:07 pve02A kernel: <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)

With the updated driver, the host can no longer connect to the cluster, and the TX dropped-packet counter increases continuously:

Code:
ib0       Link encap:UNSPEC  HWaddr 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:172.16.253.16  Bcast:172.16.253.255  Mask:255.255.255.0
          inet6 addr: fe80::216:35ff:ffbf:ab09/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:1566 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1008 errors:0 dropped:432 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:563840 (550.6 KiB)  TX bytes:662762 (647.2 KiB)
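To confirm that the TX `dropped` counter really keeps climbing, one option is to snapshot the `ifconfig` output twice and compare. The following is a minimal sketch that assumes the net-tools `ifconfig` output format shown above; `parse_dropped` and `tx_drops_growing` are hypothetical helper names, not part of any existing tool.

```python
import re

# Matches counter lines such as "TX packets:1008 errors:0 dropped:432 ..."
# in net-tools ifconfig output (format assumed from the output above).
_COUNTER_RE = re.compile(r"\b(RX|TX) packets:\d+\s+errors:\d+\s+dropped:(\d+)")


def parse_dropped(ifconfig_text: str) -> dict:
    """Return the dropped-packet counters as {"RX": n, "TX": n}."""
    counters = {}
    for direction, dropped in _COUNTER_RE.findall(ifconfig_text):
        counters[direction] = int(dropped)
    return counters


def tx_drops_growing(before: str, after: str) -> bool:
    """True if the TX dropped counter rose between two ifconfig snapshots."""
    return parse_dropped(after).get("TX", 0) > parse_dropped(before).get("TX", 0)
```

Fed the `ib0` output above, `parse_dropped` should yield `{"RX": 0, "TX": 432}`. On Linux the same counter is also exposed as `/sys/class/net/ib0/statistics/tx_dropped`, which avoids parsing text altogether.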

I've tried switching from connected mode to datagram mode, as well as changing the MTU, but without success.
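For reference, IPoIB mode and MTU are normally switched through sysfs (`/sys/class/net/<iface>/mode` accepts `datagram` or `connected`). Below is a minimal sketch of that step, assuming root access and a 2048-byte IB link MTU (a 4K fabric would allow 4092 in datagram mode); `set_ipoib_mode` and its `sysfs_root` parameter are hypothetical, added only so the logic can be exercised outside a real host.

```python
from pathlib import Path

# Assumed IPoIB MTU limits: datagram mode is bounded by the IB link MTU minus
# the 4-byte IPoIB header (2044 for a 2048-byte IB MTU, assumed here);
# connected mode allows up to 65520 bytes.
MAX_MTU = {"datagram": 2044, "connected": 65520}


def set_ipoib_mode(iface: str, mode: str, mtu: int,
                   sysfs_root: str = "/sys/class/net") -> None:
    """Write the IPoIB mode, then the MTU, for `iface` via sysfs (needs root)."""
    if mode not in MAX_MTU:
        raise ValueError(f"unknown IPoIB mode: {mode!r}")
    if not 0 < mtu <= MAX_MTU[mode]:
        raise ValueError(f"MTU {mtu} out of range for {mode} mode")
    dev = Path(sysfs_root) / iface
    # Switch the mode first: the kernel checks a new MTU against the limit of
    # the current mode, so 65520 would be rejected while still in datagram.
    (dev / "mode").write_text(mode + "\n")
    (dev / "mtu").write_text(f"{mtu}\n")
```

Calling `set_ipoib_mode("ib0", "datagram", 2044)` amounts to `echo datagram > /sys/class/net/ib0/mode` followed by `echo 2044 > /sys/class/net/ib0/mtu`.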

Any ideas would be much appreciated!
 
Re: InfiniBand driver update (in kernel) causes cluster multicast communication problems

Thanks for your reply! That's disappointing.

So there is no way to use the ZFS "built-in" functionality without the new kernel, am I correct?