In the new kernel 2.6.32-37 the Mellanox InfiniBand driver was updated from mlx4_core v1.1 (Dec, 2011) to v2.2-1 (Feb, 2014); the kernel logs for both versions are below.
With the updated driver the host can no longer connect to the cluster, and the TX dropped-packet counter on ib0 increases continuously.
I've tried switching IPoIB from connected mode to datagram mode, as well as changing the MTU, but without success.
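For anyone trying to reproduce this, here is a minimal sketch for checking which IPoIB mode and MTU are actually in effect after such a change. It assumes the standard sysfs files `/sys/class/net/ib0/mode` and `/sys/class/net/ib0/mtu`; the function name `read_ipoib_settings` and its `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path


def read_ipoib_settings(iface="ib0", sysfs_root="/sys/class/net"):
    """Return the current IPoIB mode and MTU of an interface as strings.

    A value is None when the corresponding sysfs file does not exist
    (for example on a machine without an IPoIB interface).
    """
    base = Path(sysfs_root) / iface
    settings = {}
    for name in ("mode", "mtu"):
        path = base / name
        settings[name] = path.read_text().strip() if path.exists() else None
    return settings


if __name__ == "__main__":
    # Expected shape: {'mode': 'connected' or 'datagram', 'mtu': '<number>'}
    print(read_ipoib_settings())
```

Running it before and after a mode or MTU change shows whether the new setting was really applied by the driver.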
Kernel log with the old driver:
Code:
Mar 10 17:46:37 pve02A kernel: mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
Mar 10 17:46:37 pve02A kernel: mlx4_core: Initializing 0000:05:00.0
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: setting latency timer to 64
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 57 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 58 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 59 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 60 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 61 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 62 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 63 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 64 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 65 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 66 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 67 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 68 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 69 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 70 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 71 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 72 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 73 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 74 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 75 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 76 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 77 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: irq 78 for MSI/MSI-X
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
Mar 10 17:46:37 pve02A kernel: mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
Mar 10 17:46:37 pve02A kernel: <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
Kernel log with the updated driver:
Code:
Mar 11 13:44:07 pve02A kernel: mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
Mar 11 13:44:07 pve02A kernel: mlx4_core: Initializing 0000:05:00.0
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: setting latency timer to 64
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 57 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 58 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 59 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 60 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 61 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 62 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 63 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 64 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 65 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 66 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 67 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 68 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 69 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 70 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 71 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 72 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 73 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 74 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 75 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 76 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 77 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 78 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 79 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 80 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 81 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 82 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 83 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 84 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 85 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 86 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 87 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 88 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 89 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 90 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 91 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 92 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 93 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: irq 94 for MSI/MSI-X
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
Mar 11 13:44:07 pve02A kernel: mlx4_core 0000:05:00.0: command 0xc failed: fw status = 0x40
Mar 11 13:44:07 pve02A kernel: <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
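One visible difference between the two logs is the number of MSI/MSI-X interrupt lines: the old driver log lists irq 57 to 78 (22 lines), the new one irq 57 to 94 (38 lines). Whether that is related to the dropped packets is not established here. A small sketch for counting these lines in a saved log follows; the function name `count_msix_vectors` is hypothetical, and the regular expression only assumes the line format shown above.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... kernel: mlx4_core 0000:05:00.0: irq 57 for MSI/MSI-X
IRQ_LINE = re.compile(r"kernel: \S+ (\S+): irq (\d+) for MSI/MSI-X")


def count_msix_vectors(log_text):
    """Count 'irq N for MSI/MSI-X' lines per PCI device in a kernel log."""
    counts = Counter()
    for match in IRQ_LINE.finditer(log_text):
        pci_device = match.group(1)
        counts[pci_device] += 1
    return dict(counts)
```

Feeding each of the two logs to `count_msix_vectors` separately gives one count per PCI device, which makes the comparison between driver versions easier than reading the lines by eye.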
Interface statistics for ib0 with the updated driver (note the TX dropped counter):
Code:
ib0       Link encap:UNSPEC  HWaddr 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:172.16.253.16  Bcast:172.16.253.255  Mask:255.255.255.0
          inet6 addr: fe80::216:35ff:ffbf:ab09/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:1566 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1008 errors:0 dropped:432 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:563840 (550.6 KiB)  TX bytes:662762 (647.2 KiB)
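To track how quickly the dropped counter grows, the ifconfig output above can be parsed and two snapshots compared. This is only a sketch: the names `parse_ifconfig_counters` and `dropped_delta` are made up for illustration, and the pattern assumes the old net-tools ifconfig format shown above.

```python
import re

# Matches the "RX packets:... errors:... dropped:..." and "TX packets:..." lines.
COUNTER_LINE = re.compile(
    r"^\s*(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+)", re.MULTILINE
)


def parse_ifconfig_counters(text):
    """Extract packet, error and dropped counters for RX and TX."""
    counters = {}
    for direction, packets, errors, dropped in COUNTER_LINE.findall(text):
        counters[direction] = {
            "packets": int(packets),
            "errors": int(errors),
            "dropped": int(dropped),
        }
    return counters


def dropped_delta(before_text, after_text, direction="TX"):
    """Return how many additional drops occurred between two snapshots."""
    before = parse_ifconfig_counters(before_text)[direction]["dropped"]
    after = parse_ifconfig_counters(after_text)[direction]["dropped"]
    return after - before
```

Taking a snapshot, waiting a fixed interval, and calling `dropped_delta` on the two outputs gives a drop rate that can be compared across mode and MTU settings.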
Any ideas would be very appreciated!