Hello to all,
we have a running Proxmox VE 5.4-13 three-node cluster with a separate dual-port 40Gbit InfiniBand card in each server for connecting to a FreeNAS iSCSI server (release FreeNAS-11.2-U5, the latest version).
For the cluster communication we have set up bond0 with two Gigabit NICs.
For the VM communication we have set up bond1 with two 10Gbit SFP+ NICs.
This is all generally working as it should with one exception:
The FreeNAS iSCSI server sometimes hits a kernel trap 12 and then reboots.
On the FreeNAS side we also have a dual-port Mellanox ConnectX-3 InfiniBand card, which is connected to an InfiniBand switch (Grid Director 4036).
The three Proxmox cluster nodes are connected to this switch over their InfiniBand cards as well.
The InfiniBand cards on both ends are configured for connected mode with an MTU of 40950 (with the default MTU of 65520 we got lots of connection errors).
We are using a multipath setup with two subnets for IP over InfiniBand (IPoIB).
This works, and we get a throughput of ~1-1.1 gigabytes per second with VMs running on each cluster node in parallel.
Sporadically we get a kernel trap on the FreeNAS server, which then reboots.
This can happen anywhere between half an hour and 4 days of uptime.
While the FreeNAS server is rebooting the VMs do not crash; their I/O just stalls until the FreeNAS server is online again.
Nevertheless we have to fix it.
I analyzed the FreeNAS crash dumps in /data/crash/ and found that the last thing happening before the kernel crashes is always a series of events like this:
<118>Fri Sep 6 05:20:07 CEST 2019
<6>arp: 10.20.24.111 moved from 80:00:02:08:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:9f:c1 to 80:00:02:09:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:9f:c2 on ib0
<6>arp: 10.20.24.110 moved from 80:00:02:08:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:20:e3 to 80:00:02:09:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:20:e4 on ib0
<6>arp: 10.20.25.111 moved from 80:00:02:09:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:9f:c2 to 80:00:02:08:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:9f:c1 on ib1
<6>arp: 10.20.25.110 moved from 80:00:02:09:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:20:e4 to 80:00:02:08:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:20:e3 on ib1
<6>arp: 10.20.24.111 moved from 80:00:02:09:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:9f:c2 to 80:00:02:08:fe:80:00:00:00:00:00:00:00:02:c9:03:00:09:9f:c1 on ib0
<4>ib0: packet len 12380 (> 2044) too long to send, dropping
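For reference, this is roughly how I pull those lines out of the saved dumps. It assumes the usual FreeBSD textdump tarballs that FreeNAS writes to /data/crash and the standard msgbuf.txt member; the file names may differ on other setups:

cd /data/crash
# search every saved textdump for the ARP flaps and the oversized-packet drops
for t in textdump.tar.*; do
        echo "== $t =="
        tar -xOf "$t" msgbuf.txt 2>/dev/null | egrep 'moved from|too long to send'
done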
It is the same behavior every time: the 20-octet IPoIB link-layer addresses on both ports of all three Proxmox clients change from time to time.
After that it looks like the FreeBSD server sometimes uses datagram mode for the new connections and then tries to send a large packet over such a connection, which could stem from a previous client request/connection in connected mode (the 2044-byte limit in the log message corresponds to a 2048-byte IB MTU minus the 4-byte IPoIB encapsulation header, i.e. the datagram-mode MTU).
The root cause seems to be the changing link-layer addresses behind the client IPs on the Proxmox side, and secondarily the datagram-mode behavior on the FreeBSD side.
Maybe somebody has an idea what is happening here with the link-layer addresses and how to avoid it?
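In case it helps with debugging: to put timestamps on the flaps I'm thinking about running a small logging loop like this on each Proxmox node (the log file name is just something I made up):

# log the 20-octet IPoIB hardware address and the IPoIB mode every 10 seconds
# (written to /root/ib0-address.log, a path I picked arbitrarily)
while true; do
        printf '%s %s %s\n' "$(date -Is)" \
                "$(cat /sys/class/net/ib0/address)" \
                "$(cat /sys/class/net/ib0/mode)"
        sleep 10
done >> /root/ib0-address.log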
Here are more details of the setup:
On the FreeNAS side I'm using a separate subnet for each IB port and have two portals in the iSCSI setup, one per IP (10.20.24.100/24 & 10.20.25.100/24).
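Just for completeness, this is roughly how the two portals are consumed from the initiator side on a Proxmox node with open-iscsi, so that multipath sees two paths per LUN (the IQN below is only a placeholder, not the real target name):

# discover and log in once per portal; the IQN is a placeholder
iscsiadm -m discovery -t sendtargets -p 10.20.24.100
iscsiadm -m discovery -t sendtargets -p 10.20.25.100
iscsiadm -m node -T iqn.2005-10.org.freenas.ctl:placeholder -p 10.20.24.100 --login
iscsiadm -m node -T iqn.2005-10.org.freenas.ctl:placeholder -p 10.20.25.100 --login
iscsiadm -m session   # should list one session per portal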
The kernel of FreeNAS:
root@freenas1[/data/crash]# uname -a
FreeBSD freenas1 11.2-STABLE FreeBSD 11.2-STABLE #0 r325575+6aad246318c(HEAD): Mon Jun 24 17:25:47 UTC 2019 root@nemesis:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/FreeNAS.amd64 amd64
The modules loaded on the FreeNAS side:
root@freenas1[/data/crash]# cat /boot/loader.conf.local
mlx4ib_load="YES" # be sure that the Mellanox mlx4 InfiniBand kernel module gets loaded
ipoib_load="YES" # be sure that the IP-over-InfiniBand kernel module gets loaded
kernel="kernel"
module_path="/boot/kernel;/boot/modules;/usr/local/modules"
kern.cam.ctl.ha_id=0
root@freenas1[/data/crash]# kldstat
Id Refs Address Size Name
1 72 0xffffffff80200000 25608a8 kernel
2 1 0xffffffff82762000 100eb0 ispfw.ko
3 1 0xffffffff82863000 f9f8 ipmi.ko
4 2 0xffffffff82873000 2d28 smbus.ko
5 1 0xffffffff82876000 8a10 freenas_sysctl.ko
6 1 0xffffffff8287f000 3aff0 mlx4ib.ko
7 1 0xffffffff828ba000 1a388 ipoib.ko
8 1 0xffffffff82d11000 32e048 vmm.ko
9 1 0xffffffff83040000 a74 nmdm.ko
10 1 0xffffffff83041000 e610 geom_mirror.ko
11 1 0xffffffff83050000 3a3c geom_multipath.ko
12 1 0xffffffff83054000 2ec dtraceall.ko
13 9 0xffffffff83055000 3acf8 dtrace.ko
14 1 0xffffffff83090000 5b8 dtmalloc.ko
15 1 0xffffffff83091000 1898 dtnfscl.ko
16 1 0xffffffff83093000 1d31 fbt.ko
17 1 0xffffffff83095000 53390 fasttrap.ko
18 1 0xffffffff830e9000 bfc sdt.ko
19 1 0xffffffff830ea000 6d80 systrace.ko
20 1 0xffffffff830f1000 6d48 systrace_freebsd32.ko
21 1 0xffffffff830f8000 f9c profile.ko
22 1 0xffffffff830f9000 13ec0 hwpmc.ko
23 1 0xffffffff8310d000 7340 t3_tom.ko
24 2 0xffffffff83115000 ab8 toecore.ko
25 1 0xffffffff83116000 ddac t4_tom.ko
Kernel running on Proxmox:
root@pvecn1:~# uname -a
Linux pvecn1 4.15.18-20-pve #1 SMP PVE 4.15.18-46 (Thu, 8 Aug 2019 10:42:06 +0200) x86_64 GNU/Linux
The modules loaded on the Proxmox side (a quick check that they are active follows after the list):
root@pvecn1:~# cat /etc/modules-load.d/mellanox.conf
mlx4_core
mlx4_ib
mlx4_en
ib_cm
ib_core
ib_ipoib
ib_iser
ib_umad
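To double-check that these modules are really active after boot and that both IPoIB interfaces exist, I run something like:

# confirm the Mellanox/IPoIB/iSER modules are loaded
lsmod | egrep 'mlx4|ib_ipoib|ib_iser'
# confirm both IPoIB interfaces are present and up
ip link show ib0
ip link show ib1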
The InfiniBand network setup, for example on the first Proxmox client (the verification commands I use come right after the snippet):
# Mellanox Infiniband
auto ib0
iface ib0 inet static
        address 10.20.24.110
        netmask 255.255.255.0
        pre-up echo connected > /sys/class/net/$IFACE/mode
        #post-up /sbin/ifconfig $IFACE mtu 65520
        post-up /sbin/ifconfig $IFACE mtu 40950

# Mellanox Infiniband
auto ib1
iface ib1 inet static
        address 10.20.25.110
        netmask 255.255.255.0
        pre-up echo connected > /sys/class/net/$IFACE/mode
        #post-up /sbin/ifconfig $IFACE mtu 65520
        post-up /sbin/ifconfig $IFACE mtu 40950
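After ifup I verify that connected mode and the large MTU really took effect, roughly like this:

# both should print "connected"
cat /sys/class/net/ib0/mode /sys/class/net/ib1/mode
# both should report mtu 40950
ip link show ib0 | grep -o 'mtu [0-9]*'
ip link show ib1 | grep -o 'mtu [0-9]*'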
On the Proxmox side I'm running a multipath setup.
This is the content of /etc/multipath.conf:
defaults {
        polling_interval 2
        path_selector "round-robin 0"
        path_grouping_policy multibus
        uid_attribute ID_SERIAL
        rr_min_io_rq 1
        rr_weight uniform
        failback immediate
        no_path_retry queue
        user_friendly_names yes
}
...
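To confirm that both iSCSI paths really end up round-robined in a single map, I check with:

# each LUN should show one multipath map with two active paths
multipath -ll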
The ifconfig output for both InfiniBand ports on the FreeNAS server looks like this:
ib0: flags=8043<UP,BROADCAST,RUNNING,MULTICAST> metric 0 mtu 40950
options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
lladdr 80.0.2.8.fe.80.0.0.0.0.0.0.0.2.c9.3.0.3a.ed.41
inet 10.20.24.210 netmask 0xffffff00 broadcast 10.20.24.255
nd6 options=9<PERFORMNUD,IFDISABLED>
ib1: flags=8043<UP,BROADCAST,RUNNING,MULTICAST> metric 0 mtu 40950
options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
lladdr 80.0.2.9.fe.80.0.0.0.0.0.0.0.2.c9.3.0.3a.ed.42
inet 10.20.25.210 netmask 0xffffff00 broadcast 10.20.25.255
nd6 options=9<PERFORMNUD,IFDISABLED>
The ifconfig output on the first Proxmox client:
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 40950
inet 10.20.24.110 netmask 255.255.255.0 broadcast 10.20.24.255
inet6 fe80::202:c903:9:20e3 prefixlen 64 scopeid 0x20<link>
unspec 80-00-02-08-FE-80-00-00-00-00-00-00-00-00-00-00 txqueuelen 256 (UNSPEC)
RX packets 5596912 bytes 10293861835 (9.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3744669 bytes 48471009082 (45.1 GiB)
TX errors 0 dropped 125 overruns 0 carrier 0 collisions 0
ib1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 40950
inet 10.20.25.110 netmask 255.255.255.0 broadcast 10.20.25.255
inet6 fe80::202:c903:9:20e4 prefixlen 64 scopeid 0x20<link>
unspec 80-00-02-09-FE-80-00-00-00-00-00-00-00-00-00-00 txqueuelen 256 (UNSPEC)
RX packets 6863837 bytes 8858149718 (8.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6197948 bytes 96516756048 (89.8 GiB)
TX errors 0 dropped 257 overruns 0 carrier 0 collisions 0
Any hints are welcome.
Regards
Ralf