Scenario:
Brand new PVE installation for a new compute node - installed from the PVE 7.3 ISO, all updates applied, now at pve-manager/7.3-3/c3928077 (running kernel: 5.15.74-1-pve).
Networking is supposed to be provided via a 2-member bond, 2x 10G interfaces on a single Mellanox MT27500 (ConnectX-3).
The bond is supposed (for now) to carry 3 tagged VLANs, plus the vmbr0 for guest traffic.
vlan75@bond0: 192.168.75.0/24 Mgmt traffic, SSH access, corosync cluster traffic
vlan76@bond0: 192.168.76.0/24 Storage traffic, RBD volumes on a 3 node pveceph cluster (the nodes are joined to the same PVE cluster)
vlan302@bond0: 172.19.0.0/16 NFS mounts on NAS, for ISO images
I've copied the configuration basically 1:1, save for the specific IP addresses, from another node in the same PVE cluster:
Code:
auto lo
iface lo inet loopback

iface ens1 inet manual

iface ens1d1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves ens1 ens1d1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-min-links 1
    bond-lacp-rate 1

auto vlan302
iface vlan302 inet static
    address 172.19.76.11/16
    vlan-raw-device bond0

auto vlan75
iface vlan75 inet static
    address 192.168.75.11/24
    gateway 192.168.75.254
    vlan-raw-device bond0

auto vlan76
iface vlan76 inet static
    address 192.168.76.11/24
    vlan-raw-device bond0

auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
ifreload -a, and everything went fine: I joined the cluster, moved guests there and back... until I decided to reboot the node and could no longer reach the mgmt IP address 192.168.75.11, because ifupdown2 didn't want to install the default route (the "gateway" statement).
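For anyone hitting the same thing, this is roughly how the symptom can be confirmed from the console, and how the node can be brought back temporarily (plain iproute2 commands; the gateway address is the one from the vlan75 stanza above):
Code:
# vlan75 has its address, but no default route was installed at boot
ip -4 addr show vlan75
ip route show default
# temporary manual fix, until the configuration itself is sorted out
ip route add default via 192.168.75.254 dev vlan75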
Thankfully, Google landed me right at the solution, posted on the ifupdown2 GitHub by Proxmox's @aderumier.
Funny, the ifupdown2 guys are proud that their code resolves dependencies by itself (you can even display the dependencies with a CLI parameter) - yet apparently this does not fully translate into practice, and you need *just the right order of interfaces* in the config to keep this house of cards standing. Reassuring... apparently it's also unimportant enough that nobody has posted in that issue since Oct 30, 2019.
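(For reference, I believe the CLI parameter in question is ifquery's --print-dependency switch - quoting from memory, so double-check the exact spelling:)
Code:
# ask ifupdown2 what it thinks the dependency order is
ifquery -a --print-dependency=list
# or render it as a graph
ifquery -a --print-dependency=dot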
The config I used above worked just fine (and continues to work just fine) on our older Proxmox nodes, but those obviously do not use ifupdown2, as they were upgraded through several PVE releases over the years, so that explains it.
Anyway, now we end up with this configuration:
Code:
auto lo
iface lo inet loopback

iface ens1 inet manual

iface ens1d1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves ens1 ens1d1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-min-links 1
    bond-lacp-rate 1

auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0

auto vlan302
iface vlan302 inet static
    address 172.19.76.11/16
    vlan-raw-device bond0

auto vlan75
iface vlan75 inet static
    address 192.168.75.11/24
    gateway 192.168.75.254
    vlan-raw-device bond0

auto vlan76
iface vlan76 inet static
    address 192.168.76.11/24
    vlan-raw-device bond0
... and my default route now survives a reboot just fine, so thank you Alexandre for finding the workaround.
I can reach all destinations in both vlan75 and vlan76 (at least all the ones I care about; I did not check every possible IP). In vlan302, however, ARP resolution works, I can send ICMP echo requests out, they are received just fine on the other side, and both the requests and the replies show up in tcpdump -i vlan302 - yet my ping command reports 100% packet loss! The replies do not seem to be handled correctly by some component of the network stack.
Code:
root@prox11:~# ip neigh del 172.19.3.9 dev vlan302
root@prox11:~# ping 172.19.3.9
PING 172.19.3.9 (172.19.3.9) 56(84) bytes of data.
^C
--- 172.19.3.9 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1005ms
At the same time, in my tcpdump session:
Code:
root@prox11:~# tcpdump -nvvv -s 0 -i vlan302 'host 172.19.3.9'
tcpdump: listening on vlan302, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:33:52.286022 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.19.3.9 tell 172.19.76.11, length 28
20:33:52.290390 ARP, Ethernet (len 6), IPv4 (len 4), Reply 172.19.3.9 is-at 00:11:32:33:38:89, length 46
20:33:52.290406 IP (tos 0x0, ttl 64, id 55622, offset 0, flags [DF], proto ICMP (1), length 84)
172.19.76.11 > 172.19.3.9: ICMP echo request, id 34101, seq 1, length 64
20:33:52.290507 IP (tos 0x0, ttl 64, id 64551, offset 0, flags [none], proto ICMP (1), length 84)
172.19.3.9 > 172.19.76.11: ICMP echo reply, id 34101, seq 1, length 64
20:33:53.290596 IP (tos 0x0, ttl 64, id 55820, offset 0, flags [DF], proto ICMP (1), length 84)
172.19.76.11 > 172.19.3.9: ICMP echo request, id 34101, seq 2, length 64
20:33:53.290711 IP (tos 0x0, ttl 64, id 64790, offset 0, flags [none], proto ICMP (1), length 84)
172.19.3.9 > 172.19.76.11: ICMP echo reply, id 34101, seq 2, length 64
It's not just an ICMP quirk either: I couldn't care less about ICMP requests and their replies, but pvesm can no longer mount the NFS store (server 172.19.3.9).
Now here's the twist: if I keep the vlan302 part commented out in the config, reboot the server, and only add the section back in and run ifreload -a after the reboot, everything works just fine! Curiously similar to the first issue with the gateway, which would fail at boot but succeed when the same configuration was applied to a running system...
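To make the manual workaround concrete, this is the sequence I go through after a reboot (just a sketch; the edit step is simply uncommenting the vlan302 stanza shown earlier):
Code:
# after the reboot, vlan302 is still commented out in /etc/network/interfaces
# 1. uncomment the vlan302 stanza
# 2. re-apply the configuration
ifreload -a
# 3. verify that the NAS answers and the storage comes back
ping -c 2 172.19.3.9
pvesm status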
How can I go about fixing this? Is it ifupdown2 yet again, a Mellanox mlx4_* driver issue, something else...? The very same Mellanox cards have been in use for years in this PVE cluster; we're only replacing the aging servers themselves, and the cards, at just 3 years old, are still perfectly good for the new machines as well. The older servers never had any issue mounting the NFS storage...
Does it make any difference whether I configure the VLAN interface as vlan302[@bond0] or bond0.302?
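In other words, would a stanza using the dot notation (untested on my side, shown only to make clear what I mean) behave any differently at boot than the vlan-raw-device variant above?
Code:
auto bond0.302
iface bond0.302 inet static
    address 172.19.76.11/16
    # no vlan-raw-device line - the parent bond0 is implied by the interface name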
EDIT: the behavior has been the same on 2 other nodes I've begun replacing; one of them, after yet another reboot, now DOES reach the NFS storage. So it looks like it doesn't fail reliably in 100% of cases, but rather in something like 80% or more of boots.