Guest VM network not working when host is using balance-rr bonding and 2 active interfaces

john12 · Mar 17, 2023

Very funny / strange / annoying issue I discovered today. Any ideas, suggestions, requests for additional information are very much welcome.
Note: read to the bottom of the thread for details, a possible workaround, and check the risks this workaround entails in this post (more detailed here).

Short description: when the PVE host is using balance-rr bonding mode and both interfaces are connected, the network on the VMs is either not working at all, or not working reliably. Environment description at the bottom of the post.

Working scenarios/details:
- balance-rr bonding works fine for inter-node communication (tested with iperf and VM migrations) and for communication with other nodes (I have some multi-gig adapters on other boxes, connected to the same/different switches; in all scenarios, iperf transfer is in the range of 1.5 - 1.7 Gbps)
- when the PVE host is only connected via a single network interface (either onboard or USB, even as part of a balance-rr bond), everything works
- when the PVE host is using another type of bonding except balance-rr, everything works (tested with balance-alb, balance-tlb)
- VM guests (regardless of the VLAN) work fine in the two scenarios above (single NIC connected, or not using balance-rr): they get a DHCP address in the right network, and they are consistently reachable

Not working scenarios/details:
- when the PVE host is using balance-rr bonding mode and both interfaces are connected, the network access to the VMs is either unreliable (e.g. ssh takes 30 seconds to get in, if it does) or doesn't work at all
- to preempt an obvious question ("why not use balance-alb across the cluster?") - some of the USB adapters don't support it. Bad cheap stuff, I know.
- the VM guests can't get a DHCP IP address when rebooted / lease expires; there is however a request still going to the DHCP server, but no ACK from VM (the below repeats for ~5 minutes; if I take down one of the network interfaces on the PVE host, it suddenly works):

Mar 16 16:09:33 dnsmasq-dhcp[20342]: DHCPDISCOVER(eth0.3) 10.10.3.7 e2:a2:f4:75:a2:a1
Mar 16 16:09:33 dnsmasq-dhcp[20342]: DHCPOFFER(eth0.3) 10.10.3.7 e2:a2:f4:75:a2:a1

PVE environment:
- 3-node PVE cluster (node1/node2/node3), running latest version (apt full-upgrade ran yesterday, non-enterprise license)
- each node has two physical network interfaces (one onboard, one USB), 1 Gbps each
- all network interfaces are connected to the same physical unmanaged switch (this is a homelab setup, not enterprise)
- all network interfaces are detected and connected (lights on etc)
- each node has a bond0 made up of the two interfaces, using balance-rr mode; there are no VLANs or other manually-defined interfaces at PVE level


root@node1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eno1 inet manual
iface usb1 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 usb1
        bond-miimon 100
        bond-mode balance-rr

auto vmbr0
iface vmbr0 inet static
        address 10.100.100.100/24
        gateway 10.100.100.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
root@node1:~#

root@node1:~# pveversion
pve-manager/7.3-6/723bb6ec (running kernel: 5.15.102-1-pve)

Guest VM environment (call it vm-357):
- VM based on Ubuntu 22.04 cloud-init image with some tweaks; none that I recall (at least) in the networking space, apart from disabling ipv6 (fully disabled across the network)
- single virtual network adapter on each host, using virtio (for some reason I haven't yet investigated, all other adapter types aren't even detected by the image)
- network adapter tagged with VLAN tag 3 and using vmbr0 as bridge


root@node1:~# grep net0 /etc/pve/qemu-server/357.conf
net0: virtio=E2:A2:F4:75:A2:A1,bridge=vmbr0,tag=3
root@node1:~#

root@vm-357:~# ifconfig ens18
ens18: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether e2:a2:f4:75:a2:a1  txqueuelen 1000  (Ethernet)
        RX packets 954  bytes 132240 (132.2 KB)
        RX errors 0  dropped 2  overruns 0  frame 0
        TX packets 24 bytes 8208 (8.2 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Network environment:
- unmanaged switch plugged into managed switch, where VLANs etc. are defined

john12 · Mar 17, 2023

<deleted initial post - issue still present>
There is nothing in the logs of either source or destination machine when this happens.
Tcpdump from the source machine shows packets going out the right way, but nothing coming back.

Sometimes the situation recovers and then there's a burst of replies (like when you type into an overloaded machine and there's nothing shown on the screen, then it writes everything at once), sometimes it ends with an ssh timeout.
If running the hosts with a single network interface, everything works fine.

john12 · Mar 18, 2023

77 views, no replies. Is this such a special issue that nobody else faced? The only similar thing I found was https://forum.proxmox.com/threads/problem-with-bond-mode-balance-rr.6757/, but that dates back to 2011 - I assume many things have changed since then.

Empirical evidence tends to suggest the failure of the guest network is triggered by higher network activity.
A freshly-deployed Ubuntu VM was reachable without issues for hours, until I ran an "apt upgrade -y" - when it promptly became unreachable. Taking down one of the interfaces on the host recovered the situation immediately.

john12 · Mar 18, 2023

Further testing to narrow this down - it is not the switch's fault.
I tested this connecting one of the hosts to a managed switch. I tried with LAG on the two ports both enabled and disabled - same effect, same behavior as when connected to the unmanaged switch.

Basically, with both physical interfaces under bond0 (balance-rr mode) enabled:
- everything works fine at host level
- VMs either can't get an IP address when booting (there's a DHCPREQ and a DHCPOFFER, but no DHCPACK) or have intermittent / complete loss of connectivity
Situation goes back to normal if I disable / unplug either of the physical interfaces which are part of bond0.

john12 · Mar 19, 2023

Although it feels like an echo chamber, further test results: the situation is exactly the same when trying to deploy an Ubuntu Live (regular ISO, full install) VM, with either vmxnet3 or virtio adapters. This rules out something dodgy with the cloud-init Ubuntu image, so I guess the issue is somewhere at PVE bridge level - but can't quite put my finger on it, as it seems to work absolutely fine for the host-level communication.

dignus · Mar 19, 2023

Had something similar in the past, fix for me was to update firmware on the nic’s.

dignus · Mar 19, 2023

Just looked it up, iterally had the same thing, but with active/backup bond mode. These were Intel 10 gbit fiber nic’s on ubuntu 1804 back in the day. No network in the vm layer possible until firmware upgrade.

john12 · Mar 19, 2023

Thanks a lot dignus for your response (although it doesn't seem to be necessarily an easy fix).
My NICs are onboard/USB 1 Gbps, nothing too fancy (and unlikely to have even the capability to update firmware.

If you happen to recall, in your case did it work (for the guests) with one NIC unplugged or you weren't getting any connectivity in the guest VMs until you did the firmware upgrade?

dignus · Mar 19, 2023

Sorry, can't help you there, never tested it with one cable/switchport offline. I didn't get a single packet before the firmware upgrade, nothing in tcpdump even.

john12 · Mar 20, 2023

Thanks nevertheless. Sounds like a different issue in that case, I guess.

Fatz · Mar 20, 2023

Hi John,

I believe this might be a problem in the .102 kernel/packages of Proxmox.

I currently have 2 machines with identical specs, except the CPU (EPYC 32c vs EPYC 64c). Same motherboard, same NICs in the same PCIe slots, etc. All on the same firmware.

Machine #1 - Working for weeks, no problems.
Machine #2 - Installed at the weekend, not working if interfaces are bonded.

I spent most of yesterday trying to work out why an identical interfaces config was not working on the 2nd machine. Then i realised it DID work, but only after initial installation. By this, i mean installing from the 7.3.1 ISO, which puts you on the 5.15.74-1-pve kernel. But, updating from the free repo puts you on .102.

The 1st machine, which works fine, is on .85. After speaking to a friend who had an Enterprise subscription, he told me the latest packages on there also bring him up to .85. So i decided to get a key for myself today, since i had planned it for these 2 machines after i was done testing/setting them up anyway. And i can confirm that this 2nd machine with the subscription added, is now on.85 and the config from machine #1 (only with a different management IP) works fine.

Prior to this, any attempt to bond 2 interfaces was causing issues. Whether done manually or through the UI. It seems to cause the system to hang no matter if you use ifreload -a or service networking restart. And of course, rebooting would leave you stuck with the unlimited timeout of "intializing network". This is whether i used LACP or active-backup.

Here is the current config for each machine, which uses 1x dual NIC for traffic, and 1x dual NIC for management, just for the sake of clarity

Code:

auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto enp130s0f0
iface enp130s0f0 inet manual

auto enp130s0f1
iface enp130s0f1 inet manual

auto enp65s0f0
iface enp65s0f0 inet manual

auto enp65s0f1
iface enp65s0f1 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves enp130s0f0 enp130s0f1
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp130s0f0
#Bond: Management

auto vmbr0
iface vmbr0 inet static
        address 10.0.20.104/24
        gateway 10.0.20.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 20
#Bridge: Management

auto bond1
iface bond1 inet manual
        bond-slaves enp65s0f0 enp65s0f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3
#Bond: VM Traffic

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 30 100-299
#Bridge: VM Traffic

Maybe you could try testing on an older version to see if you also have issues? Though i did notice that you mentioned:

"running latest version (apt full-upgrade ran yesterday, non-enterprise license)"
"annoying issue I discovered today"

... which makes me think it's likely that you're also seeing issues from moving up to the latest free-repo packages.

john12 · Mar 20, 2023

Wow. Thanks @Fatz for the detailed reply - and for a fantastic suggestion!
I did try kernel 6.2 yesterday, but it made no difference. However...

I moved one host today to 5.15.85-1, following your suggestion, and the situation has improved MASSIVELY:
- the DHCP acknowledgment still doesn't work when both NICs are up (there's DHCPREQ and DHCPOFFER, but no DHCPACK); if I take down one interface, it goes through. Workaround: set VM to static IP address
- once the machine is booted, enabling the second host interface doesn't have any negative effects (with 5.15.102-1, the machine would become unreachable, commands would hang halfway through typing etc.). I only noticed one hiccup lasting ~1 second, but then everything was back to normal
- for the fun of it, I also tried 5.15.74-1; behavior was similar to .85, but I had more hiccups (nevertheless, the machine remained reachable once it got past DHCP)
With the new setup, I am getting ~1.4 Gbps with iperf (2 streams) from a different host in the network to the VMs.

Shall I raise it as a bug somewhere? Not sure if I should mark this as "Solved" yet, since we are talking about a couple of workarounds.

In case anyone is wondering, moving to the prior kernel is easy and reversible (as long as you still have it installed) even on a headless machine (no monitor/keyboard):
1. Check that grub is set to boot the first kernel in the list


root@node1:~# grep GRUB_DEFAULT /etc/default/grub
GRUB_DEFAULT=0
root@node1:~#

2. Move all files for the latest kernel out of the way (there's 4 files for each kernel you can actually move more than one set of files, as long as you have at least one full set left), update grub config and reboot:


root@node1:~# mv /boot/*5.15.102* /root/backup-kernel/
root@node1:~# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.15.85-1-pve
Found initrd image: /boot/initrd.img-5.15.85-1-pve
Found linux image: /boot/vmlinuz-5.15.74-1-pve
Found initrd image: /boot/initrd.img-5.15.74-1-pve
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done
root@node1:~# reboot

The machine will boot the first kernel listed in the update-grub output above (5.15.85-1 in my case).

Rolling back (re-enabling the latest kernel(s)) is also easy: copy the corresponding files back to /boot from wherever you moved them, update-grub, reboot.

Fatz · Mar 20, 2023

Glad my post was useful. I'm just hoping this is tested and solved before whatever is causing it ends up in the Enterprise repo.

john12 said:
Shall I raise it as a bug somewhere? Not sure if I should mark this as "Solved" yet, since we are talking about a couple of workarounds.

Yes i was wondering the same thing. This is my first time posting here and i'm not aware of the procedure for getting a report in. I wanted to do a little bit more testing myself before reporting something more officially though, since i only came across this bug while trying to get LACP working on my management VLAN/bond, and wanted to check (later) that this also still works as intended.

fabian · Mar 20, 2023

yeah, please file an issue at bugzilla.proxmox.com

include:
- pveversion -v (broken and working)
- /etc/network/interfaces
- lspci -v
- journalctl -b (broken boot and working boot)
- description of symptoms

thanks!

VictorSTS · Mar 20, 2023

john12 said:
In case anyone is wondering, moving to the prior kernel is easy and reversible (as long as you still have it installed) even on a headless machine (no monitor/keyboard):

This is easier pinning the kernel with proxmox-boot-tool.

BTW I'm following this thread as I'm curious about what could be the problem of this strange issue. I never use balance-rr as it has always given me poor performance (specially with 3+ links) and never made it work for UDP traffic like VoIP. I wouldn't rule out the switch just yet, as all nics in the bond get their MAC changed, RX traffic is usually sent back to the bond by just one link at some given time. Of course, it depends on your harware. Try to get a manageable switch and setup LACP. Balance-rr is the only bonding type that will send a single connection among 2+ links, though.

john12 · Mar 20, 2023

VictorSTS said:
Try to get a manageable switch and setup LACP. Balance-rr is the only bonding type that will send a single connection among 2+ links, though.

I did - although only tested with 5.15.102-1 (this was before @Fatz groundbreaking advice above), no difference.
Granted, the switch isn't too smart - only one generic type of LAG, no configurable parameters - and I only tested it with balance-rr on the host side.

For now, it seems to work as it is. I am getting ~1.5-1.7 Gbps host-to-host, about ~1.3-1.5 Gbps host-to-VM.
I am happy with it, considering one of the NICs is a USB thing. And I can live with the downside of DHCP not working, as I don't have that many VMs (nor a lot of fluctuation among them). Still very curious why it isn't working with the latest kernel...

Fatz · Mar 21, 2023

VictorSTS said:
Try to get a manageable switch and setup LACP

I can confirm this happened all the same for me on .102 with LACP or active-backup modes on any bonded links with a L3 switch on the other end. But on .85 there are no such issues. I'll go submit a bug report about this, as i'd rather ensure that this problem didn't make its way to me via the Enterprise repo in the near future.

Fatz · Mar 21, 2023

Post on Bugzilla, for those interested:
https://bugzilla.proxmox.com/show_bug.cgi?id=4604

john12 · Mar 21, 2023

@Fatz, can you check what happens after you do "brctl setageing vmbr0 0" on the respective bridge?
I did that on my box and my troublesome VM now gets an IP address via DHCP (while plugged into a vmbr0 which sits on top of a balance-rr bond).

Idea from https://forum.proxmox.com/threads/two-nics-two-bridges-why-does-this-not-work.124228/

john12 · Mar 21, 2023

Further follow-up: with the above fix made permanent, in the form of (check bridge_ageing, as I can't make it bold in a code block):

Code:

auto vmbr0
iface vmbr0 inet static   
        address 10.100.100.100/24
        gateway 10.100.100.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        bridge_ageing 0

I am happy to report that EVERYTHING WORKS (sorry for the caps, I am just too happy to see the end of this).

I tested this on all my hosts, with both 5.15.85-1 and 5.15.102-1 - no difference:
- bonding works (I get ~1.5 Gbps when communicating with VMs, a bit more in host-to-host traffic)
- VMs get an IP over DHCP (wasn't working in either .85 or .102)
- no more hiccups / loss of connectivity when connecting to VMs (made things unusable in .102, was rare in .85)

I'll mark this as Solved, since the one parameter above appears to address the impact and not cause any side effects (at least my switch didn't catch fire yet ). Time permitting, I will also try to understand why this fixes the thing, unless someone has an easy explanation.
Not solved for now, as it does have some side effects - see here.

Guest VM network not working when host is using balance-rr bonding and 2 active interfaces

New Member

New Member

New Member

New Member

New Member

Renowned Member

Renowned Member

New Member

Renowned Member

New Member

New Member

New Member

New Member

Proxmox Staff Member

Famous Member

New Member

New Member

New Member

New Member

New Member