Proxmox guest VMs under nested virtualization do not apply DHCPOFFER replies

setuid

Member
Dec 20, 2020
I've read through dozens of similar posts, but none seem to relate to this specific issue, and it's a weird one.

If I install Proxmox VE 7.4 as a guest VM under an existing VMware environment (on metal), the networking/DHCP/everything in VE works as expected.

If I then install an Ubuntu guest VM inside Proxmox, that guest VM does not apply the DHCPOFFER it receives from the upstream DHCP server, a baremetal router north of all of these systems.

In other words:

No DHCP: VMware (metal) → Proxmox (vmware VM) → Ubuntu (proxmox VM)

However, touching nothing else, I can give that same Ubuntu guest VM a static IP on the same /22 network that all of these machines and VMs live on, and that works perfectly fine, no issues. This establishes that nested virtualization, routing, and L2 networking are all working as designed and desired.

In Proxmox, the networking is configured very generically: a single physical interface (ens160) and a single Linux bridge (vmbr0) that this test VM lives on. When that VM starts up, it creates a tap (tap100i0) attached to that bridge (vmbr0).

While attempting to acquire a DHCP lease from inside the Ubuntu guest VM, I can tcpdump that conversation ("tcpdump -v -n -e -i vmbr0 port 67 or port 68"). In the VM, its network interface (ens18) shows the DHCPREQUEST going out. On the host (Proxmox) side, I can see that traffic exiting the bridge (vmbr0), exiting the Proxmox physical interface (ens160), reaching the DHCP server (external to the Proxmox VMware host), and then a reply (DHCPOFFER) coming back from that DHCP server, which reaches ens160 → vmbr0... and that's where it stops.

I can see the new IP address in the tcpdump output, as well as the MAC of the Ubuntu guest VM sent in the request and reply, but that reply never gets sent into the tap, and so ens18 inside the guest VM never sees the response, and can't apply the lease being offered.
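To pin down exactly where the offer dies, it helps to capture at every hop at once. A sketch, using the interface names from this setup:

```bash
# Run each capture in its own shell on the Proxmox host, then retry
# DHCP inside the guest; interface names are the ones from this setup.
tcpdump -v -n -e -i ens160   port 67 or port 68    # physical uplink
tcpdump -v -n -e -i vmbr0    port 67 or port 68    # Linux bridge
tcpdump -v -n -e -i tap100i0 port 67 or port 68    # the VM's tap
# If the DHCPOFFER appears on ens160 and vmbr0 but never on tap100i0,
# the frame is lost at the bridge's forwarding decision, not upstream.
```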

Before you suggest that something is misconfigured on the parent VMware side, I can create a VMware VM inside the parent VMware (metal) host, and inside that, deploy the same Ubuntu guest VM, and it properly receives DHCP packets in and out of its interface.

In other words:

Working DHCP: VMware (metal) → VMware (nested VM) → Ubuntu (vmware VM)

All of the correct settings are set on the parent (metal) VMware vswitch0, including promiscuous mode, forged transmits and MAC changes.

I've ensured that there are no firewalls enabled at the Proxmox Datacenter, Node or VM level that would prevent DHCP replies from being properly received, and this is confirmed by the above tcpdump outputs on all interfaces along the path.

I also tried explicitly adding permissive iptables rules inside and outside the Proxmox guest VM, with the same failing results. This included using ufw as well as iptables directly. No luck. Again, assigning a static IP address to the Proxmox guest VM works with no issues. The problem is only with received DHCP replies not being sent into the tap for the VM.

I've also experimented with setting promiscuous mode on the Proxmox physical (ens160), the VM Linux bridge (vmbr0) and that VM's tap (tap100i0) without success either.
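For reference, those promiscuous-mode experiments can also be done with iproute2 instead of the deprecated ifconfig; a sketch with this setup's interface names:

```bash
# iproute2 equivalents of the ifconfig promisc experiments above
ip link set ens160 promisc on
ip link set vmbr0 promisc on
ip link set tap100i0 promisc on
# Verify: PROMISC should show up in the interface flags
ip -d link show vmbr0
```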

I've spent the better part of a full day on this, testing and trying every combination of NIC model, configuration, bridge routing, iptables, sysctl ip_forward, ufw rules and more, without any luck.

The issue can't be related to the VMware baremetal host itself, because the DHCPREQUEST goes out and is received by the upstream DHCP server, and the DHCPOFFER it creates is sent back into the Proxmox VM and across the Linux bridge, but it never reaches the tap, so the Proxmox guest VM never receives or applies it.

Here's an example of the reply I see inside the Proxmox VM, when capturing packets on vmbr0:

Bash:
01:39:07.977907 76:ac:b9:5a:e5:87 > e2:73:ab:d9:21:02, ethertype IPv4 (0x0800), length 359: (tos 0xc0, ttl 64, id 3729, offset 0, flags [none], proto UDP (17), length 345)
    192.168.4.1.67 > 192.168.7.119.68: BOOTP/DHCP, Reply, length 317, xid 0x4cf18f49, secs 28, Flags [none]
      Your-IP 192.168.7.119
      Server-IP 192.168.4.1
      Client-Ethernet-Address e2:73:ab:d9:21:02
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Offer
        Server-ID (54), length 4: 192.168.4.1
        Lease-Time (51), length 4: 86400
        RN (58), length 4: 43200
        RB (59), length 4: 75600
        Subnet-Mask (1), length 4: 255.255.252.0
        BR (28), length 4: 192.168.7.255
        Domain-Name-Server (6), length 12: 1.1.1.1
        Default-Gateway (3), length 4: 192.168.4.1

On the Proxmox side, the configuration looks like the following. Note that the commented-out sections are things I've tried without success; uncommenting them does not produce a working configuration.

Bash:
auto ens160
iface ens160 inet manual
    # up /sbin/ifconfig ens160 promisc on

auto vmbr0
# iface vmbr0 inet static
iface vmbr0 inet dhcp
    # address 192.168.4.130/22
    # gateway 192.168.4.1
    bridge-ports ens160
    bridge-stp off
    hwaddress ether 00:0c:29:8c:98:55
    # up /sbin/ifconfig vmbr0 promisc on

What else am I missing here?
 
I have the same issue. However, for me, my setup works if the vSwitch in VMware is configured with a single NIC. If I add more than one NIC to the vSwitch uplink, DHCP stops at vmbr0 on Proxmox.

I wonder what a tcpdump on the ESXi side looks like?
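I don't have that capture, but on the ESXi host something like pktcap-uw should show whether the offer even makes it back to the uplink. A sketch (vmnic0 here is an assumption for this setup):

```bash
# On the ESXi host (via SSH): capture DHCP traffic on the physical uplink.
# --dir 2 captures both directions (ESXi 6.7+; older releases only take 0/1).
pktcap-uw --uplink vmnic0 --dir 2 -o /tmp/uplink.pcap
# Reproduce the DHCP request from the nested guest, then Ctrl-C the capture.
# Inspect the result (tcpdump-uw ships with ESXi):
tcpdump-uw -n -r /tmp/uplink.pcap port 67 or port 68
```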
 
After conducting a thorough examination using `tcpdump` and `dropwatch`, I set up VMware Tools on the Proxmox-hosted VM by running `apt-get install open-vm-tools`. Subsequently, I adjusted the settings in my `/etc/network/interfaces` configuration:

```bash
auto lo
iface lo inet loopback

auto ens160
iface ens160 inet manual

auto vmbr0
iface vmbr0 inet static
address 192.168.4.34/24
gateway 192.168.4.1
bridge-ports ens160
bridge-stp on
bridge-fd 0
bridge-ageing 0
```

To complete the solution, I switched my vmnic to the Intel E1000 (e1000) model. Throughout this process, I dug deep into packet analysis, correlating the packet drops reported by `dropwatch` with what `tcpdump` observed.
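For anyone who wants to try the ageing change before committing it to /etc/network/interfaces, it should be possible to apply it at runtime; a sketch, assuming the vmbr0 name from this thread:

```bash
# Apply the ageing change at runtime (lost on reboot/ifreload):
ip link set dev vmbr0 type bridge ageing_time 0
# Read it back; the sysfs value is in units of 1/100 second:
cat /sys/class/net/vmbr0/bridge/ageing_time
# Legacy equivalent, if bridge-utils is installed (argument in seconds):
# brctl setageing vmbr0 0
```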


When you set bridge-ageing 0, you are essentially directing the bridge to retain MAC addresses indefinitely in the Forwarding Database (FDB). The FDB (or MAC address table) is what the bridge uses to determine which port to forward a frame out of, based on its destination MAC address.

If setting bridge-ageing to 0 solved a problem in your network, it indicates that for some reason, the bridge was either:

  1. Forgetting MAC addresses too quickly (before they reappear in normal network traffic), leading to incorrect or inefficient frame forwarding.
  2. Encountering a large number of MAC address changes or movements, causing entries to be aged out and replaced frequently.
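One way to check which of these is happening is to watch the bridge's FDB directly while the guest retries DHCP; a sketch using the interface names and guest MAC from this thread:

```bash
# Show what the bridge has learned for the guest's MAC / tap port:
bridge fdb show br vmbr0 | grep -i -e tap100i0 -e e2:73:ab:d9:21:02
# Watch FDB entries appear and age out live while the guest retries DHCP:
bridge monitor fdb
# If the guest's MAC is missing or learned on the wrong port when the
# DHCPOFFER comes back, the bridge forwards the reply out the wrong
# port instead of into the tap.
```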
 
So, my struggles started with the EXACT same issue on the exact same infra architecture, and I did all the same troubleshooting steps with no luck. I ended up with the same conclusion: vmbr0 just drops/forgets the DHCP Offer packet. Luckily, a bright-minded dude, ctejeda, has now found the solution that WORKS: the bridge-ageing 0 parameter on the vmbr0 adapter!
Cookie for you!
 
If setting bridge-ageing to 0 solved a problem in your network, it indicates that for some reason, the bridge was either:

  1. Forgetting MAC addresses too quickly (before they reappear in normal network traffic), leading to incorrect or inefficient frame forwarding.
  2. Encountering a large number of MAC address changes or movements, causing entries to be aged out and replaced frequently.
Thank you for that, setting bridge-ageing to 0 fixes it.

However, I don't think it's one of those causes (it must be possibility 3, etc.), especially as this is a setting for the Linux bridge and not an external bridge. It's very reproducible (it fails every time on my 10gb bond, and there aren't that many MAC changes going on), and if a bridge forgets a MAC address, it should flood the frame everywhere. Also, the problem only happens with the DHCP reply packet coming back to the VM: it makes it from the physical ethernet, through the bond#, bond#.vlan and through the vmbr###, and is lost, every time, only on the last hop into the VM.

More likely it's either buggy acceleration on the physical adapter with the MAC address getting mangled through the layers, or some other kernel bug in the bridging or bonding, or in how the VM attaches to the vmbr### interface.
 
