I've read through dozens of similar posts, but none seem to relate to this specific issue, and it's a weird one.
If I install Proxmox VE 7.4 as a guest VM under an existing VMware environment (on bare metal), networking, DHCP, and everything else in Proxmox VE works as expected.
If I then install an Ubuntu guest VM inside Proxmox, that guest VM does not apply the DHCPOFFER it receives from the upstream DHCP server, a bare-metal router north of all of these systems.
In other words:
No DHCP: VMware (metal) → Proxmox (vmware VM) → Ubuntu (proxmox VM)
However, touching nothing else, I can give that same Ubuntu guest VM a static IP on the same /22 network that all of these machines and VMs live on, and that works perfectly fine, no issues. This establishes that nested virtualization, routing, and L2 networking are all working as designed and desired.
In Proxmox, the networking is configured very generically: a single physical interface (ens160) and a single Linux bridge (vmbr0) that this test VM lives on. When that VM starts up, a tap (tap100i0) is created and attached to that bridge (vmbr0).
While attempting to acquire a DHCP lease from inside the Ubuntu guest VM, I can capture the exchange with tcpdump ("tcpdump -v -n -e -i vmbr0 port 67 or port 68"). In the VM, its network interface (ens18) shows the DHCPREQUEST going out. On the host (Proxmox) side, I can see that traffic exiting the bridge (vmbr0), exiting the Proxmox physical interface (ens160), and reaching the DHCP server (external to the VMware host that runs Proxmox). The server's reply (DHCPOFFER) then comes back through ens160 → vmbr0... and that's where it stops.
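To compare the same exchange hop by hop, the capture can be repeated on each interface along the reply path. A minimal sketch, assuming the interface names from this post; the helper name `dhcp_capture_cmd` is made up for illustration, and it only prints the commands so they can be run one at a time as root:

```shell
# Hypothetical helper: print (not run) a tcpdump command limited to DHCP
# traffic (UDP ports 67/68) on the given interface.
dhcp_capture_cmd() {
    printf 'tcpdump -v -n -e -i %s port 67 or port 68\n' "$1"
}

# One capture per hop on the Proxmox host: physical NIC, bridge, VM tap.
for iface in ens160 vmbr0 tap100i0; do
    dhcp_capture_cmd "$iface"
done
```

Running the printed commands in separate terminals while the guest requests a lease shows the last interface on which the DHCPOFFER is still visible.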
I can see the new IP address in the tcpdump output, as well as the MAC of the Ubuntu guest VM sent in the request and reply, but that reply never gets sent into the tap, and so ens18 inside the guest VM never sees the response, and can't apply the lease being offered.
Before you suggest that something is misconfigured on the parent VMware side: I can create a VMware VM inside the parent VMware (metal) host, deploy the same Ubuntu guest VM inside that, and it properly sends and receives DHCP packets on its interface.
In other words:
Working DHCP: VMware (metal) → VMware (nested VM) → Ubuntu (vmware VM)
All of the required security settings are enabled on the parent (metal) VMware vSwitch0, including promiscuous mode, forged transmits, and MAC address changes.
I've ensured that there are no firewalls enabled at the Proxmox Datacenter, Node or VM level that would prevent DHCP replies from being properly received, and this is confirmed by the above tcpdump outputs on all interfaces along the path.
I also tried explicitly enabling iptables rules inside and outside the Proxmox guest VM, with the same failing results. This included using ufw as well as iptables directly. No luck. Again, assigning a static IP address to the Proxmox guest VM works, and there are no issues. The issue is only with the received DHCP replies being sent into the tap for the VM.
I've also experimented with setting promiscuous mode on the Proxmox physical interface (ens160), the VM's Linux bridge (vmbr0), and that VM's tap (tap100i0), also without success.
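For completeness, a sketch of how the promiscuous flag can be set and checked with iproute2 instead of ifconfig. The two `ip` commands in the comments are the assumed equivalents of the ifconfig lines in my config; `has_promisc` is a hypothetical helper, and the sample line is illustrative rather than output from my system:

```shell
# Assumed iproute2 equivalents of the ifconfig lines (run as root):
#   ip link set dev ens160 promisc on
#   ip -o link show dev ens160

# Hypothetical helper: reads `ip -o link show` output on stdin and succeeds
# if the interface flag list (<...>) contains PROMISC.
has_promisc() {
    grep -Eq '<([^>]*,)?PROMISC(,[^>]*)?>'
}

# Illustrative line in the shape `ip -o link show` prints.
sample='2: ens160: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP'
if printf '%s\n' "$sample" | has_promisc; then
    echo "promisc: on"
else
    echo "promisc: off"
fi
```

On the real host the check would be `ip -o link show dev ens160 | has_promisc`, repeated for vmbr0 and tap100i0.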
I've spent the better part of a full day on this, testing and trying every combination of NIC model, configuration, bridge routing, iptables, sysctl ip_forward, ufw rules and more, without any luck.
The issue can't be related to the VMware bare-metal host itself, because the DHCPREQUEST goes out and is received by the upstream DHCP server, which creates a DHCPOFFER and sends it back into the Proxmox VM and onto the Linux bridge. That reply never reaches the tap, so the Ubuntu guest VM never gets the chance to apply it.
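Since the reply stops at the bridge, one more thing worth checking is which bridge port vmbr0 has learned the guest's MAC on. A minimal sketch, assuming the names from this post: `bridge fdb show br vmbr0` is the real iproute2 command, `fdb_port_for_mac` is a hypothetical helper that filters its output, and the sample lines are illustrative, not taken from my system:

```shell
# Real command (run as root on the Proxmox host): bridge fdb show br vmbr0

# Hypothetical helper: reads `bridge fdb show` output on stdin and prints the
# port(s) on which the given MAC address was learned (case-insensitive match).
fdb_port_for_mac() {
    awk -v mac="$1" 'tolower($1) == tolower(mac) && $2 == "dev" { print $3 }'
}

# Illustrative lines in the shape `bridge fdb show` prints.
sample_fdb='e2:73:ab:d9:21:02 dev tap100i0 master vmbr0
00:0c:29:8c:98:55 dev ens160 master vmbr0 permanent'

# Look up the Ubuntu guest's MAC from the tcpdump capture.
printf '%s\n' "$sample_fdb" | fdb_port_for_mac e2:73:ab:d9:21:02
```

On the real host this would be `bridge fdb show br vmbr0 | fdb_port_for_mac e2:73:ab:d9:21:02`. If the guest's MAC shows up on a port other than tap100i0, the bridge would forward unicast replies for that MAC to that port instead of into the tap.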
Here's an example of the reply I see inside the Proxmox VM, when capturing packets on vmbr0:
Bash:
01:39:07.977907 76:ac:b9:5a:e5:87 > e2:73:ab:d9:21:02, ethertype IPv4 (0x0800), length 359: (tos 0xc0, ttl 64, id 3729, offset 0, flags [none], proto UDP (17), length 345)
    192.168.4.1.67 > 192.168.7.119.68: BOOTP/DHCP, Reply, length 317, xid 0x4cf18f49, secs 28, Flags [none]
      Your-IP 192.168.7.119
      Server-IP 192.168.4.1
      Client-Ethernet-Address e2:73:ab:d9:21:02
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Offer
        Server-ID (54), length 4: 192.168.4.1
        Lease-Time (51), length 4: 86400
        RN (58), length 4: 43200
        RB (59), length 4: 75600
        Subnet-Mask (1), length 4: 255.255.252.0
        BR (28), length 4: 192.168.7.255
        Domain-Name-Server (6), length 12: 1.1.1.1
        Default-Gateway (3), length 4: 192.168.4.1
On the Proxmox side, the configuration looks like the below. Note that the commented-out lines are things I've tried without success. Uncommenting them does not result in a working configuration.
Bash:
auto ens160
iface ens160 inet manual
#       up /sbin/ifconfig ens160 promisc on

auto vmbr0
# iface vmbr0 inet static
iface vmbr0 inet dhcp
#       address 192.168.4.130/22
#       gateway 192.168.4.1
        bridge-ports ens160
        bridge-stp off
        hwaddress ether 00:0c:29:8c:98:55
#       up /sbin/ifconfig vmbr0 promisc on
What else am I missing here?