Networking not working on nested VM

Tony C
Member
May 28, 2020
Hi all,

I have a VM with the host CPU setting enabled, and inside that VM I'm running VirtualBox and VMware Player with VMs that use bridged networking. This is for testing Ansible configurations under different virtualisation environments. Both the guest OS and the guest-guest OS (if you get my drift!) are Ubuntu 18.04 LTS. When the outer VM (the one with the CPU: Host setting) is on the main bridged network, everything works as expected and you can see the guest and guest-guest VMs as separate entities on the network. However, when the outer guest VM is moved onto an internal bridged network, which has working DDNS and DHCP, the guest-guests don't show up. All other VMs on that network behave as expected.
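
For anyone wanting to reproduce this: nested virtualisation can be confirmed on the L0 host with something like the following (assuming an Intel CPU; the kvm_amd module has the same parameter):

Code:
# on the L0 PVE host - prints Y (or 1) if nested virtualisation is available
cat /sys/module/kvm_intel/parameters/nested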

I'm running PVE version 6.3-3 and my network config is:

Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp3s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        bridge-ports enp3s0
        bridge-stp off
        bridge-fd 0
#Bridged network.

auto vmbr1
iface vmbr1 inet static
        address 10.100.200.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up iptables -t nat -A POSTROUTING -s '10.100.200.0/24' -o vmbr0 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s '10.100.200.0/24' -o vmbr0 -j MASQUERADE
#NAT network.

auto vmbr2
iface vmbr2 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0
#Internal network.

auto vmbr3
iface vmbr3 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0
#Test network.

So if the guest VM is on the Bridged network (vmbr0), the guest-guest VMs show up; if the guest is on the Test network (vmbr3), they don't.

Is this too much to ask of an internal bridged network, i.e. does it need physical hardware to work, or am I missing some magic config?

MTIA, Tony.
 
Both the guest OS and the guest-guest OS (if you get my drift!)
Usually referred to as L1 and L2 guest, where "L0" would be the hypervisor - just for reference.

So if the guest VM is on the Bridged network (vmbr0), the guest-guest VMs show up; if the guest is on the Test network (vmbr3), they don't.

Is this too much to ask of an internal bridged network, i.e. does it need physical hardware to work, or am I missing some magic config?
No, this should work. So if I understand correctly: if you put your L1 on vmbr0, then external devices (other physical machines on the same network, as well as PVE itself and other VMs on vmbr0) can connect to the L2; but if you put the L1 on vmbr3, only the L1 itself shows up to other guests on vmbr3, and the L2 does not? You do not change the L2 configuration for that?

Only thing I could think of would be some MAC address confusion... are you using the PVE (or any other) firewall perhaps, with MAC filtering enabled?
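
If you want to rule that out, something along these lines on the PVE host shows which MAC addresses each bridge has learned and behind which port (bridge names taken from your config above):

Code:
# list the ports attached to each bridge
bridge link show
# MACs learned on the test bridge - the L2 guest's MAC should show up
# here behind the L1 guest's tap interface
bridge fdb show br vmbr3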

In general, a good tip I like to use is tcpdump -i any -vv icmp on all machines in the chain and then try some pings to see where in the chain the packets get lost.
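
For example, something like this run in parallel on each hop (L0 host, L1 guest, L2 guest), with the address being a placeholder for whatever your L2 guest uses:

Code:
# on each of L0, L1 and L2
tcpdump -i any -vv icmp
# then, from the machine that cannot reach the L2 guest
ping -c 3 <L2 address>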
 
Firstly, sorry about the late reply. Life took over! :-( BTW many thanks for the tip about the L0-2 terminology, much simpler!

Anyway I have done some further investigating...

In fact I can successfully ping the L2 VM from any of my networks, but, for example, sshing in from a PVE L1 VM doesn't work, and nor does nmap -sT. Basically, if you try to connect to the L2 VM from an L1 VM on the same network, or from another VM hosted by the L0 PVE server, it doesn't work. Connecting from a bridged VM on another machine (my desktop in this case), such that the connection comes in on the vmbr0 interface, works as expected.

I tried turning on logging in iptables and turned off the IP forwarding and NATting on vmbr1; this made no difference, nor did anything show up in the logs that would indicate what was wrong.
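
Roughly what I mean by that, for reference (the exact rules may have differed slightly; the masquerade rule is the one from the vmbr1 config above):

Code:
# temporary LOG rules at the top of the relevant chains
iptables -I FORWARD 1 -j LOG --log-prefix "FWD: "
iptables -t nat -I POSTROUTING 1 -j LOG --log-prefix "NAT: "
# turn off forwarding and drop the masquerade rule for vmbr1
sysctl -w net.ipv4.ip_forward=0
iptables -t nat -D POSTROUTING -s 10.100.200.0/24 -o vmbr0 -j MASQUERADE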
 
IP forwarding (sysctl net.ipv4.ip_forward) should probably be turned on for the L0 and L1 hypervisors. Then I'd check the routing and ARP tables (ip route; ip neighbor) on relevant devices and VMs - I suspect there is some confusion on your (nested) hypervisor as to which interface the packets should be routed to. If your L2 VMs have IP addresses in the same range as the rest of your network, then the L1 (or L0, depending on where your "private" vmbr is located) instances have neighbors in the same subnet on two different bridges.
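
Concretely, something along these lines on L0, L1 and the relevant guests:

Code:
sysctl net.ipv4.ip_forward        # 1 = forwarding enabled
sysctl -w net.ipv4.ip_forward=1   # enable it (not persistent across reboots)
ip route                          # routing table
ip neighbor                       # ARP/neighbour table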

If nothing else, the tcpdump command I mentioned still seems like a viable route for debugging.
 
Hello again,

I have an apology to make: I didn't mention what type the L1 hypervisors were :-(.

I tried the tcpdump command but it didn't show anything odd, as ping worked; I just couldn't ssh into the L2 VM unless it was from another real machine. Likewise ip route and ip neighbor showed nothing unusual, and turning on IP forwarding in L1 made no difference (although because UDP works I don't think it's a routing issue - see below).

The reason why I'm nesting hypervisors is to be able to run different types on my PVE setup for testing purposes (making sure Ansible scripts set up the paravirtualisation drivers correctly etc.) and to run older Windows VMs that can't be ported across to PVE from VirtualBox (MS-Windows issues, not PVE). So the L0 hypervisor is PVE and the L1 ones are VirtualBox and VMware Player (in different Ubuntu 18.04 VMs). I tried an L1 PVE, copied the VirtualBox VM across to it and ran that VM inside the nested PVE instance, i.e. PVE(L0) -> PVE(L1) -> VirtualBox(L2) -> Test VM (TVM). This crazy arrangement worked, albeit very slowly, and I could ssh into L1, L2 and TVM from anywhere. When I do the equivalent with VMware Fusion, i.e. VMware(L0) -> VirtualBox(L1) -> Test VM (TVM), it also works as expected.

TCP is the thing that is broken: nc -u/nc -ul (UDP) works from a machine that can't do nc <hostname> <port> (TCP). No firewalls are involved, either on the hosts or in the VMs etc.
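
For reference, this is the kind of test I mean (port number arbitrary, <L2 address> being the L2 VM's IP):

Code:
# UDP - this works
nc -ul 9999                          # listener on the L2 VM (openbsd netcat)
echo hello | nc -u <L2 address> 9999 # from an L1 VM on the same L0
# TCP - this hangs
nc -l 9999                           # listener on the L2 VM
nc <L2 address> 9999                 # from an L1 VM on the same L0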

Any ideas?
 
If you run 'tcpdump' on your L1 and L0 and then try to SSH in, do you see a valid TCP packet arrive at each? What about an answer? When you run tcpdump with the '-vv' flag (very verbose), you'll see a "Flags [...]" entry for each packet; check what it says there - e.g. Flags [S] signifies the SYN at connection open. If you see this and nothing else, the packet is probably dropped at the destination (or at L1 if you only see it on L0); if the SYN arrives but an answer never makes it back, it is likely dropped on the return path.
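
Something like this, filtered on the SSH port, makes the flags easy to follow:

Code:
# run on L0, the L1 guest and the L2 guest
tcpdump -i any -vv 'tcp port 22'
# Flags [S]  = SYN (connection attempt)
# Flags [S.] = SYN-ACK (server's reply)
# Flags [R.] = RST (connection actively refused)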

The setup as you describe it should work fine... It's been a while since I personally used VirtualBox, but have you tried the NAT networking mode as well? Oh, and keep in mind that some versions of Ubuntu like to enable the 'ufw' firewall by default, though if you say it works with a different hypervisor that seems unlikely to be it either.
 
Yup, no firewalls in the way. It's weird. Basically, when a VM running on the same L0 PVE server tries to connect in, you can see it sending the initial SYN packet, two TCP retransmits, and then the originating VM trying to find the MAC address for the L2 VM with a couple of ARP requests (which makes sense after two failed retransmits). Nothing comes back. Likewise, on the L2 VM you see the same incoming packets that were sent out by the originating VM, but no replies at all. All nodes had the correct ARP entries in their caches and the same routing tables. I was filtering on the IP for the L2 VM; I'll try widening that filter on the L2 VM and see if there's anything else.
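
For reference, the filters in question were roughly:

Code:
# what I was capturing with (on each node)
tcpdump -i any -vv host <L2 address>
# the wider capture I'll try on the L2 VM (no filter at all)
tcpdump -i any -vv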

Oh, BTW, yes I did try NAT networking for the L2 VM with port forwarding, and that does work.
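
For reference, a VirtualBox NAT port-forwarding rule looks something like this (VM name and port numbers here are just an example):

Code:
# forward host port 2222 to guest port 22 on the VM's NAT adapter
VBoxManage modifyvm "TestVM" --natpf1 "guestssh,tcp,,2222,,22"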
 
Update: I have tested this on a 'bare' KVM/QEMU setup, no Proxmox software involved, and it suffers from exactly the same issue. I shall raise this with the KVM/QEMU team.

Many thanks for your help, and sorry to take up your time. If I or someone else gets to the bottom of this I'll of course post the answer back here for reference.
 
Update: I have tested this on a 'bare' KVM/QEMU setup, no Proxmox software involved, and it suffers from exactly the same issue. I shall raise this with the KVM/QEMU team.

Many thanks for your help, and sorry to take up your time. If I or someone else gets to the bottom of this I'll of course post the answer back here for reference.
Hi, did you ever figure this out? I'm experiencing the same thing between a Win11 VM and guests within a GNS3VM. Ping works, but I'm unable to establish a TCP connection; however, everything works fine from outside Proxmox (e.g. my laptop).

Using tcpdump I can see that the GNS3VM L2 guest sees the SYN and subsequent retransmissions, but the expected SYN-ACK doesn't show up at the other L1 guest.

I have eno1, eno2, eno3, eno4 in bond0 which is the sole member of vmbr0. I haven't gone through permutations of networking yet.
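
In case it's relevant, this is roughly how I'm checking the bond and bridge membership on the PVE host:

Code:
# bond status and active slaves
cat /proc/net/bonding/bond0
# confirm bond0 is the only port on vmbr0
bridge link show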
 
Hi, did you ever figure this out? I'm experiencing the same thing between a Win11 VM and guests within a GNS3VM. Ping works, but I'm unable to establish a TCP connection; however, everything works fine from outside Proxmox (e.g. my laptop).

Using tcpdump I can see that the GNS3VM L2 guest sees the SYN and subsequent retransmissions, but the expected SYN-ACK doesn't show up at the other L1 guest.

I have eno1, eno2, eno3, eno4 in bond0 which is the sole member of vmbr0. I haven't gone through permutations of networking yet.
Sorry I took a very long time to reply; I've been moving house, selling the old family home and re-setting up my home lab, etc...

Anyway, I tried the exact same experiment on PVE version 8.1 and had the exact same issue. However, this also occurred on a plain KVM/QEMU install on regular Ubuntu, so it's probably something to be raised with that team. I haven't raised it with them yet as it wasn't a show stopper for me, more a puzzlement. Sorry this doesn't really help.
 
Hi, did you ever figure this out? I'm experiencing the same thing between a Win11 VM and guests within a GNS3VM. Ping works, but I'm unable to establish a TCP connection; however, everything works fine from outside Proxmox (e.g. my laptop).

Using tcpdump I can see that the GNS3VM L2 guest sees the SYN and subsequent retransmissions, but the expected SYN-ACK doesn't show up at the other L1 guest.

I have eno1, eno2, eno3, eno4 in bond0 which is the sole member of vmbr0. I haven't gone through permutations of networking yet.
I know this was a very long time ago, but did you figure it out? I am having the same issue.
 
I know this was a very long time ago, but did you figure it out? I am having the same issue.
No, not yet. There is that horrible workaround of nested PVE/QEMU instances, see above. Basically it seems to work for normal bridged interfaces but not for internal ones. When I get an answer from the KVM/QEMU team I'll post back here.
 
No, not yet. There is that horrible workaround of nested PVE/QEMU instances, see above. Basically it seems to work for normal bridged interfaces but not for internal ones. When I get an answer from the KVM/QEMU team I'll post back here.
Thanks! I will be looking forward to an answer, if there ever is one. Could you please share the "horrible workaround"? I couldn't find it in the above posts.
 
