[SOLVED] VMs inside an internal network (bridge) lose Internet connection

Flight6334

New Member
Mar 30, 2024
Hi,

I've looked for the cause on many layers, here and there, and I think it may come from Proxmox.
I'm using Alpine VMs to host some Docker services, and some of them lose their connection to the Internet and get it back after a while, for no apparent reason.
The concrete result is a "Bad Gateway" error in my browser, but when I looked at the log of my Plex VM, for example, it says "Couldn't resolve host name (Could not resolve host: plex.tv)". So I go to the VM shell, try `nslookup google.com`, and get "connection refused" and "no servers could be reached". Not all VMs are impacted; only Plex and Nextcloud struggle. For instance, I can access my Firefly III at the same moment Plex is down.

To make my network topology a bit clearer, here it is:
Code:
Internet <> ISP <> PVE (LAN)
                            <> Traefik (a VM on LAN)
                            <> OPNSense (a VM on LAN too) <> Virtual network <> Nextcloud VM / Plex VM / Firefly VM


Capture d’écran 2024-04-10 à 22.40.06.png
Typical example. I didn't change anything between the two commands. I just let some time pass.

What I use:
  • Proxmox, latest version.
  • Alpine OS for VMs, latest version.
  • OPNsense as the router between my physical LAN and my virtual network. The problem likely doesn't come from this part, because I tried pfSense before, reinstalled it 3 times, and got the same results. When I read the logs, connections between my LAN and the virtual network are accepted in both directions.
  • Traefik as reverse proxy.

What I'm sure of:
  • it doesn't come from the firewall, it is disabled for the tests
  • it doesn't come from my NIC, because the bridge is not connected to any physical NIC.
  • it doesn't come from my DNS, because while some VMs have "connection refused", others have their responses.
  • it isn't an IP address problem, none of my VMs share the same IP address.
  • it doesn't come from Traefik (at least not from its core functionality), because it works for my VMs on my LAN and for some VMs on the virtual network.

Thanks for your help
 
Hi,

  • it doesn't come from my DNS, because while some VMs have "connection refused", others have their responses.
This looks like a DNS resolution issue. I would check the DNS config and the `/etc/resolv.conf` in the affected VM, plus the syslog inside the affected VM, as well as the network traffic. This information should give you/us the root cause of the issue.
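As a rough sketch of those checks, the helpers below could be run inside an affected VM. The function names are made up for illustration, and the sketch assumes `awk`, `nslookup` and `ip` are available (BusyBox provides them on a stock Alpine install). It only defines functions; nothing is queried until you call them.

```shell
#!/bin/sh
# Hypothetical helpers for checking DNS from inside an affected VM.

# Print the nameservers listed in a resolv.conf-style file
# (defaults to /etc/resolv.conf when no path is given).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query each configured nameserver directly, so a single
# unreachable resolver stands out from the others.
check_nameservers() {
    for ns in $(list_nameservers "$1"); do
        if nslookup google.com "$ns" >/dev/null 2>&1; then
            echo "$ns: ok"
        else
            echo "$ns: FAILED"
        fi
    done
}

# Show the default route; the gateway listed here is where
# DNS queries to off-subnet resolvers will be sent.
show_default_route() {
    ip route show default
}
```

Calling `check_nameservers` and `show_default_route` once while the VM works and once while it is failing should show whether the resolver itself or the route towards it is what changes.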
 
I kind of resolved the DNS problem by just restarting my OPNsense router, but the underlying problem persists.
I still can't get a reliably functioning home lab with Proxmox.

The connection goes down randomly, whatever the service is. Any service can go down, come back, and go down again. I'm running out of ideas, even though my organization and topology are extremely simple.
 
I was able to solve it by enabling IP forwarding on Traefik and by setting my machines' default gateway to the OPNsense IP instead of my ISP's router.
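For anyone landing here later, the fix can be sketched roughly as follows. This is only an illustration under assumptions: the function names and the `DRY_RUN` switch are hypothetical, `192.168.1.2` stands in for the OPNsense LAN address, and the exact commands used in this setup were not posted. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helpers; set DRY_RUN=0 to really apply the changes (needs root).

# Either print the command (dry run, the default) or execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Let the kernel forward IPv4 packets between interfaces
# (run on the Traefik VM).
enable_ip_forward() {
    run sysctl -w net.ipv4.ip_forward=1
}

# Point the default route at the given gateway, e.g. the OPNsense IP
# instead of the ISP router.
set_default_gateway() {
    run ip route replace default via "$1"
}
```

For example, `set_default_gateway 192.168.1.2` prints `ip route replace default via 192.168.1.2` in dry-run mode. Both changes are lost at reboot; to keep them, `net.ipv4.ip_forward = 1` would typically go into `/etc/sysctl.conf` and the gateway into the VM's `/etc/network/interfaces`.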
 