Random Proxmox Connectivity Issues (GUI+SSH) + Guest Downtime After Fix

hyperXL99

Mar 12, 2022
Hi everyone,

I’m facing a strange and frustrating issue with my Proxmox setup and could really use some help. Here’s the situation:

• Randomly, Proxmox becomes unreachable via web GUI or SSH, but all the guests are still running and reachable without any issues.
• When this happens, my only “solution” is to blindly type systemctl restart networking (I plug in a USB keyboard since I don’t have a monitor connected). After that, I regain access to the Proxmox web UI and SSH.
• However, all guests are offline after restarting networking, so I have to manually restart them one by one. Once that’s done, everything works fine again—until it happens randomly later.
• If I reboot the whole system instead, the guests don’t come back online after the reboot either.

Clearly, just cronjobbing a networking restart isn’t a viable solution because of the guest downtime.

My Setup:

• Proxmox Version: 8.3.0, non-enterprise repo
• AMD Ryzen 7 5700G
• 64GB RAM
• ASUS ROG Strix B550-F Gaming Mainboard
• NIC: Motherboard’s built-in LAN
• Storage:
  • Proxmox installed on a 1TB SSD
  • Two 8TB HDDs in RAID1
  • Additional 1TB HDD
• PSU: Seasonic Prime TX-650
• Almost-new hardware (<4 months old).

I’ve searched around and tried to troubleshoot, but I’m at a loss as to why this is happening. Could it be a driver issue with the NIC? A misconfiguration in Proxmox networking? Something with power management on the NIC or the motherboard?

Any ideas?

I’d appreciate any help, pointers, or ideas to narrow this down! Let me know if you need any logs or additional details—I’ll do my best to gather them.

Thanks in advance!
 
Here is my network config (/etc/network/interfaces):

Code:
auto lo
iface lo inet loopback

iface enp5s0 inet manual

#iface enp0s25 inet manual

#iface enp8s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.178.21/24
        gateway 192.168.178.1
        bridge-ports enp5s0
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet static
        address 10.27.0.0/16
        bridge-ports none
        bridge-stp off
        bridge-fd 0
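A side note on this config, probably unrelated to the GUI/SSH drops (vmbr1 has `bridge-ports none`, so it never reaches the physical LAN): vmbr1's address 10.27.0.0/16 is the network address of the 10.27.0.0/16 subnet rather than an ordinary host address. Linux will generally accept it, but it is unusual and some guests or tools may treat it as the subnet identifier; something like 10.27.0.1/16 would be more conventional. The Python sketch below flags such entries. The function name `address_warnings` is hypothetical, and the parser is deliberately minimal, assuming only `iface` and `address` lines matter (no `source` includes, IPv4 only).

```python
import ipaddress

# Excerpt of the /etc/network/interfaces posted above (only the parts this check reads).
CONFIG = """\
auto vmbr0
iface vmbr0 inet static
        address 192.168.178.21/24
        gateway 192.168.178.1

auto vmbr1
iface vmbr1 inet static
        address 10.27.0.0/16
"""


def address_warnings(config: str) -> list[str]:
    """Flag static 'address' entries that use a subnet's network or broadcast address."""
    warnings = []
    current = None  # name of the iface stanza we are currently inside
    for raw in config.splitlines():
        words = raw.split()
        if len(words) >= 2 and words[0] == "iface":
            current = words[1]
        elif len(words) == 2 and words[0] == "address" and current:
            iface = ipaddress.ip_interface(words[1])
            net = iface.network
            # IPv4 /31 and /32 have no separate network/broadcast address, so skip them
            if net.version == 4 and net.prefixlen < 31:
                if iface.ip == net.network_address:
                    warnings.append(f"{current}: {iface.ip} is the network address of {net}")
                elif iface.ip == net.broadcast_address:
                    warnings.append(f"{current}: {iface.ip} is the broadcast address of {net}")
    return warnings


for w in address_warnings(CONFIG):
    print(w)
```

Against the excerpt above this reports only vmbr1; vmbr0's 192.168.178.21/24 is a normal host address.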
 
Hello, I'm having the same issue. The GUI and SSH become inaccessible after approximately 20-22 days of uptime. The guests keep working fine, though. The last time this happened, I checked the syslog after rebooting and found no pointers to network issues. Any thoughts, or anything I can provide to help troubleshoot this? It's becoming a bit annoying to have to hard-reset the entire system just to be able to access the GUI.
 
Hello everyone, I have been having the same issue since the upgrade to 8.3. The web GUI works intermittently and keeps flipping back and forth. When it is accessible, I can sometimes use it for a minute or more and other times for only a few seconds. It's very inconsistent. SSH access follows the same pattern. Note that I do not see a network disconnect: my continuous pings to the PVE host never drop during the periods when the GUI and SSH are unavailable.

I tried reverting to an older kernel, and while that seemed to work at first, the improvement turned out to be only apparent. The problem is still there, just very inconsistent. It almost feels as if whichever service runs the web GUI crashes, but that would not explain why it comes back on its own without me doing anything.

Any new clues? I will try to collect logs if the host stays reachable long enough.

EDIT: I can also confirm that the VMs and LXCs keep running and remain reachable (via RDP etc.) while the web GUI is not working.
 
I have had this problem too since updating. And yes, the guests keep working without any problems.

My "fix" at the moment is to blindly log in on an attached keyboard and type "systemctl restart networking" whenever I want to use the UI or SSH. After that it works again for a few minutes (though the guests need to be restarted afterwards as well).

Since restarting networking always fixes the problem, it seems unlikely that a PVE service is crashing, but I am also clueless as to what it could be.
 
That's the thing: it's not a complete network outage, because the clients still work and ping tests show no sign of network problems. Interestingly, your network config in /etc/network/interfaces is very similar to mine. I also have two bridges, vmbr0 and vmbr1, with very similar subnets (vmbr0: 192.168.1.xxx, vmbr1: 10.0.0.xxx). I am out of town at the moment, but when I can, I'll post my config here.

I use vmbr1 for internal traffic only, and vmbr0 is bridged to the home network (192.168.1.xxx). The VMs are attached to vmbr1 only, and a virtual pfSense firewall with two interfaces does all the routing.

I wonder whether removing the vmbr1 interface makes any difference.

"systemctl restart networking" is not a fix, not even a workaround, as it is only a matter of time before it goes down again.
 
Lol, we do seem to have a very similar setup. I also have a pfSense VM with 2 interfaces: vmbr1 as LAN and vmbr0 as WAN.

I never actually tried a ping (I assumed all connectivity to the PVE host was down). Will report results when I am back.
 
Deleting vmbr1 and running only the host server, with nothing else powered on, did not make a difference. I have attached a copy of the system log, but I may have a better one once I can pin down the time it happens.
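Since ICMP pings kept answering in this thread while SSH and the GUI did not, probing the actual TCP services is a more telling way to pin down when the problem starts. The Python sketch below, run from another machine on the LAN, prints a timestamped line whenever port 22 (SSH) or 8006 (the Proxmox web GUI's default port) changes between reachable and unreachable, so the times can be matched against the host's syslog and the router's logs. It uses TCP connects because ICMP from Python needs raw sockets and elevated privileges. The names `port_open` and `watch` and the 10-second interval are illustrative assumptions.

```python
import socket
import time
from datetime import datetime


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


def watch(host: str, ports: tuple[int, ...] = (22, 8006), interval: float = 10.0) -> None:
    """Poll forever; print a timestamped line each time a port flips between UP and DOWN."""
    last: dict[int, bool] = {}
    while True:
        for port in ports:
            state = port_open(host, port)
            if last.get(port) != state:
                stamp = datetime.now().isoformat(timespec="seconds")
                print(f"{stamp} {host}:{port} {'UP' if state else 'DOWN'}", flush=True)
                last[port] = state
        time.sleep(interval)
```

For example, `watch("192.168.178.21")` would monitor the host address from the config posted at the top of the thread. Port 8006 is the more telling of the two: if another device on the network is answering for the same IP, it may well accept SSH on port 22 but is unlikely to serve 8006.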
 


OK, I think I figured this one out. The server I was using had iDRAC enabled, and it was configured on the same NIC the OS was using for its interface (there was no dedicated management port). I noticed on my router that the IP address that was supposed to be registered to Proxmox was instead registered to the iDRAC. I then realized I had turned on the lights-out management but assigned it the same NIC as the OS.

In short, the IP address kept flipping between Proxmox and the iDRAC, so SSH and the web GUI dropped out, but ping never did: whichever of the two devices held the IP at that moment answered it.

Check whether your physical interface is competing with an on-board management controller or another device for the same IP address.
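Besides looking at the router's client list, a conflict like this can be spotted from any Linux machine on the same LAN by watching its neighbour (ARP) table for an IP that shows up with more than one MAC address. The Python sketch below assumes Linux with iproute2 (it parses `ip neigh show` output); the function names are hypothetical. Two caveats: a neighbour entry is only refreshed while the watching machine actually talks to that IP, so keep a continuous ping to the Proxmox host running alongside it, and a legitimate NIC replacement would also be flagged.

```python
import subprocess
import time


def parse_neigh(output: str) -> dict[str, str]:
    """Map IP -> MAC from `ip neigh show` output; entries without a link-layer address are skipped."""
    table = {}
    for line in output.splitlines():
        words = line.split()
        if "lladdr" in words:
            i = words.index("lladdr")
            if i + 1 < len(words):
                table[words[0]] = words[i + 1].lower()
    return table


def new_conflicts(seen: dict[str, set[str]], table: dict[str, str]) -> list[str]:
    """Record IP -> MAC observations in `seen`; return IPs that just appeared with an additional MAC."""
    flagged = []
    for ip, mac in table.items():
        macs = seen.setdefault(ip, set())
        if macs and mac not in macs:
            flagged.append(ip)
        macs.add(mac)
    return flagged


def monitor(interval: float = 5.0) -> None:
    """Poll the local neighbour table forever and print suspected IP conflicts."""
    seen: dict[str, set[str]] = {}
    while True:
        out = subprocess.run(["ip", "neigh", "show"], capture_output=True, text=True, check=True).stdout
        for ip in new_conflicts(seen, parse_neigh(out)):
            print(f"possible IP conflict: {ip} seen with MACs {sorted(seen[ip])}", flush=True)
        time.sleep(interval)
```

If the Proxmox host's IP is reported with two different MACs, compare them against the host NIC and the management controller (or any other suspect device) to see who else is claiming the address.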
 
