Proxmox web interface offline repeatedly

Oct 25, 2020
For a few days now I have been experiencing strange outages of the Proxmox web interface. They occur every few minutes and last only a short time, then the web interface comes back online without any manual action.
The strange thing I noticed is that during an outage the Proxmox server is pingable, but as soon as the web interface is back online the ICMP packets are rejected.
I'm running the latest Proxmox kernel and updates:

Kernel Version Linux 5.4.78-1-pve #1 SMP PVE 5.4.78-1 (Mon, 30 Nov 2020 10:57:47 +0100)
PVE Manager Version pve-manager/6.3-2/22f57405

All containers and VMs remain responsive during these outages, so it looks more like a service issue than a network issue.
Could it be a firewall issue, and if so, how can I fix it?
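
One way to narrow this down is to log exactly when the web interface port stops answering, so the times can be matched against the journal. Below is a minimal Python sketch, assuming the web interface listens on the default TCP port 8006. The function names, the 5-second interval and the address in the usage note are illustrative and not part of Proxmox.

```python
import socket
import time
from datetime import datetime


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def watch(host: str, port: int = 8006, interval: float = 5.0, rounds: int = 0) -> None:
    """Print a timestamped line each time reachability changes.

    rounds=0 means: keep probing until interrupted (Ctrl+C).
    """
    last = None
    done = 0
    while rounds == 0 or done < rounds:
        state = port_open(host, port)
        if state != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} {'up' if state else 'DOWN'}")
            last = state
        done += 1
        if rounds == 0 or done < rounds:
            time.sleep(interval)
```

Running `watch("192.0.2.10")` from a client machine (the address is a placeholder) prints one line per state change. Running it from clients in different networks at the same time would show whether the outage depends on where the request comes from.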
 
hi,

is the issue still going on?

have you checked journal and syslog to see if anything errors during the outage?
 
I still face this issue, but a lot less often, probably because of these NIC offload settings:
rx-checksumming: off
tx-checksumming: off
generic-segmentation-offload: off
generic-receive-offload: off
tcp-segmentation-offload: off

Still today I had an outage and the only message I found in the syslog was the following:
Code:
Dec 14 17:07:00 pve systemd[1]: Started Proxmox VE replication runner.
Dec 14 17:07:21 pve pvestatd[2560]: pbs: error fetching datastores - 500 500 Can't connect to pbs.domain.local:8007 (Temporary failure in name resolution)
Dec 14 17:07:21 pve pvestatd[2560]: status update time (20.300 seconds)
Dec 14 17:07:31 pve pvestatd[2560]: pbs: error fetching datastores - 500 500 Can't connect to pbs.domain.local:8007 (Temporary failure in name resolution)
Dec 14 17:07:31 pve pvestatd[2560]: status update time (10.315 seconds)
Dec 14 17:08:00 pve systemd[1]: Starting Proxmox VE replication runner...
Dec 14 17:08:00 pve systemd[1]: pvesr.service: Succeeded.

(I changed the domain name)

The Pi-hole server responsible for name resolution is one of the containers running on the host.
During the outages of the web interface all other containers and VMs behave completely normally: no network issues, or at least none that I notice.
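
The pvestatd lines in the log excerpt above report how long each status update took, which gives a rough timeline of when the host was stalled. Below is a small Python sketch that pulls those durations out of syslog text. It assumes the line format shown in the excerpt; the function name and the 5-second threshold are arbitrary choices for illustration.

```python
import re

# Matches lines such as:
# Dec 14 17:07:21 pve pvestatd[2560]: status update time (20.300 seconds)
STATUS_LINE = re.compile(
    r"^(\w{3}\s+\d+\s[\d:]{8})\s.*pvestatd\[\d+\]: "
    r"status update time \(([\d.]+) seconds\)"
)


def slow_updates(lines, threshold=5.0):
    """Return (timestamp, seconds) for each status update at or above threshold."""
    hits = []
    for line in lines:
        match = STATUS_LINE.search(line)
        if match and float(match.group(2)) >= threshold:
            hits.append((match.group(1), float(match.group(2))))
    return hits
```

Fed with the lines of /var/log/syslog, for example `slow_updates(open("/var/log/syslog"))`, it lists the moments at which a status update was slow, which can then be compared with the times the web interface was unreachable.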
 
I found the source of my web interface problems: my firewall settings were wrong. The interesting thing is that it still worked most of the time.
So is the firewall in some sort of on/off switching state?
 
> So is the firewall in some sort of on/off switching state?
not normally, no.

> my firewall settings were wrong
what was wrong?

> Still today I had an outage and the only message I found in the syslog was the following:
the pbs error is a name resolution issue, like it says there. check your /etc/pve/storage.cfg, /etc/hosts, /etc/resolv.conf and so on. the hostname should resolve correctly (but i doubt this is the cause of the web interface problems)
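
a quick way to test resolution from the host itself is a few lines of Python. this is only a sketch: the function name is made up, and port 8007 is taken from the log line above.

```python
import socket


def resolve(name, port=8007):
    """Return the sorted list of addresses that name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Same class of failure as "Temporary failure in name resolution".
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

calling `resolve("pbs.domain.local")` with the real hostname should return at least one address. an empty list means the resolver configured on the host could not answer at that moment.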
 
I enabled the host firewall for all networks and allowed TCP port 8006 for one of my VLANs, but not for the main network.
[Image: 1608749386237.png]
The rule marked in red was missing, yet I was still able to access the web UI most of the time, which makes me think that the firewall may be in some kind of flapping state.
The name resolution issue seems to be gone too.
 
