Latest Proxmox 8.x: WebUI Port 8006 not reachable in a 3 node cluster

dan.ger

Hello,

we have a problem in our 3-node production cluster, which is based on Proxmox 8.2 with the latest kernel (no-subscription). The cluster has 768 GB RAM, 300 cores, and 24 TB of storage. It is not overcommitted and has enough resources. The web UIs on port 8006 are reachable for a while, then they fail one by one and become unreachable. I believe this behavior arrived with the kernel update in April 2025.

The cluster's certificates are issued by Let's Encrypt and are renewed continuously. After restarting a node, the web UI on that node is reachable again for a non-deterministic time, then it goes offline again. When the web UI is down, the SSH port is not reachable either, so we have to restart via IPMI. All VMs keep running without any issues, but no jobs are executed on a node while its web UI is down.

So any suggestions where to start investigating or is this a known bug?
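For anyone who wants to narrow down when a node drops off, here is a minimal sketch of a TCP probe for ports 8006 and 22. It only shows whether the ports accept connections, not why they stop. The node names in the usage note are hypothetical placeholders, and the timeout is an arbitrary choice.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def probe(nodes, ports=(8006, 22)):
    """Check every node/port pair and return {(node, port): reachable}."""
    return {(node, port): port_open(node, port) for node in nodes for port in ports}
```

Calling probe(["pve1", "pve2", "pve3"]) from a machine outside the cluster (for example from cron, with the result appended to a log) gives a rough timeline of which node lost which port first.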
 
It seems to be the fail2ban service, because the system has been migrated from Proxmox 6 over the years. The new backend is systemd, so there are no more log files!
 
What is shown on the IPMI display output when the machine freezes, if anything?
IPMI is responsive and I can log in... so service ssh status led me to fail2ban, and there is a wrong backend configured!
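To check which backend a jail ends up with, something like the following sketch may help. It merges fail2ban-style INI texts in load order and reports the backend value for one jail. It assumes the sshd jail is the relevant one and that Python's configparser is close enough to fail2ban's own config reader for this purpose. It does not follow [INCLUDES] or jail.d/ on its own, so pass those file contents in yourself.

```python
import configparser


def effective_backend(config_texts, jail="sshd"):
    """Return the backend value selected for `jail`, or None if unset.

    `config_texts` is a list of INI strings in load order (later wins),
    e.g. the contents of jail.conf followed by jail.local.
    """
    # interpolation=None keeps fail2ban's %(...)s placeholders as raw text.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    for text in config_texts:
        parser.read_string(text)
    if parser.has_section(jail):
        # Falls back to [DEFAULT] when the jail sets no backend itself.
        return parser.get(jail, "backend", fallback=None)
    return parser.defaults().get("backend")
```

Feeding it the contents of /etc/fail2ban/jail.conf followed by /etc/fail2ban/jail.local should show whether the jail still points at a file-based backend on a host where sshd now logs only to the systemd journal. In that case, setting backend = systemd for the jail in jail.local is the usual fix, but verify it against your own setup.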
 
It seems it is kernel 6.12; with the new kernel 6.14 the problems no longer occur.