Hello,
We have a problem in our 3-node production cluster, which runs Proxmox 8.2 with the latest kernel (no-subscription). The cluster has 768 GB RAM, 300 cores and 24 TB of storage. It is not overcommitted and has enough resources. The web UIs on port 8006 are available for a while, then the nodes fail one by one and become unreachable. I believe this behavior was introduced by a kernel update in April 2025.
The cluster's certificates are issued by Let's Encrypt and are renewed regularly. After restarting a node, the web UI on that node is reachable again for an unpredictable amount of time, then it goes offline again. When the web UI on a node goes down, SSH on that node is unreachable as well, so we have to restart it via IPMI. All VMs keep running without any issues, however. No jobs are executed on a node while its web UI is down.
Any suggestions on where to start investigating, or is this a known bug?