Cluster not ready

fasi74

Jun 24, 2021
We recently had a power surge in the data center where our PVE servers run as a cluster. All the servers have started up again except one, which has boot problems. The issue is that when we tried to start VMs and containers on the servers that had already booted, we got the error "cluster not ready", and they did not start until we fixed the server that had failed to boot.

Why did this happen? What should we do if this happens again in the future, i.e. how can we prepare for it?

Kind regards
Faisal Gillani
 
Hello,

Most likely, your network is congested or too slow to deliver Corosync messages to the other nodes in time. You should see the resulting retransmits in the syslog. To avoid such issues in a cluster, we recommend using a separate network for Corosync, or adding a second ring_X link to the Corosync configuration [0].

[0] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_adding_redundant_links_to_an_existing_cluster
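A minimal Python sketch for spotting those retransmits in a log, assuming lines in the classic syslog / journalctl short format such as "Jun 24 10:15:01 pve1 corosync[1234]: [TOTEM ] Retransmit List: 1a 1b" (the host name and values here are made up). The exact message wording can differ between Corosync versions, so treat the pattern as an assumption and adjust it to what your nodes actually log. The function name count_retransmits is hypothetical.

```python
import re
from collections import Counter

# Assumption: Corosync reports resent messages with a line containing
# "Retransmit List"; check your own logs and adjust the pattern if needed.
RETRANSMIT = re.compile(r"corosync\[\d+\]:.*Retransmit List")


def count_retransmits(lines):
    """Count Corosync retransmit lines per hour.

    The hour key is the first 9 characters of a classic syslog timestamp,
    e.g. "Jun 24 10". Counter keeps insertion order, so the result follows
    the order of the log.
    """
    per_hour = Counter()
    for line in lines:
        if RETRANSMIT.search(line):
            per_hour[line[:9]] += 1
    return per_hour
```

For example, pass it a file opened from /var/log/syslog, or save the output of journalctl -u corosync to a file first. A burst of retransmits in an hour that lines up with heavy backup or storage traffic points to congestion on the link Corosync shares.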
 
If the cluster runs Ceph, then yes, that is slow compared to modern storage networks, which often run at 10 Gbps or faster. Even if the cluster doesn't use Ceph storage, the problem might be caused by other network storage (NFS, CIFS, etc.).
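To put the link speeds in perspective, here is a rough Python calculation of how long a bulk transfer keeps a link busy at 1 Gbps versus 10 Gbps. The 32 GB size (think of a VM disk or a backup) is an arbitrary example, not a figure from this thread, and the numbers ignore protocol overhead unless you pass an assumed efficiency factor. While such a transfer saturates a shared link, Corosync packets have to compete with it.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time in seconds to move size_gb gigabytes over a link_gbps link.

    efficiency < 1.0 is an assumed factor for protocol overhead, not a measured value.
    """
    gigabits = size_gb * 8  # gigabytes -> gigabits
    return gigabits / (link_gbps * efficiency)


for gbps in (1, 10):
    # 1 Gbps -> 256.0 s, 10 Gbps -> 25.6 s for the 32 GB example
    print(f"{gbps:>2} Gbps: {transfer_seconds(32, gbps):.1f} s to move 32 GB")
```

On a 1 Gbps link shared with storage traffic, the link can stay saturated for minutes at a time, which is why a dedicated Corosync network (or a second link) helps.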
 
