PVE cluster nodes frequently go offline

liyk (New Member) · Aug 28, 2024

The cluster only has 2 nodes, and recently one of the nodes has frequently been going offline. After I restart the pve-cluster and corosync services it recovers briefly, but after a while it goes offline again.

dakralex (Proxmox Staff Member) · Sep 4, 2024

It would be helpful to have more information about what happens during those downtimes. As far as I can tell from the screenshots, the first one is from the second node P-proxmox2 and the second one is from the first node P-proxmox1. Can you ping the second node from the first node, and vice versa, without any packet loss? Could you post the output of journalctl -u pve-cluster -u pvestatd from when that happens?
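A minimal Python sketch of the two-way ping check, assuming Linux iputils `ping` (the default on a Debian-based Proxmox host) and that the node names P-proxmox1 and P-proxmox2 resolve; the helper names are hypothetical. It computes the loss from the sent/received counts in ping's closing summary line.

```python
import re
import subprocess

# Matches ping's closing summary, e.g. from iputils:
#   "100 packets transmitted, 98 received, 2% packet loss, time 99123ms"
# The optional "packets " also accepts BusyBox-style output.
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss_percent(ping_output: str) -> float:
    """Compute packet loss (0-100) from the sent/received counts in ping's summary."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found in output")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("ping reports zero packets transmitted")
    return 100.0 * (sent - received) / sent


def ping_peer(host: str, count: int = 100) -> float:
    """Ping a cluster peer `count` times (about one per second) and return the loss."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero if no replies arrive; still parse the summary
    )
    return packet_loss_percent(result.stdout)
```

Run `ping_peer("P-proxmox2")` on P-proxmox1 and `ping_peer("P-proxmox1")` on P-proxmox2. Any loss on the cluster link is worth lining up against the `journalctl -u pve-cluster -u pvestatd` output from the same time window.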

FYI, it is not a good idea to have a cluster with only two nodes, as it can lose quorum very easily: losing the other node is enough. You should set up a QDevice if you're not planning to expand your cluster with a third node anytime soon. See [1] for more information.
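The arithmetic behind the quorum warning, as a small Python sketch. It assumes corosync's default majority rule (a partition is quorate with more than half of the expected votes), one vote per node and one for the QDevice, and it ignores special votequorum options such as `two_node` or `wait_for_all`; the function names are illustrative.

```python
def quorum(expected_votes: int) -> int:
    """Votes a partition needs to be quorate: a strict majority of expected votes."""
    return expected_votes // 2 + 1


def survives_one_node_failure(nodes: int, qdevice_votes: int = 0) -> bool:
    """Is the cluster still quorate after one single-vote node goes down?

    Assumes the surviving side can still reach the QDevice, if there is one.
    """
    expected = nodes + qdevice_votes
    remaining = expected - 1  # the failed node's single vote is gone
    return remaining >= quorum(expected)


for nodes, qdev in [(2, 0), (2, 1), (3, 0)]:
    print(f"{nodes} nodes, {qdev} QDevice vote(s): quorum={quorum(nodes + qdev)}, "
          f"survives one node failure: {survives_one_node_failure(nodes, qdev)}")
```

With two nodes the quorum is 2 votes, so losing either node leaves 1 vote and the remaining node is no longer quorate. A QDevice raises the expected votes to 3 while the quorum stays at 2, so one node plus the QDevice can carry on.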

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support

NorthIdahoTomJones (New Member) · Sep 4, 2024

IMO, some things to look at regarding your cluster communication:
- How fast are your physical network interfaces (100 Meg, 1 Gig, 10 Gig, something faster)?
  Your interfaces might be busy moving I/O traffic to/from your VMs, leaving no spare network capacity for the cluster nodes to communicate with each other.
- Are you performing backups when the cluster drops?
- Do you have interface errors and/or packet drops on your physical and virtual Ethernet interfaces (and on your external switches)?
- What is your current I/O bandwidth rate when a node drops out of your cluster?
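The error/drop and bandwidth questions above can be answered from the per-interface counters the Linux kernel exposes under `/sys/class/net/<iface>/statistics/`. A minimal Python sketch; the function names are hypothetical and the interface name (`vmbr0`, `eno1`, ...) depends on your setup. Switch-side counters still have to be read on the switch itself.

```python
import time
from pathlib import Path

# The Linux kernel exposes per-interface counters here, e.g.
# /sys/class/net/eno1/statistics/rx_errors
SYSFS_NET = Path("/sys/class/net")
ERROR_COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def read_counter(iface: str, name: str, root: Path = SYSFS_NET) -> int:
    """Read one statistics counter (e.g. rx_bytes) for an interface."""
    return int((root / iface / "statistics" / name).read_text())


def snapshot(iface: str, root: Path = SYSFS_NET) -> dict[str, int]:
    """Capture the error/drop counters plus the byte counters at one point in time."""
    names = ERROR_COUNTERS + ("rx_bytes", "tx_bytes")
    return {name: read_counter(iface, name, root) for name in names}


def grown_error_counters(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Error/drop counters that increased between two snapshots, with the increase."""
    return {
        name: after[name] - before[name]
        for name in ERROR_COUNTERS
        if after[name] > before[name]
    }


def mbit_per_s(byte_delta: int, seconds: float) -> float:
    """Convert a byte-counter difference over an interval to Mbit/s."""
    return byte_delta * 8 / seconds / 1_000_000


def watch(iface: str, interval: float = 5.0) -> None:
    """Print throughput and any new errors/drops roughly every `interval` seconds."""
    before = snapshot(iface)
    while True:
        time.sleep(interval)
        after = snapshot(iface)
        rx = mbit_per_s(after["rx_bytes"] - before["rx_bytes"], interval)
        tx = mbit_per_s(after["tx_bytes"] - before["tx_bytes"], interval)
        new_errors = grown_error_counters(before, after) or "none"
        print(f"{iface}: rx {rx:.1f} Mbit/s, tx {tx:.1f} Mbit/s, new errors/drops: {new_errors}")
        before = after
```

Leaving `watch(...)` running on the cluster interface during a backup window shows whether the link is close to saturation, or accumulating errors, at the moment a node drops.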

I have a Proxmox network with 14 clusters and 6 external NFS systems for my VM hard disk storage. I have 10 and 40 Gig network cards. My cluster IPs and my NFS IPs do not share any IP address space with my VMs: my VMs, my cluster IPs and my NFS IPs are each on their own IP network. I do this to keep unwanted/unneeded network chatter to a minimum. I have never had a node drop out of a cluster, even when pushing 20+ Gig on multiple nodes in the cluster at the same time while all nodes are doing a backup.
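Keeping VM, cluster and storage traffic on separate IP networks can be sanity-checked with Python's standard `ipaddress` module. A minimal sketch; the subnet plan below is a hypothetical example, not the addressing described in the post.

```python
from ipaddress import ip_network
from itertools import combinations

# Hypothetical address plan: one dedicated network per traffic class.
# Replace these with your own subnets.
NETWORKS = {
    "cluster": "10.10.10.0/24",
    "nfs-storage": "10.20.20.0/24",
    "vm-traffic": "192.168.1.0/24",
}


def overlapping_pairs(networks: dict[str, str]) -> list[tuple[str, str]]:
    """Return the names of every pair of networks whose address ranges overlap."""
    parsed = {name: ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(parsed, 2)
        if parsed[a].overlaps(parsed[b])
    ]
```

An empty list from `overlapping_pairs(NETWORKS)` means no two traffic classes share address space.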

North Idaho Tom Jones
