PVE cluster nodes frequently go offline

liyk

New Member
Aug 28, 2024
The cluster has only 2 nodes, and recently one of the nodes has frequently been going offline. After restarting the pve-cluster and corosync services it recovers briefly, but after a while it goes offline again.
1724814446674.jpeg

1724814471379.jpeg
 
It would be helpful to have more information about what happens during those downtimes. As far as I can see from the screenshots, the first one is from the second node, P-proxmox2, and the second one is from the first node, P-proxmox1. Can you ping the second node from the first node, and vice versa, without any packet loss? Could you post the output of journalctl -u pve-cluster -u pvestatd from when that happens?
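If it helps to put a number on the ping test, here is a minimal Python sketch that runs ping and extracts the packet-loss percentage. It assumes the Linux iputils ping summary format ("... X% packet loss ..."); the function names parse_packet_loss and measure_loss are made up for this illustration and are not part of any Proxmox tooling.

```python
import re
import subprocess


def parse_packet_loss(ping_output: str) -> float:
    """Return the packet-loss percentage from iputils-style ping output."""
    match = re.search(r"([\d.]+)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def measure_loss(host: str, count: int = 20) -> float:
    """Ping `host` `count` times and return the loss percentage.

    Assumes a `ping` binary that accepts `-c <count>` (as on Debian/Proxmox).
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero on loss; we still want the summary
    )
    return parse_packet_loss(result.stdout)
```

Calling measure_loss("P-proxmox2") on the first node, and measure_loss("P-proxmox1") on the second, should ideally return 0.0 in both directions. Any loss on the network that carries corosync traffic is worth investigating.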

FYI, it is not a good idea to run a cluster with only two nodes, as each node loses quorum as soon as it can no longer reach the other one. You should set up a QDevice if you're not planning to expand your cluster with a third node anytime soon. See here [1] for more information.

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
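To illustrate why two nodes are fragile, here is a small Python sketch of the majority rule behind quorum. It assumes one vote per node and one extra vote for the QDevice, which is the usual setup; the function names are hypothetical and this is only the arithmetic, not how corosync is implemented.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for a strict majority of `total_votes`."""
    return total_votes // 2 + 1


def is_quorate(votes_present: int, total_votes: int) -> bool:
    """True if the votes still reachable form a majority."""
    return votes_present >= quorum_threshold(total_votes)


# Two nodes, one vote each: losing either node leaves 1 of 2 votes.
two_node_survives = is_quorate(votes_present=1, total_votes=2)

# Two nodes plus a QDevice vote: losing one node leaves 2 of 3 votes.
with_qdevice_survives = is_quorate(votes_present=2, total_votes=3)
```

Under these assumptions two_node_survives is False and with_qdevice_survives is True, which is why the extra vote lets the remaining node keep working when its peer disappears.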
 
IMO, some things to look at regarding your cluster communications:
- How fast are your physical network interfaces (100 Mbit, 1 Gbit, 10 Gbit, or something faster)? Your interfaces might be busy moving I/O traffic to and from your VMs, leaving no spare network capacity for the cluster nodes to communicate with each other.
- Are you performing backups when the node drops?
- Do you have interface errors and/or packet drops on your physical and virtual Ethernet interfaces (and on your external switches)?
- What is your current I/O bandwidth rate when a node drops out of the cluster?
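The error, drop, and bandwidth questions above can be checked roughly from the Linux counters in /proc/net/dev. The following Python sketch parses that file's text and computes a throughput figure from two byte-counter samples. The function names are invented for this example, and it assumes the standard 16-column layout of /proc/net/dev.

```python
def parse_proc_net_dev(text: str) -> dict:
    """Parse the contents of /proc/net/dev into per-interface counters."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(value) for value in data.split()]
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        counters[name.strip()] = {
            "rx_bytes": fields[0],
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "tx_bytes": fields[8],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
        }
    return counters


def rate_bits_per_second(bytes_before: int, bytes_after: int, seconds: float) -> float:
    """Average throughput between two byte-counter samples."""
    return (bytes_after - bytes_before) * 8 / seconds
```

To use it, read /proc/net/dev twice a few seconds apart, pass each snapshot to parse_proc_net_dev, and compare the samples: error or drop counters that keep growing, or a byte rate close to the link speed while a node drops out, would both point at the network. Switch-side counters have to be checked on the switch itself.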

I have a Proxmox network with 14 clusters and 6 external NFS systems for my VM hard disk storage. I have 10 and 40 Gbit network cards. My cluster IPs and my NFS IPs do not share any IP address space with my VMs: my VMs, my cluster IPs, and my NFS IPs are each on their own IP network. I do this to keep unneeded network chatter to a minimum. I have never had a node drop out of the cluster, even when pushing 20+ Gbit/s on multiple nodes in the cluster while all nodes are running a backup at the same time.

North Idaho Tom Jones
 
