Intermittent cluster node failure

alitvak69

Oct 2, 2015
Hello everyone,

On my production pve-3.4-11 cluster (qdisk + 2 nodes), one node is occasionally evicted in the middle of the night.

The only clues in the other node's logs are:

corosync.log
Code:
Dec 19 01:02:28 corosync [TOTEM ] A processor failed, forming new configuration.
Dec 19 01:02:30 corosync [CLM  ] CLM CONFIGURATION CHANGE

qdiskd.log
Code:
Dec 19 01:01:19 qdiskd Node 1 missed an update (2/10)
Dec 19 01:01:20 qdiskd Node 1 missed an update (3/10)
Dec 19 01:01:21 qdiskd Node 1 missed an update (4/10)
Dec 19 01:01:22 qdiskd Node 1 missed an update (5/10)
Dec 19 01:01:23 qdiskd Node 1 missed an update (6/10)
Dec 19 01:01:24 qdiskd Node 1 missed an update (7/10)
Dec 19 01:01:25 qdiskd Node 1 missed an update (8/10)
Dec 19 01:01:26 qdiskd Node 1 missed an update (9/10)
Dec 19 01:01:27 qdiskd Node 1 missed an update (10/10)
Dec 19 01:01:28 qdiskd Node 1 missed an update (11/10)
Dec 19 01:01:28 qdiskd Node 1 DOWN
Dec 19 01:01:28 qdiskd Making bid for master
Dec 19 01:01:29 qdiskd Node 1 missed an update (12/10)
Dec 19 01:01:30 qdiskd Node 1 missed an update (13/10)
Dec 19 01:01:31 qdiskd Node 1 missed an update (14/10)
Dec 19 01:01:32 qdiskd Node 1 missed an update (15/10)
Dec 19 01:01:32 qdiskd Assuming master role
Dec 19 01:01:33 qdiskd Node 1 is undead.
Dec 19 01:01:33 qdiskd Writing eviction notice (again) for node 1
Dec 19 01:01:34 qdiskd Node 1 evicted

When it happens, the VMs do migrate, but needless to say this is not a good situation for us.
How can I debug the problem better?

One thought I had is that the node may be missing heartbeats during the backup. Does it make sense to adjust some cluster settings (which ones?) so that fencing is not initiated?

Thank you in advance,
 
Maybe there is too much load on the qdisk storage (hence the missed updates). I would try to reduce backup traffic (set a rate limit), or avoid running backups from several nodes in parallel.
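
One way to test the backup-load theory is to pull the start time of every "missed an update" streak out of qdiskd.log and compare it with the backup schedule. The following is only a rough sketch in Python, not an official tool. It assumes the log lines look exactly like the excerpt above. The year (syslog stamps carry none), the sample lines, and the 00:30 to 02:00 backup window are placeholders chosen for illustration, and the function names are made up for this example.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Dec 19 01:01:19 qdiskd Node 1 missed an update (2/10)
MISS_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+qdiskd\s+"
    r"Node (\d+) missed an update \((\d+)/(\d+)\)"
)


def parse_time(stamp, year=2015):
    # Syslog-style stamps have no year, so one is assumed here.
    # %b expects English month abbreviations (C/English locale).
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def miss_streaks(lines, year=2015):
    """Group consecutive 'missed an update' lines into per-node streaks."""
    streaks = []
    current = {}  # node id -> streak currently being extended
    for line in lines:
        m = MISS_RE.match(line.strip())
        if not m:
            continue  # ignore "DOWN", "evicted" and other lines
        when = parse_time(m.group(1), year)
        node, missed, limit = (int(m.group(i)) for i in (2, 3, 4))
        streak = current.get(node)
        if streak is None or missed <= streak["max_missed"]:
            # The counter restarted, so a new streak begins.
            streak = {"node": node, "start": when, "end": when,
                      "max_missed": missed, "limit": limit}
            current[node] = streak
            streaks.append(streak)
        else:
            streak["end"] = when
            streak["max_missed"] = missed
    return streaks


def in_window(when, start_hhmm, end_hhmm):
    """True if 'when' falls inside a daily HH:MM window (may cross midnight)."""
    t = when.strftime("%H:%M")
    if start_hhmm <= end_hhmm:
        return start_hhmm <= t <= end_hhmm
    return t >= start_hhmm or t <= end_hhmm


# A few lines copied from the qdiskd.log excerpt in this thread.
SAMPLE = [
    "Dec 19 01:01:19 qdiskd Node 1 missed an update (2/10)",
    "Dec 19 01:01:20 qdiskd Node 1 missed an update (3/10)",
    "Dec 19 01:01:27 qdiskd Node 1 missed an update (10/10)",
    "Dec 19 01:01:28 qdiskd Node 1 DOWN",
    "Dec 19 01:01:32 qdiskd Node 1 missed an update (15/10)",
]

if __name__ == "__main__":
    for s in miss_streaks(SAMPLE):
        # "00:30" to "02:00" is a hypothetical backup window.
        during_backup = in_window(s["start"], "00:30", "02:00")
        print(s["node"], s["start"], s["max_missed"], s["limit"], during_backup)
```

If most streaks start inside the backup window, that supports the storage-load explanation. If they are spread across the day, the qdisk storage path or the network is worth a closer look.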
 
