Fenced node emails, hundreds of them

Mar 27, 2021
We had an issue on one of our nodes today: it rebooted during a VM migration (local-zfs volume). This is the second time this has happened.

Since then I have been getting hundreds of emails like the following

I see no HA issues. I rebooted the host once again and am not sure where to start... HELP
 
This appears to happen approximately every hour.

1636906428117.png

pve12 appears to think it's the quorum master, but this is not correct.

1636906547587.png

Found 3000 messages in the postfix queue on pve12.
It also looks like the email server is throttling us, which may explain the "every hour" phenomenon.
I purged the queue and will continue monitoring.
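
For anyone hitting the same mail flood, here is a rough sketch of how the queue can be checked and purged. The helper name `count_queued` is made up for illustration; it only parses the summary line that `postqueue -p` prints at the end of its listing. The `postqueue` and `postsuper` calls are shown as comments because they need root and a running postfix.

```shell
# count_queued (hypothetical helper): read `postqueue -p` output on stdin
# and print the number of queued requests from the trailing summary line,
# e.g. "-- 12 Kbytes in 3 Requests." prints 3.
count_queued() {
    awk '/^-- .* in [0-9]+ Requests?\.$/ { print $(NF-1); found = 1 }
         /^Mail queue is empty/          { print 0; found = 1 }
         END { if (!found) print 0 }'
}

# Usage on the affected node (as root):
#   postqueue -p | count_queued    # how many messages are waiting
#   postsuper -d ALL               # delete every message in the queue
```

Note that `postsuper -d ALL` drops everything in the queue, including unrelated mail, so it is worth looking at the `postqueue -p` listing first.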
 
please get the following from each node

  • pveversion -v
  • pvecm status
  • ha-manager status
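
A small sketch for gathering the three outputs above into one file per node, so they can be attached in one go. The function name `collect_diag` and the default file name are assumptions for illustration; it is meant to be run locally on each node.

```shell
# collect_diag (hypothetical helper): run the three requested commands on
# the local node and write their output to one file.
# Optional argument: output file (default: diag-<nodename>.txt).
collect_diag() {
    out=${1:-diag-$(uname -n).txt}
    for cmd in "pveversion -v" "pvecm status" "ha-manager status"; do
        printf '===== %s =====\n' "$cmd"
        # $cmd is left unquoted on purpose so it splits into command + arguments
        $cmd 2>&1 || printf '(command failed or not available)\n'
        printf '\n'
    done > "$out"
    # Print the file name so the caller knows what to attach
    printf '%s\n' "$out"
}

# Usage on each node (as root):
#   collect_diag
```

A failing command is recorded in the file instead of aborting the run, so a node with a broken cluster stack still produces a partial report.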
 


that looks okay. what about the logs around the time mentioned in the mails (from pve11 and the node that was CRM master at the time?)
 
could you also post your HA config (/etc/pve/ha/resources.cfg and /etc/pve/datacenter.cfg)? the logs do look like there is a bug or edge case not handled well..
 
thanks - sorry to ask once more, but I forgot the /etc/pve/ha/groups.cfg file
 
All except pve31, which is in a third DC and is only used for maintaining quorum.
The cfg file has been uploaded to the same location.

I am happy to stop this here and would like to thank you for the investigation.

We are currently having performance problems with ceph (details here) and are running most VMs on local (zfs mirror) disk.
We have already experienced node reboots twice when migrating VMs from local disk.
When the reboot happened in the middle of a migration, the VM went into a failed HA state in which I was not able to start it.
I had to remove the VM from HA, which allowed me to start it.
The node reboot is the weirdest phenomenon in this story; again, it has happened twice already.

Thanks!
 
thanks! I'll double check to see whether there is some way to trigger this behaviour in our HA state machine, the configs look sane.
 
