Proxmox backup job failing on cluster errors?

Sep 13, 2022
Hi,

I noticed that in a cluster where 3 of 12 nodes were offline, the backup job failed with:
Code:
ERROR: Backup of VM 100 failed - unable to open file '/etc/pve/nodes/dehnf-test-n1-2/lxc/100.conf.tmp.3715510' - Permission denied
Of course, a cluster in this state is not fully operable; it is misconfigured or broken and should be fixed. Still, I think backups should be as robust as possible and keep working regardless, so I wonder whether this behaviour is expected and whether I can improve on it. (In this case the VMs were running and nobody was keeping an eye on the cluster, which of course is not right either.)
(After running systemctl restart pve-cluster.service, the backup worked again.)

Code:
Votequorum information
----------------------
Expected votes:   12
Highest expected: 12
Total votes:      9
Quorum:           7
Flags:            Quorate
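
For reference, the Quorum value in this output follows the usual majority rule: more than half of the expected votes. A minimal shell sketch of that arithmetic (the function names are made up for illustration and are not part of any Proxmox or corosync tool):

```shell
# Majority threshold: floor(expected / 2) + 1.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) if the total votes reach the majority threshold.
# Usage: has_majority EXPECTED TOTAL
has_majority() {
    [ "$2" -ge "$(quorum_threshold "$1")" ]
}

quorum_threshold 12                   # prints 7
has_majority 12 9 && echo "quorate"   # prints quorate
```

With 12 expected votes the threshold is 7, so 9 votes is enough, which matches the output above. Keep in mind that this only shows the state when the command was run, not necessarily the state during the backup.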
 
that error indicates that that node was not part of the quorate partition at the time of the backup run.. check the logs (in particular of pve-cluster and corosync services), they should give more detail.
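
One way to pull the quorum-related lines out of those logs, as a rough sketch. It assumes the relevant messages contain the word "quorum", which may not catch everything; the function name and the time window are only examples:

```shell
# Keep only lines that mention quorum (case-insensitive); reads stdin.
# "|| true" avoids a non-zero exit status when nothing matches.
quorum_events() {
    grep -i 'quorum' || true
}

# Example use on a PVE node:
# journalctl -u pve-cluster -u corosync --since "2 days ago" --no-pager | quorum_events
```

Comparing the timestamps of any matches with the start time of the failed backup job should show whether the node had lost quorum at that point.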
 
Thank you for your quick reply!
that error indicates that that node was not part of the quorate partition at the time of the backup run.. check the logs (in particular of pve-cluster and corosync services), they should give more detail.
Yes, that could well be the case, but shouldn't the backup work anyway, if only so that the VM can be moved manually to another PVE node?
 
almost all tasks in PVE require the node to be part of the quorate partition. in disaster recovery scenarios, you can always force quorum in order to back up, for example - but that requires a human in the loop making the decision that this is safe ;)
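
A backup wrapper or monitoring check could test for quorum up front and raise an alert instead of failing halfway through. A sketch, assuming the Flags line of pvecm status lists Quorate when the node has quorum, as in the output quoted earlier in this thread; the function name is hypothetical:

```shell
# Reads `pvecm status` output on stdin.
# Succeeds (exit 0) only if the "Flags:" line lists "Quorate".
flags_quorate() {
    awk '/^Flags:/ { for (i = 2; i <= NF; i++) if ($i == "Quorate") found = 1 }
         END { exit found ? 0 : 1 }'
}

# Example use on a PVE node:
# if pvecm status | flags_quorate; then
#     echo "node is quorate"
# else
#     echo "node is NOT quorate, backups will likely fail" >&2
# fi
```

Forcing quorum (for example by lowering the expected votes with pvecm expected) is deliberately left out of the sketch, since that is exactly the step that needs a human decision.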