[SOLVED] Node reboot on update

ctech

Member
Apr 20, 2021
Hi.

I have a cluster of ~10 Proxmox nodes running version 6.3-6.
Yesterday I decided to deploy the latest updates (pve-no-subscription) on all nodes and ran "apt full-upgrade" via an Ansible play.
As a result, most of my nodes rebooted! Only two of them did not. I looked through the log files but couldn't find anything suspicious.
Could anyone please help me find out what happened?

Here's the list of the packages I updated:

Commandline: /usr/bin/apt-get -y -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold dist-upgrade
Upgrade: proxmox-widget-toolkit:amd64 (2.4-9, 2.5-1), corosync:amd64 (3.1.0-pve1, 3.1.2-pve1), libcmap4:amd64 (3.1.0-pve1, 3.1.2-pve1), libpve-storage-perl:amd64 (6.3-8, 6.3-9), libquorum5:amd64 (3.1.0-pve1, 3.1.2-pve1), proxmox-backup-client:amd64 (1.0.13-1, 1.1.1-1), libvotequorum8:amd64 (3.1.0-pve1, 3.1.2-pve1), libcfg7:amd64 (3.1.0-pve1, 3.1.2-pve1), libcpg4:amd64 (3.1.0-pve1, 3.1.2-pve1), libcorosync-common4:amd64 (3.1.0-pve1, 3.1.2-pve1)

A syslog excerpt is in the attachment.
 

Hi,

Are you using HA on the cluster? And did you run the updates simultaneously or one after the other? The "Package Updates" section [1] of the High Availability chapter in the docs has some notes on why this could cause issues. In addition, due to fencing [2], individual nodes that lose quorum can be forced to shut down in order to protect the rest of the cluster from damage.

[1] https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_package_updates
[2] https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_fencing
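
For illustration, here is a rough sketch of upgrading the nodes one after the other and checking quorum in between. It is not an official procedure: the node names, root SSH access, the 30-second wait, and the helper names is_quorate and rolling_upgrade are all assumptions of mine, as is the exact "Quorate: Yes" line matched in the `pvecm status` output.

```shell
#!/bin/sh
# Sketch: upgrade cluster nodes sequentially instead of simultaneously.
# Assumptions (not from this thread): root SSH access to every node, and
# `pvecm status` printing a "Quorate: Yes" line while the cluster has quorum.

# Succeeds if the `pvecm status` output given on stdin reports quorum.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Upgrade the given nodes one at a time; stop as soon as something looks wrong.
rolling_upgrade() {
    for node in "$@"; do
        echo "Upgrading $node"
        ssh "root@$node" 'apt-get update && apt-get -y dist-upgrade' || return 1
        # Arbitrary pause so corosync can restart and rejoin the cluster.
        sleep 30
        if ! ssh "root@$node" pvecm status | is_quorate; then
            echo "No quorum reported by $node, stopping" >&2
            return 1
        fi
    done
}

# Runs only when node names are passed, e.g.: sh rolling-upgrade.sh pve1 pve2
if [ "$#" -gt 0 ]; then
    rolling_upgrade "$@"
fi
```

With Ansible, limiting the play to one host at a time (the `serial: 1` play keyword) should give a similar effect.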
 
Yes, I'm using HA, and yes, I ran the updates simultaneously. Now I see why that wasn't the best idea. Didn't read the manual, classic! Thank you for the explanation.