PVE 7 - corosync upgrade rebooted all nodes!

TwiX

Hi,

I tried to upgrade a 6-node PVE 7 cluster yesterday.

We use Ceph and HA for all VMs.

I was able to upgrade 4 nodes without issue.
But while upgrading the fifth node, I lost the whole cluster: all nodes rebooted!

proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-7
pve-kernel-helper: 7.0-7
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph: 16.2.5-pve1
ceph-fuse: 16.2.5-pve1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
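
When some nodes are upgraded and others are not, it can help to compare package versions node by node. The following is a minimal sketch, assuming each node's listing has the same `package: version` layout as the one above (it looks like `pveversion -v` output) and has been saved as text. The function names `parse_versions` and `diff_versions` are hypothetical helpers, not part of any Proxmox tool.

```python
def parse_versions(text):
    """Parse 'package: version' lines into a {package: version} dict."""
    versions = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not name.strip():
            continue  # skip blank lines and lines without a colon
        # Drop a trailing parenthesised note such as "(running kernel: ...)".
        version = rest.split("(", 1)[0].strip()
        versions[name.strip()] = version
    return versions


def diff_versions(a, b):
    """Return {package: (version_a, version_b)} for packages that differ."""
    return {
        pkg: (a.get(pkg), b.get(pkg))
        for pkg in sorted(set(a) | set(b))
        if a.get(pkg) != b.get(pkg)
    }
```

Calling `diff_versions(parse_versions(listing_a), parse_versions(listing_b))` on two saved listings would show which packages (for example corosync or libknet1) differ between an upgraded and a not-yet-upgraded node.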


Syslogs for nodes 22 to 27 are attached:
The upgrade went fine on nodes 22, 23, 24 and 27.
The upgrade of node 25 crashed the entire cluster at 17:09.
The upgrade of node 26 is still pending because of the crash during the node 25 upgrade.
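
To line up what each node logged around 17:09, something like the following could be used. It is only a sketch, assuming the attached files use the traditional syslog layout (`Mon DD HH:MM:SS host tag: message`). The keyword list is a guess at what is relevant here, and `select_lines` is a hypothetical helper name.

```python
from datetime import time

# Assumed terms of interest; adjust to taste.
KEYWORDS = ("corosync", "watchdog", "pve-ha-lrm", "pve-ha-crm")


def parse_clock(field):
    """Turn 'HH:MM:SS' into a datetime.time, or None if it does not parse."""
    try:
        hour, minute, second = (int(part) for part in field.split(":"))
        return time(hour, minute, second)
    except ValueError:
        return None


def select_lines(lines, start, end, keywords=KEYWORDS):
    """Keep lines whose timestamp lies in [start, end] and that mention a keyword."""
    selected = []
    for line in lines:
        fields = line.split(None, 3)  # month, day, clock, rest of the line
        if len(fields) < 4:
            continue
        clock = parse_clock(fields[2])
        if clock is None or not (start <= clock <= end):
            continue
        if any(keyword in line.lower() for keyword in keywords):
            selected.append(line.rstrip("\n"))
    return selected
```

For example, `select_lines(open("syslog_node25.txt"), time(17, 5), time(17, 15))` would return only the corosync, watchdog and HA service lines from the ten minutes around the crash. The window ignores the date, so it assumes each file covers a single day.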



Thanks in advance!

Antoine
 

Attachments

  • syslog_node27.txt (9.3 KB)
  • syslog_node26.txt (13.9 KB)
  • syslog_node25.txt (54.1 KB)
  • syslog_node24.txt (8.8 KB)
  • syslog_node23.txt (14.1 KB)
  • syslog_node22.txt (11.3 KB)

TwiX

Don't leave me in the dark :p

Maybe upgrades should be done with the HA services stopped, i.e. the LRM (pve-ha-lrm) and the CRM (pve-ha-crm), in order to prevent such reboots?
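
A minimal sketch of what that could look like, purely as an illustration: it only builds the list of `systemctl` commands and does not run anything. The stop order (LRM on every node first, then CRM) is an assumption that should be checked against the Proxmox HA documentation, and the node names are hypothetical placeholders.

```python
# Assumed stop order: LRM on every node first, then CRM.
HA_SERVICES = ("pve-ha-lrm", "pve-ha-crm")


def ha_plan(nodes, action, services=HA_SERVICES):
    """Return (node, command) pairs, finishing one service on every node
    before moving on to the next service."""
    if action not in ("stop", "start"):
        raise ValueError(f"unsupported action: {action}")
    return [
        (node, f"systemctl {action} {service}")
        for service in services
        for node in nodes
    ]


if __name__ == "__main__":
    # Hypothetical node names; replace with the real hostnames.
    for node, command in ha_plan(["node22", "node23"], "stop"):
        print(f"{node}: {command}")
```

The same function with `"start"` gives the commands to bring the services back after the upgrade. Whether the start order matters is not confirmed here.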
 