node left the cluster after upgrading it to debian-13

good morning,

we are running a subscribed proxmox cluster. all proxmox installations are on top of self-installed debian-12 machines.

following https://linuxconfig.org/how-to-upgrade-debian-to-latest-version i have upgraded one node to debian-13 and rebooted the machine.
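
for reference, this is roughly what i did following that guide (reconstructed from memory, not a verbatim copy of its commands):

apt update && apt upgrade            # bring bookworm fully up to date first
sed -i 's/bookworm/trixie/g' /etc/apt/sources.list /etc/apt/sources.list.d/*
apt update
apt full-upgrade                     # the actual debian-12 -> debian-13 jump
reboot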

the machine booted without any warnings, but it has kind of left the cluster.

`service corosync status` on the upgraded machine prints:

corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-10-14 14:09:15 CEST; 2s ago
Invocation: adc013e80fa346cb8148f7677eb66ced
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 11412 (corosync)
Tasks: 9 (limit: 309106)
Memory: 156.2M (peak: 156.3M)
CPU: 108ms
CGroup: /system.slice/corosync.service
`-11412 /usr/sbin/corosync -f

Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 has no active links
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 has no active links
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 has no active links
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] link: Resetting MTU for link 0 because host 7 joined
Oct 14 14:09:15 ms-pm07 corosync[11412]: [QUORUM] Members[1]: 7
Oct 14 14:09:15 ms-pm07 corosync[11412]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 14 14:09:15 ms-pm07 systemd[1]: Started corosync.service - Corosync Cluster Engine.

on the other machines in the cluster `service corosync status` complains:

Oct 14 14:00:00 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:02 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:04 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:05 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:07 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:09 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:10 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:12 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:13 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:15 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405

is there anything i can do to convince the upgraded node to (re)join the cluster?
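
i can of course gather more details; for example, i could run the following on both the upgraded node and one of the healthy nodes and post the output:

pvecm status                          # membership/quorum as proxmox sees it
corosync-cfgtool -s                   # knet link status per ring
corosync -v                           # corosync/knet versions
dpkg -l | grep -Ei 'corosync|knet'    # installed package versions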

thanks for any hints
 
i'm afraid i have accidentally upgraded to pve 9.

i have followed https://linuxconfig.org/how-to-upgrade-debian-to-latest-version, and a file named pve-enterprise.sources with the following content was placed in /etc/apt/sources.list.d:

Types: deb
URIs: https://enterprise.proxmox.com/debian/pve
Suites: trixie
Components: pve-enterprise
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg

as this is a new server that was working fine with the bookworm installation and does not yet have a subscription, i created a file proxmox.sources in /etc/apt/sources.list.d:

Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg

i then commented out everything in pve-enterprise.sources.
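
in hindsight, i believe the deb822 format would also have let me disable that repo explicitly instead of commenting out every line, roughly like this (untested on my side):

Types: deb
URIs: https://enterprise.proxmox.com/debian/pve
Suites: trixie
Components: pve-enterprise
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
Enabled: no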

`apt update && apt upgrade && apt full-upgrade` then resulted in the broken cluster.
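
to see exactly what that pulled in, i can check the installed versions and the dpkg history, e.g.:

pveversion -v                                        # full proxmox package version listing
grep ' upgrade ' /var/log/dpkg.log | grep -E 'pve-manager|proxmox-ve|corosync'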
 
yes, i saw error messages:

Processing triggers for pve-manager (8.4.14) ...
user config - ignore invalid privilege 'VM.Monitor'
Job for pvedaemon.service failed.
See "systemctl status pvedaemon.service" and "journalctl -xeu pvedaemon.service" for details.
Job for pvestatd.service failed.
See "systemctl status pvestatd.service" and "journalctl -xeu pvestatd.service" for details.
Job for pveproxy.service failed.
See "systemctl status pveproxy.service" and "journalctl -xeu pveproxy.service" for details.
Job for pvescheduler.service failed.
See "systemctl status pvescheduler.service" and "journalctl -xeu pvescheduler.service" for details.
Processing triggers for man-db (2.11.2-2) ...
Processing triggers for ca-certificates (20250419) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Processing triggers for dictionaries-common (1.30.10) ...
ispell-autobuildhash: Processing 'american' dict.
ispell-autobuildhash: Processing 'british' dict.
Processing triggers for pve-ha-manager (5.0.5) ...

and the output of `systemctl status pvestatd.service` was:

* pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
Active: active (running) since Mon 2025-10-13 13:44:11 CEST; 23h ago
Process: 507976 ExecReload=/usr/bin/pvestatd restart (code=exited, status=255/EXCEPTION)
Main PID: 1880 (pvestatd)
Tasks: 1 (limit: 309260)
Memory: 151.0M
CPU: 15min 55.095s
CGroup: /system.slice/pvestatd.service
`-1880 pvestatd

Oct 14 12:47:37 ms-pm07 systemd[1]: Reloading pvestatd.service - PVE Status Daemon...
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: unknown file 'ha/rules.cfg' at /usr/share/perl5/PVE/Cluster.pm line 524.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: Compilation failed in require at /usr/share/perl5/PVE/QemuServer.pm line 36.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: BEGIN failed--compilation aborted at /usr/share/perl5/PVE/QemuServer.pm line 36.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: Compilation failed in require at /usr/share/perl5/PVE/Service/pvestatd.pm line 21.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: BEGIN failed--compilation aborted at /usr/share/perl5/PVE/Service/pvestatd.pm line 21.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: Compilation failed in require at /usr/bin/pvestatd line 9.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: BEGIN failed--compilation aborted at /usr/bin/pvestatd line 9.
Oct 14 12:47:37 ms-pm07 systemd[1]: pvestatd.service: Control process exited, code=exited, status=255/EXCEPTION
Oct 14 12:47:37 ms-pm07 systemd[1]: Reload failed for pvestatd.service - PVE Status Daemon.
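
to me this looks like the upgrade did not finish cleanly, so i could also post the output of e.g.:

dpkg --audit                          # packages left unpacked or half-configured
dpkg -l | grep -E 'pve-manager|pve-cluster|libpve|qemu-server' | awk '{print $1, $2, $3}'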

i found another message:

Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
Processing triggers for libc-bin (2.41-12) ...
Processing triggers for pve-manager (9.0.11) ...
user config - ignore invalid privilege 'VM.Monitor'
got timeout when trying to ensure cluster certificates and base file hierarchy is set up - no quorum (yet) or hung pmxcfs?
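
if it helps, i can also check whether pmxcfs is running at all on the upgraded node and what it reports about quorum, e.g.:

systemctl status pve-cluster                    # pmxcfs
journalctl -b -u pve-cluster --no-pager | tail -n 50
pvecm status
ls /etc/pve                                     # is the cluster filesystem mounted?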