[SOLVED] node left the cluster after upgrading it to debian-13

good morning,

We are running a subscribed Proxmox cluster. All Proxmox installations are on self-installed Debian 12 machines.

Following https://linuxconfig.org/how-to-upgrade-debian-to-latest-version, I upgraded one node to Debian 13 and rebooted the machine.

The machine booted without any warnings, but it has effectively left the cluster.

`service corosync status` on the upgraded machine prints:

corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-10-14 14:09:15 CEST; 2s ago
Invocation: adc013e80fa346cb8148f7677eb66ced
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 11412 (corosync)
Tasks: 9 (limit: 309106)
Memory: 156.2M (peak: 156.3M)
CPU: 108ms
CGroup: /system.slice/corosync.service
└─11412 /usr/sbin/corosync -f

Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 has no active links
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 has no active links
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] host: host: 6 has no active links
Oct 14 14:09:15 ms-pm07 corosync[11412]: [KNET ] link: Resetting MTU for link 0 because host 7 joined
Oct 14 14:09:15 ms-pm07 corosync[11412]: [QUORUM] Members[1]: 7
Oct 14 14:09:15 ms-pm07 corosync[11412]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 14 14:09:15 ms-pm07 systemd[1]: Started corosync.service - Corosync Cluster Engine.

On the other machines in the cluster, `service corosync status` complains:

Oct 14 14:00:00 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:02 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:04 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:05 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:07 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:09 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:10 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:12 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:13 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
Oct 14 14:00:15 ms-pm01 corosync[1446]: [KNET ] rx: Packet rejected from 192.168.30.39:5405
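I guess membership and link state on the healthy nodes could also be inspected with the usual tools, e.g.:
Code:
# quorum and membership as PVE sees it
pvecm status

# per-node knet link status on the local node
corosync-cfgtool -s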

Is there anything I can do to convince the upgraded node to (re)join the cluster?

Thanks for any hints.
 
I'm afraid I have accidentally upgraded to PVE 9.

I followed https://linuxconfig.org/how-to-upgrade-debian-to-latest-version, and a file named
pve-enterprise.sources:

Types: deb
URIs: https://enterprise.proxmox.com/debian/pve
Suites: trixie
Components: pve-enterprise
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg

was placed in /etc/apt/sources.list.d. As this is a new server which was working fine with the Bookworm installation and which does not yet have a subscription, I created a file proxmox.sources:

Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg

in /etc/apt/sources.list.d and commented out everything in pve-enterprise.sources.
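(In hindsight, instead of commenting out every line, a .sources file can apparently also be disabled with a single extra deb822 field, e.g.:)
Code:
Types: deb
URIs: https://enterprise.proxmox.com/debian/pve
Suites: trixie
Components: pve-enterprise
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
Enabled: no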

`apt update && apt upgrade && apt full-upgrade` then resulted in the broken cluster.
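In hindsight, the documented path for the PVE major upgrade would have been something like this instead (a sketch based on the upgrade guide, not what I actually ran):
Code:
# on PVE 8 / Bookworm, before touching the repositories:
pve8to9 --full

# after switching all repositories from bookworm to trixie:
apt update
apt dist-upgrade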
 
Yes, I saw error messages:

Processing triggers for pve-manager (8.4.14) ...
user config - ignore invalid privilege 'VM.Monitor'
Job for pvedaemon.service failed.
See "systemctl status pvedaemon.service" and "journalctl -xeu pvedaemon.service" for details.
Job for pvestatd.service failed.
See "systemctl status pvestatd.service" and "journalctl -xeu pvestatd.service" for details.
Job for pveproxy.service failed.
See "systemctl status pveproxy.service" and "journalctl -xeu pveproxy.service" for details.
Job for pvescheduler.service failed.
See "systemctl status pvescheduler.service" and "journalctl -xeu pvescheduler.service" for details.
Processing triggers for man-db (2.11.2-2) ...
Processing triggers for ca-certificates (20250419) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Processing triggers for dictionaries-common (1.30.10) ...
ispell-autobuildhash: Processing 'american' dict.
ispell-autobuildhash: Processing 'british' dict.
Processing triggers for pve-ha-manager (5.0.5) ...

and the output of `systemctl status pvestatd.service` was:

* pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
Active: active (running) since Mon 2025-10-13 13:44:11 CEST; 23h ago
Process: 507976 ExecReload=/usr/bin/pvestatd restart (code=exited, status=255/EXCEPTION)
Main PID: 1880 (pvestatd)
Tasks: 1 (limit: 309260)
Memory: 151.0M
CPU: 15min 55.095s
CGroup: /system.slice/pvestatd.service
└─1880 pvestatd

Oct 14 12:47:37 ms-pm07 systemd[1]: Reloading pvestatd.service - PVE Status Daemon...
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: unknown file 'ha/rules.cfg' at /usr/share/perl5/PVE/Cluster.pm line 524.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: Compilation failed in require at /usr/share/perl5/PVE/QemuServer.pm line 36.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: BEGIN failed--compilation aborted at /usr/share/perl5/PVE/QemuServer.pm line 36.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: Compilation failed in require at /usr/share/perl5/PVE/Service/pvestatd.pm line 21.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: BEGIN failed--compilation aborted at /usr/share/perl5/PVE/Service/pvestatd.pm line 21.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: Compilation failed in require at /usr/bin/pvestatd line 9.
Oct 14 12:47:37 ms-pm07 pvestatd[507976]: BEGIN failed--compilation aborted at /usr/bin/pvestatd line 9.
Oct 14 12:47:37 ms-pm07 systemd[1]: pvestatd.service: Control process exited, code=exited, status=255/EXCEPTION
Oct 14 12:47:37 ms-pm07 systemd[1]: Reload failed for pvestatd.service - PVE Status Daemon.
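That "unknown file" error looks like mixed package versions to me; I suppose this could be checked with something like:
Code:
# compare versions of the packages involved in the failure
dpkg -l libpve-cluster-perl pve-ha-manager qemu-server pve-manager

# anything apt still wants to upgrade
apt list --upgradable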

I found another message:

Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
Processing triggers for libc-bin (2.41-12) ...
Processing triggers for pve-manager (9.0.11) ...
user config - ignore invalid privilege 'VM.Monitor'
got timeout when trying to ensure cluster certificates and base file hierarchy is set up - no quorum (yet) or hung pmxcfs?
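To see whether pmxcfs was hung or just lacking quorum, I assume one would check:
Code:
# pmxcfs runs as part of pve-cluster.service
systemctl status pve-cluster

# quorum state as corosync sees it
corosync-quorumtool -s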
 
The error messages look a lot like this thread, which suggests that you are using Bookworm (Debian 12) repositories (or no repositories) for Debian 13. Make sure you also have the correct Debian base repositories: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_debian_base_repositories
Sad to state, the machine is using Debian 13 repositories, and apt update reports that all packages are up to date:

apt update
Hit:1 http://security.debian.org/debian-security trixie-security InRelease
Hit:2 http://download.proxmox.com/debian/pve trixie InRelease
Hit:3 http://ftp.gwdg.de/debian trixie InRelease
Hit:4 http://ftp.gwdg.de/debian trixie-updates InRelease
All packages are up to date.

Meanwhile I am considering a Bookworm reinstallation of the questionable host.
 
Things are even worse: the existing cluster was affected by the broken host. Nobody was able to log in to the Proxmox web UI, and users who were already logged in were unable to connect to their VMs. After I had shut down the questionable host, users could log in and connect to their VMs again.

Currently the questionable machine is shut down, and as I do not think it can be healed, I shall do a fresh Bookworm installation in the afternoon and refrain from upgrading to Trixie.
 
Hi,
please share the output of
Code:
pveversion -v
grep '' /etc/apt/sources.list.d/* /etc/apt/sources.list.d/
apt dist-upgrade

You don't have to actually confirm the upgrade; this is just to see what it says.

EDIT: Sorry, I was thinking about components, but they're not shown in the output; removed my wrong initial message.
 
pveversion -v :
proxmox-ve: 9.0.0 (running kernel: 6.14.11-4-pve)
pve-manager: 9.0.11 (running version: 9.0.11/3bf5476b8a4699e2)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.8: 6.8.12-15
proxmox-kernel-6.8.12-15-pve-signed: 6.8.12-15
amd64-microcode: 3.20250311.1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown: residual config
ifupdown2: 3.3.0-1+pmx10
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.11
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.8
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.16-1
proxmox-backup-file-restore: 4.0.16-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.0
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.2
proxmox-widget-toolkit: 5.0.6
pve-cluster: 9.0.6
pve-container: 6.0.13
pve-docs: 9.0.8
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.17-2
pve-ha-manager: 5.0.5
pve-i18n: 3.6.1
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.23
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve2
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

grep '' /etc/apt/sources.list.d/* /etc/apt/sources.list.d/ :
/etc/apt/sources.list.d/debian.sources:# Modernized from /etc/apt/sources.list
/etc/apt/sources.list.d/debian.sources:Types: deb deb-src
/etc/apt/sources.list.d/debian.sources:URIs: http://ftp.gwdg.de/debian/
/etc/apt/sources.list.d/debian.sources:Suites: trixie
/etc/apt/sources.list.d/debian.sources:Components: main non-free-firmware
/etc/apt/sources.list.d/debian.sources:Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
/etc/apt/sources.list.d/debian.sources:
/etc/apt/sources.list.d/debian.sources:# Modernized from /etc/apt/sources.list
/etc/apt/sources.list.d/debian.sources:Types: deb deb-src
/etc/apt/sources.list.d/debian.sources:URIs: http://security.debian.org/debian-security/
/etc/apt/sources.list.d/debian.sources:Suites: trixie-security
/etc/apt/sources.list.d/debian.sources:Components: main non-free-firmware
/etc/apt/sources.list.d/debian.sources:Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
/etc/apt/sources.list.d/debian.sources:
/etc/apt/sources.list.d/debian.sources:# Modernized from /etc/apt/sources.list
/etc/apt/sources.list.d/debian.sources:Types: deb deb-src
/etc/apt/sources.list.d/debian.sources:URIs: http://ftp.gwdg.de/debian/
/etc/apt/sources.list.d/debian.sources:Suites: trixie-updates
/etc/apt/sources.list.d/debian.sources:Components: main non-free-firmware
/etc/apt/sources.list.d/debian.sources:Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
/etc/apt/sources.list.d/debian.sources:
/etc/apt/sources.list.d/debian.sources:
/etc/apt/sources.list.d/proxmox.sources:Types: deb
/etc/apt/sources.list.d/proxmox.sources:URIs: http://download.proxmox.com/debian/pve
/etc/apt/sources.list.d/proxmox.sources:Suites: trixie
/etc/apt/sources.list.d/proxmox.sources:Components: pve-no-subscription
/etc/apt/sources.list.d/proxmox.sources:Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
/etc/apt/sources.list.d/proxmox.sources:
/etc/apt/sources.list.d/pve-enterprise.sources:#Types: deb
/etc/apt/sources.list.d/pve-enterprise.sources:#URIs: https://enterprise.proxmox.com/debian/pve
/etc/apt/sources.list.d/pve-enterprise.sources:#Suites: trixie
/etc/apt/sources.list.d/pve-enterprise.sources:#Components: pve-enterprise
/etc/apt/sources.list.d/pve-enterprise.sources:#Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
grep: /etc/apt/sources.list.d/: Is a directory

apt dist-upgrade:
I cannot execute this because the machine is offline (in order not to influence our existing cluster; see my last post), but I remember that its last output stated that everything was up to date.
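By the way, the line "pve-edk2-firmware: not correctly installed" in the pveversion output above also stands out; once the machine is online again, I would probably first try something like (untested):
Code:
# finish any interrupted package configuration
dpkg --configure -a

# reinstall the package that is flagged as broken
apt install --reinstall pve-edk2-firmware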
 
Could you please save /var/log/apt/* before reinstalling, and provide the contents? The symptoms all look like you did some sort of partial upgrade because of a repository misconfiguration. Maybe you also had pending network changes that got activated by the reboot? Those would explain why corosync suddenly rejected the node.
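Something like this would do (archive name is just an example):
Code:
# archive the apt logs before wiping the machine
tar czf apt-logs.tar.gz /var/log/apt/

# history.log* records the exact package transactions
zcat -f /var/log/apt/history.log* | less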
 
Those look okay to me. What about corosync.conf and the network config? "journalctl -b" from the first boot after the full-upgrade would also be interesting.
 
Those look okay to me. What about corosync.conf and the network config? "journalctl -b" from the first boot after the full-upgrade would also be interesting.
Is this of academic interest, or do you think we could heal the node?

Regarding corosync.conf and the network config: what exactly do you need?
Regarding journalctl -b: I'm afraid that this is lost now.
 
I think it is likely possible to "heal" the node, but if you want to proceed with reinstallation, that's of course also fine!
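Roughly: fix the repositories, then complete the interrupted upgrade. As a sketch (not a guaranteed recipe):
Code:
# with the correct trixie + pve repositories in place:
apt update
apt full-upgrade

# repair any half-configured or broken packages
apt -f install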

Regarding corosync.conf and the network config: what exactly do you need?

The log message by corosync indicates that the other nodes reject the upgraded node's traffic because it is not originating from the expected address. This usually indicates some sort of network setup change that is either wrong or not reflected in corosync.conf.
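To compare the two, something along these lines should work (paths as on a standard PVE node):
Code:
# the addresses corosync expects for each node
grep -E 'name:|ring0_addr' /etc/corosync/corosync.conf

# the addresses actually configured on the upgraded node
ip -4 addr show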