Proxmox 9 HA Issue: Node stuck in "old timestamp - dead" after apt dist-upgrade

dmembibre

Hello,

Yesterday, one of the nodes in our cluster got stuck in the "old timestamp - dead" status. This happened immediately after I ran apt dist-upgrade, which included updates to the following packages:

1759919264687.png

1759919331190.png

The only solution I found to bring the node back online was to perform a physical reboot, which caused significant inconvenience, as the node was running close to 70 virtual machines that had to be evacuated.

1759919791899.png

I suspect this might be a bug in the new Proxmox 9 packages.

I performed a test on another node today:

  1. I put the node into maintenance mode.
  2. I ran the package update (apt dist-upgrade).
  3. Upon exiting maintenance mode, the node immediately reverted to the "old timestamp - dead" status.

This reproducible issue is causing major stability concerns with HA (High Availability). Has anyone else experienced this behavior after updating to Proxmox 9?
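For anyone trying to reproduce the three steps above, here is a minimal sketch of a helper that picks the affected nodes out of the HA status output. The function name stale_lrm_nodes is hypothetical, and the line layout "lrm <node> (<state>, <time>)" is an assumption about what ha-manager status prints, so adjust the pattern if your output differs.

```shell
#!/bin/sh
# Sketch: list nodes whose LRM line is flagged with an old timestamp.
# stale_lrm_nodes is a hypothetical helper name; the assumed input format is
# one "lrm <node> (<state>, <time>)" line per node, read from stdin.
stale_lrm_nodes() {
    # Print the node name (second field) of every lrm line that mentions
    # "old timestamp"; all other lines are ignored.
    awk '$1 == "lrm" && /old timestamp/ { print $2 }'
}
```

Assuming maintenance mode is toggled from the CLI, the test sequence would be roughly: ha-manager crm-command node-maintenance enable <node>, then apt update && apt dist-upgrade, then ha-manager crm-command node-maintenance disable <node>, and finally ha-manager status | stale_lrm_nodes to see whether the node is reported.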

Any insight or assistance would be greatly appreciated. Thank you.

Code:
pveversion -v
proxmox-ve: 9.0.0 (running kernel: 6.14.11-2-pve)
pve-manager: 9.0.10 (running version: 9.0.10/deb1ca707ec72a89)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-3-pve-signed: 6.14.11-3
proxmox-kernel-6.14: 6.14.11-3
proxmox-kernel-6.14.11-2-pve-signed: 6.14.11-2
proxmox-kernel-6.14.11-1-pve-signed: 6.14.11-1
amd64-microcode: 3.20250311.1
ceph: 19.2.3-pve2
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
dnsmasq: 2.91-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.11
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.8
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
openvswitch-switch: 3.5.0-1+b1
proxmox-backup-client: 4.0.15-1
proxmox-backup-file-restore: 4.0.15-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.2
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.2
proxmox-widget-toolkit: 5.0.6
pve-cluster: 9.0.6
pve-container: 6.0.13
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.17-1
pve-ha-manager: 5.0.4
pve-i18n: 3.6.0
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
pve-zsync: 2.4.0
qemu-server: 9.0.22
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve2
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1
 
Hi!

I haven't tried to reproduce it yet. When was the update performed? Was the watchdog already inactive before the update (are there any recent entries in journalctl -u watchdog-mux)?
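To gather what is being asked for here, something like the following sketch could be run on the affected node. The helper name watchdog_report and the default two-day window are assumptions; pick a window that covers the time of the dist-upgrade.

```shell
#!/bin/sh
# Sketch: collect watchdog-mux journal entries and HA service state.
# watchdog_report is a hypothetical helper name; the default time window
# ("2 days ago") is an assumption and should cover the upgrade.
watchdog_report() {
    since="${1:-2 days ago}"

    echo "== watchdog-mux journal since: $since =="
    journalctl -u watchdog-mux --since "$since" --no-pager

    echo "== HA service state =="
    for unit in watchdog-mux pve-ha-lrm pve-ha-crm; do
        # systemctl is-active prints the unit state, e.g. active or inactive
        printf '%s: %s\n' "$unit" "$(systemctl is-active "$unit")"
    done
}
```

Calling watchdog_report "3 days ago" and posting the output alongside the time of the upgrade should show whether watchdog-mux was already stopped before the packages were updated.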