Hello,
I just updated my cluster to proxmox 9, and most nodes went well, except 2 of them that ended up in a very weird state.
Those 2 nodes hung at "Setting up pve-cluster". And I have noticed that /etc/pve was locked (causing any process that tried to access it to lock in a "D" state)
The only way to finish the upgrade was to reboot in recovery mode.
After the upgrade was finished, all looked good until I rebooted any one of those nodes. After the reboot, they would come up and /etc/pve would be stuck again.
This would cause /etc/pve to become stuck on other nodes in the cluster, causing them to go into a reboot loop.
The only way to recover these node is to boot in recovery mode, do a "apt install --reinstall pve-cluster" and press CTRL+D to continue boot and they come up and wotrk as expected.
But if any of these 2 nodes reboot again, the situation repeats (/etc/pve becomes stuck in all nodes and they enter the reboot loop).
It looks like something did not get upgraded correctly on those 2 nodes, but I can't figure out what.
I must mention that during this time, pvecm status shows a healthy cluster at all times.
Cheers,
Liviu
I just updated my cluster to proxmox 9, and most nodes went well, except 2 of them that ended up in a very weird state.
Those 2 nodes hung at "Setting up pve-cluster". And I have noticed that /etc/pve was locked (causing any process that tried to access it to lock in a "D" state)
The only way to finish the upgrade was to reboot in recovery mode.
After the upgrade was finished, all looked good until I rebooted any one of those nodes. After the reboot, they would come up and /etc/pve would be stuck again.
This would cause /etc/pve to become stuck on other nodes in the cluster, causing them to go into a reboot loop.
The only way to recover these node is to boot in recovery mode, do a "apt install --reinstall pve-cluster" and press CTRL+D to continue boot and they come up and wotrk as expected.
But if any of these 2 nodes reboot again, the situation repeats (/etc/pve becomes stuck in all nodes and they enter the reboot loop).
It looks like something did not get upgraded correctly on those 2 nodes, but I can't figure out what.
I must mention that during this time, pvecm status shows a healthy cluster at all times.
Cheers,
Liviu