Host failing to start pve-cluster service after reboot

bfg9k

Dec 20, 2024
Hey there Proxmox forum,

I have a 4-node cluster that I had just finished repairing after it went split-brain on me a few weeks ago. I thought I had resolved all the issues with the hosts and everything was clustering correctly. However, today, after I rebooted one of the hosts (zeus), it can no longer start the pve-cluster service.

Excerpt from journalctl -b -u pve-cluster:

Bash:
Jan 15 18:35:19 zeus systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 4.
Jan 15 18:35:20 zeus systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: resolved node name 'zeus' to '192.168.0.10' for default node IP address
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: resolved node name 'zeus' to '192.168.0.10' for default node IP address
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: missing directory inode (inode = 0000000002BE1D43)
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: missing directory inode (inode = 0000000002BE1D43)
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: DB load failed
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: DB load failed
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: exit proxmox configuration filesystem (-1)
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: exit proxmox configuration filesystem (-1)
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jan 15 18:35:20 zeus systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Jan 15 18:35:20 zeus systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jan 15 18:35:20 zeus systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.

I already found a few posts describing a similar issue:


They recommended removing the old entries from config.db. However, this had no effect on my system, and I still get the same output when trying to start the service.

Here's the output from my config.db for the qemu-server entries:
Bash:
root@zeus:/# sqlite3 /var/lib/pve-cluster/config.db
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
sqlite> select * from tree where name='qemu-server';
14|12|14|0|1709181976|4|qemu-server|
46523172|46013665|46523172|2|1736675766|4|qemu-server|
sqlite>
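One way to narrow this down is to look up the exact inode the error complains about. Below is a minimal sketch, assuming that pmxcfs prints inode numbers in hex in the journal while the `tree` table stores them as decimal integers, and that the columns are `inode|parent|version|writer|mtime|type|name|data`, as the rows above suggest. The helper names `inode_from_log` and `inode_queries` are hypothetical and not part of Proxmox. They only convert the inode and print read-only SQL; they do not modify anything.

```shell
# Hypothetical helpers (not part of Proxmox) for inspecting config.db.
# Assumption: pmxcfs logs inode numbers in hex, while the tree table stores them
# as decimal integers in the columns inode|parent|version|writer|mtime|type|name|data.

# Print, in decimal, the inode from a "missing directory inode" journal line.
inode_from_log() {
  local hex
  hex=$(printf '%s\n' "$1" |
    sed -n 's/.*missing directory inode (inode = \([0-9A-Fa-f]*\)).*/\1/p')
  [ -n "$hex" ] || return 1   # the line did not contain the expected message
  printf '%d\n' "0x$hex"      # printf accepts a 0x prefix for hex input
}

# Print read-only SQL that looks up the given decimal inode and its children.
inode_queries() {
  cat <<EOF
-- does the directory row exist at all?
SELECT inode, parent, type, name FROM tree WHERE inode = $1;
-- which rows name it as their parent?
SELECT inode, parent, type, name FROM tree WHERE parent = $1;
-- rows whose parent is neither 0 (assumed to be the root) nor an existing inode
SELECT inode, parent, type, name FROM tree
  WHERE parent != 0 AND parent NOT IN (SELECT inode FROM tree);
EOF
}
```

To run the queries, it is safer to point sqlite3 at a copy rather than the live file, e.g. `cp /var/lib/pve-cluster/config.db /root/config.db.copy` followed by `inode_queries 46013763 | sqlite3 /root/config.db.copy`. The inode 0000000002BE1D43 from the log above converts to 46013763, which does not match any inode or parent value in the two qemu-server rows shown.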

I'm not sure what else to try here. This started after a normal reboot of the host, and I'd really like to get it running again, or at least move the VM that is currently on it to one of the other hosts.

Cheers,
BFG9K