Host failing to start pve-cluster service after reboot

bfg9k · 2025-01-15T08:55:57+0100

Hey there Proxmox forum,

I have a 4-node cluster that I had just finished repairing after it went split-brain on me a few weeks ago. I thought I had resolved all the issues with the hosts and that everything was clustering correctly. However, after rebooting one of the hosts (zeus) today, it is not able to start the pve-cluster service.

Excerpt from journalctl -b -u pve-cluster:

Bash:

Jan 15 18:35:19 zeus systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 4.
Jan 15 18:35:20 zeus systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: resolved node name 'zeus' to '192.168.0.10' for default node IP address
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: resolved node name 'zeus' to '192.168.0.10' for default node IP address
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: missing directory inode (inode = 0000000002BE1D43)
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: missing directory inode (inode = 0000000002BE1D43)
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: DB load failed
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: DB load failed
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: exit proxmox configuration filesystem (-1)
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: exit proxmox configuration filesystem (-1)
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jan 15 18:35:20 zeus systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Jan 15 18:35:20 zeus systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jan 15 18:35:20 zeus systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.

I did already find a few posts with a similar issue:

Thread 'Need help to repair my pve-cluster/config.db'

Sep 20, 2022

I didn't correctly change the Proxmox hostname, rebooted the host, and now I have a broken pve-cluster. I will add the log from journalctl -b -u pve-cluster

Code:

Sep 20 03:21:02 host systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Sep 20 09:10:27 host systemd[1]: Starting The Proxmox VE cluster filesystem...
Sep 20 09:10:27 host pmxcfs[1950]: [database] crit: found entry with duplicate name 'qemu-server' - A:(inode = 0x00000000018DA28B, parent = 0x00000000018DA28A, v./mtime = 0x18DA28B/0x1663617922) vs. B:(inode = 0x00000000018DA438, parent = 0x00000000018D
Sep 20 09:10:27 host pmxcfs[1950]: [database] crit: DB...

Post in thread 'pveproxy - /dev/fuse not mounted to /etc/pve'

Feb 8, 2021

here is the output

Code:

root@proxmox:~# journalctl -b -u pve-cluster.service
-- Logs begin at Mon 2021-02-08 11:03:01 UTC, end at Mon 2021-02-08 12:18:02 UTC. --
Feb 08 11:03:06 proxmox systemd[1]: Starting The Proxmox VE cluster filesystem...
Feb 08 11:03:06 proxmox pmxcfs[1505]: [database] crit: found entry with duplicate name (inode = 0000000000BDAACC, parent = 0000000000000008, name
Feb 08 11:03:06 proxmox pmxcfs[1505]: [database] crit: found entry with duplicate name (inode = 0000000000BDAACC, parent = 0000000000000008, name
Feb 08 11:03:06 proxmox pmxcfs[1505]: [database] crit: DB load failed
Feb 08 11:03:06...

Those threads recommended removing the old entries from config.db. However, this had no effect on my system, and I still get the same output when trying to start the service.

Here's the output from my config.db for the qemu-server entries:

Bash:

root@zeus:/# sqlite3 /var/lib/pve-cluster/config.db
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
sqlite> select * from tree where name='qemu-server';
14|12|14|0|1709181976|4|qemu-server|
46523172|46013665|46523172|2|1736675766|4|qemu-server|
sqlite>
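
For anyone who wants to check a database like this for rows whose parent directory is gone, a read-only query along the following lines might help. It is only a rough sketch: the column names inode, parent, type and name are an assumption based on the row layout in the output above (worth confirming with .schema tree first), treating parent 0 as the root is also an assumption, and the path passed on the command line is meant to be a copy of config.db, not the live file.

```python
import sqlite3
import sys


def find_orphans(conn):
    """Return (inode, parent, type, name) rows whose parent inode has no row of its own.

    Assumes a table called `tree` with columns inode, parent, type and name.
    """
    query = """
        SELECT c.inode, c.parent, c.type, c.name
        FROM tree AS c
        LEFT JOIN tree AS p ON p.inode = c.parent
        WHERE c.parent != 0      -- assumption: parent 0 means "directly under the root"
          AND p.inode IS NULL    -- no row exists for the referenced parent
        ORDER BY c.inode
    """
    return conn.execute(query).fetchall()


def main(path):
    # mode=ro opens the file read-only, so the sketch cannot modify the database.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        for row in find_orphans(conn):
            print(row)
    finally:
        conn.close()


# Only runs when a path to a COPY of config.db is given as the single argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Note that the inode in the pmxcfs log line is printed in hexadecimal, while the sqlite output shows decimal values, so it has to be converted (for example with int("0000000002BE1D43", 16)) before comparing it with the inode and parent columns.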

Not sure what else to try here. This started after a normal reboot of the host, and I'd really like to get it running again, or at least move the VM that is currently on it to one of the other hosts.

Cheers,
BFG9K
