Hey there Proxmox forum,
I have a 4-node cluster that I had just finished repairing after it went split-brain on me a few weeks ago. I thought I had resolved all the issues with the hosts and that everything was clustering correctly. However, today after rebooting one of the hosts (zeus), it is unable to start the pve-cluster service.
Excerpt from journalctl -b -u pve-cluster:
Bash:
Jan 15 18:35:19 zeus systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 4.
Jan 15 18:35:20 zeus systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: resolved node name 'zeus' to '192.168.0.10' for default node IP address
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: resolved node name 'zeus' to '192.168.0.10' for default node IP address
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: missing directory inode (inode = 0000000002BE1D43)
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: missing directory inode (inode = 0000000002BE1D43)
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: DB load failed
Jan 15 18:35:20 zeus pmxcfs[2584]: [database] crit: DB load failed
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: exit proxmox configuration filesystem (-1)
Jan 15 18:35:20 zeus pmxcfs[2584]: [main] notice: exit proxmox configuration filesystem (-1)
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jan 15 18:35:20 zeus systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Jan 15 18:35:20 zeus systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jan 15 18:35:20 zeus systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jan 15 18:35:20 zeus systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
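For comparing the log against the database: pmxcfs prints the missing inode in zero-padded hex, while the sqlite3 shell prints integer columns in decimal. A minimal sketch of the conversion follows; the helper name `hex_inode_to_decimal` is made up for illustration and is not part of any Proxmox tool.

```python
def hex_inode_to_decimal(hex_inode: str) -> int:
    """Convert an inode printed as zero-padded hex into a decimal integer."""
    return int(hex_inode, 16)


# Inode taken from the "missing directory inode" line in the log above.
missing_inode = hex_inode_to_decimal("0000000002BE1D43")
print(missing_inode)  # 46013763
```

The decimal value is what would need to be searched for in the inode and parent fields of the database.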
I did already find a few posts with a similar issue:
I didn't correctly change the Proxmox hostname, rebooted the host, and now I have a broken pve-cluster. I will add the log from journalctl -b -u pve-cluster
Code:
Sep 20 03:21:02 host systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Sep 20 09:10:27 host systemd[1]: Starting The Proxmox VE cluster filesystem...
Sep 20 09:10:27 host pmxcfs[1950]: [database] crit: found entry with duplicate name 'qemu-server' - A:(inode = 0x00000000018DA28B, parent = 0x00000000018DA28A, v./mtime = 0x18DA28B/0x1663617922) vs. B:(inode = 0x00000000018DA438, parent = 0x00000000018D
Sep 20 09:10:27 host pmxcfs[1950]: [database] crit: DB...
- rando
- Replies: 6
- Forum: Proxmox VE: Installation and configuration
here is the output
Code:
root@proxmox:~# journalctl -b -u pve-cluster.service
-- Logs begin at Mon 2021-02-08 11:03:01 UTC, end at Mon 2021-02-08 12:18:02 UTC. --
Feb 08 11:03:06 proxmox systemd[1]: Starting The Proxmox VE cluster filesystem...
Feb 08 11:03:06 proxmox pmxcfs[1505]: [database] crit: found entry with duplicate name (inode = 0000000000BDAACC, parent = 0000000000000008, name
Feb 08 11:03:06 proxmox pmxcfs[1505]: [database] crit: found entry with duplicate name (inode = 0000000000BDAACC, parent = 0000000000000008, name
Feb 08 11:03:06 proxmox pmxcfs[1505]: [database] crit: DB load failed
Feb 08 11:03:06...
They recommended removing the old entries from config.db. However, this had no effect on my system, and I still get the same output when trying to start the service.
Here's the output for my config.db for the qemu-server entries:
Bash:
root@zeus:/# sqlite3 /var/lib/pve-cluster/config.db
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
sqlite> select * from tree where name='qemu-server';
14|12|14|0|1709181976|4|qemu-server|
46523172|46013665|46523172|2|1736675766|4|qemu-server|
sqlite>
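Since the error is about a missing directory inode rather than a duplicate name, one possible check is to list rows whose parent inode has no row of its own. The sketch below is only an illustration under stated assumptions: the column names `inode` and `parent` are inferred from the field order in the output above (only the table `tree` and the column `name` appear in the actual query), parent 0 is assumed to be the root, and the demo table is toy data containing the two rows shown plus one made-up parent row.

```python
import sqlite3

# Column names are an assumption inferred from the eight fields in the output
# above; only the table name `tree` and the column `name` come from the post.
ASSUMED_SCHEMA = """
CREATE TABLE tree (
    inode INTEGER PRIMARY KEY,
    parent INTEGER,
    version INTEGER,
    writer INTEGER,
    mtime INTEGER,
    type INTEGER,
    name TEXT,
    data BLOB
)
"""

# Rows whose parent inode has no row of its own. Treating parent 0 as the
# root is also an assumption.
ORPHAN_QUERY = """
SELECT child.inode, child.parent, child.name
FROM tree AS child
LEFT JOIN tree AS parent_row ON parent_row.inode = child.parent
WHERE child.parent != 0 AND parent_row.inode IS NULL
ORDER BY child.inode
"""


def find_orphans(conn: sqlite3.Connection) -> list:
    """Return (inode, parent, name) tuples for rows with a missing parent."""
    return conn.execute(ORPHAN_QUERY).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Toy in-memory table: the two rows from the post plus one made-up parent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ASSUMED_SCHEMA)
    conn.executemany(
        "INSERT INTO tree VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (12, 0, 12, 0, 1709181976, 4, "hypothetical-parent", None),
            (14, 12, 14, 0, 1709181976, 4, "qemu-server", None),
            (46523172, 46013665, 46523172, 2, 1736675766, 4, "qemu-server", None),
        ],
    )
    return conn


if __name__ == "__main__":
    for inode, parent, name in find_orphans(build_demo_db()):
        print(f"inode {inode} ({name!r}) points at missing parent {parent}")
```

In the toy table the second qemu-server row is flagged only because the demo contains no row for inode 46013665; whether that holds for the real database is not known from the output above. To try it on real data, `find_orphans` would be pointed at a connection opened on a copy of config.db rather than the live file.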
Not sure what else to try here. This started after a normal reboot of the host, and I'd really like to get it running again, or at least move the VM that is currently on it to one of the other hosts.
Cheers,
BFG9K