No GUI after power loss

butt3rballbeats · Mar 3, 2023

Hey, I'm a bit of a noob when it comes to Proxmox/Linux. I've been running Proxmox for about a month now with no issues until today, when I lost power. After booting back up I can ping and SSH into the host again, but I can't connect to the GUI. I checked /etc/hosts and everything still looks set up correctly to me.

I did some digging on the forum and tried troubleshooting on my own, but I think I'm stuck. When I run:
systemctl status pve-cluster

Code:

pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2023-03-02 23:31:56 EST; 25min ago
    Process: 2666 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)
        CPU: 10ms


Mar 02 23:31:56 one systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Mar 02 23:31:56 one systemd[1]: Stopped The Proxmox VE cluster filesystem.
Mar 02 23:31:56 one systemd[1]: pve-cluster.service: Start request repeated too quickly.
Mar 02 23:31:56 one systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Mar 02 23:31:56 one systemd[1]: Failed to start The Proxmox VE cluster filesystem.

When I run:
journalctl -u pve-cluster

Code:

Journal file /var/log/journal/76190b52a0dd462a8caf31bc668ea0ac/system.journal is truncated, ignoring file.
-- Journal begins at Thu 2023-03-02 17:15:57 EST, ends at Thu 2023-03-02 23:58:24 EST. --
Mar 02 23:31:54 one systemd[1]: Starting The Proxmox VE cluster filesystem...
Mar 02 23:31:54 one pmxcfs[2643]: [database] crit: unable to set WAL mode: disk I/O error#010
Mar 02 23:31:54 one pmxcfs[2643]: [database] crit: unable to set WAL mode: disk I/O error#010
Mar 02 23:31:54 one pmxcfs[2643]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Mar 02 23:31:54 one pmxcfs[2643]: [main] notice: exit proxmox configuration filesystem (-1)
Mar 02 23:31:54 one pmxcfs[2643]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Mar 02 23:31:54 one pmxcfs[2643]: [main] notice: exit proxmox configuration filesystem (-1)
Mar 02 23:31:54 one systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Mar 02 23:31:54 one systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Mar 02 23:31:54 one systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Mar 02 23:31:55 one systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 1.
Mar 02 23:31:55 one systemd[1]: Stopped The Proxmox VE cluster filesystem.

This is where I'm stuck; I'm not sure what to do from here since I'm still new to all of this. Any help is appreciated, thanks!

mikeinnyc · Mar 3, 2023

The error message "unable to set WAL mode: disk I/O error" suggests a problem with the disk or the file system where the Proxmox configuration database is stored. The message "memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'" means pmxcfs could not open that database file, which could also be caused by disk or file system issues. The "system.journal is truncated" line at the top of your journal output is likely another sign that the unclean shutdown damaged files on disk.
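A minimal first-pass check for those errors, sketched under assumptions: it is run as root on the affected host, findmnt (util-linux) and the sqlite3 command-line tool are available (the latter via apt install sqlite3), and the helper names mount_state, count_io_errors and check_db_copy are hypothetical. It covers three things worth ruling out after a power cut: the filesystem holding /var/lib/pve-cluster having been remounted read-only, the kernel logging I/O errors for the disk, and the database file itself being damaged.

```shell
#!/bin/sh
# Sketch: first-pass checks for "unable to set WAL mode: disk I/O error".
# Helper names are hypothetical; run the usage lines as root on the host.

# Print "read-only" or "read-write" for the filesystem containing path $1.
# Assumes findmnt (util-linux); --target finds the mount that holds a path.
mount_state() {
  opts=$(findmnt -n -o OPTIONS --target "$1") || return 1
  case ",$opts," in
    *,ro,*) echo "read-only" ;;
    *) echo "read-write" ;;
  esac
}

# Count lines on stdin that mention an I/O error (case-insensitive).
# grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the 0.
count_io_errors() {
  grep -ci 'i/o error' || true
}

# Run SQLite's integrity check on a temporary copy of the database so the
# original file is never written to. Only the main file is copied, so changes
# still sitting in a config.db-wal file are not part of the check.
# Set TMPDIR to a writable location if the root filesystem is read-only.
check_db_copy() {
  tmp=$(mktemp) || return 1
  cp "$1" "$tmp" && sqlite3 "$tmp" 'PRAGMA integrity_check;'
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage:
#   mount_state /var/lib/pve-cluster
#   dmesg | count_io_errors
#   df -h /var/lib/pve-cluster        # a full disk can also break SQLite writes
#   check_db_copy /var/lib/pve-cluster/config.db   # prints "ok" if intact
```

If the filesystem shows up as read-only or the kernel log is full of I/O errors, the disk itself needs attention before anything else.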

I hope you have a battery backup, or exciting stuff awaits you.
https://pve.proxmox.com/wiki/Cluster_Manager

Cluster Cold Start

It is obvious that a cluster is not quorate when all nodes are offline. This is a common case after a power failure.

It is always a good idea to use an uninterruptible power supply (“UPS”, also called “battery backup”) to avoid this state, especially if you want HA.

On node startup, the pve-guests service is started and waits for quorum. Once quorate, it starts all guests which have the onboot flag set.
When you turn on nodes, or when power comes back after power failure, it is likely that some nodes will boot faster than others. Please keep in mind that guest startup is delayed until you reach quorum.
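The quoted section applies to nodes that are part of a cluster. As a rough sketch of checking that state by hand, the hypothetical helper below pulls the Quorate: field out of pvecm status output; it assumes the "Quorate:   Yes" line format that pvecm status prints on a cluster member. The commented qm set line shows how a guest gets the onboot flag that pve-guests looks for, with VMID 100 as a placeholder.

```shell
#!/bin/sh
# Sketch: print the "Quorate:" value (e.g. Yes or No) from `pvecm status`
# output read on stdin. Assumes the "Quorate:   Yes" line format; prints
# nothing if no such line is present.
quorate() {
  awk -F: '/^Quorate:/ { gsub(/[ \t\r]/, "", $2); print $2 }'
}

# Usage on a cluster node:
#   pvecm status | quorate
# pve-guests only auto-starts guests whose onboot flag is set. For example
# (VMID 100 is only an illustration), start VM 100 first and wait 30 s
# before starting the next guest:
#   qm set 100 --onboot 1 --startup order=1,up=30
```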

butt3rballbeats · Mar 3, 2023

Yeah... it's my fault for not having a UPS in place yet; I've been slowly building out my lab. Do you reckon it would be easier to just reinstall from scratch than to try to fix this mess?

mikeinnyc · Mar 3, 2023

If you do, take a copy of your /etc folder along with your pools.
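A sketch of what taking /etc along could look like, assuming root access, a destination on a disk that will not be wiped (the /mnt/usb path and the pool name tank below are placeholders), and a hypothetical helper name. It also archives /var/lib/pve-cluster: /etc/pve is a FUSE view of config.db provided by pmxcfs, so while pve-cluster is failing, /etc/pve shows up empty and the guest configs only exist inside that database.

```shell
#!/bin/sh
# Sketch: archive host configuration before a reinstall (run as root).
#   $1     destination directory, on a disk that will NOT be wiped
#   $2...  paths to archive (default: /etc and /var/lib/pve-cluster)
# /etc/pve is a FUSE view of config.db provided by pmxcfs, so it is empty
# while pve-cluster is down; archiving /var/lib/pve-cluster keeps the
# database (and any config.db-wal file) that actually holds the guest configs.
backup_host_config() {
  [ -n "$1" ] || return 1
  dest=$1
  shift
  [ "$#" -gt 0 ] || set -- /etc /var/lib/pve-cluster
  mkdir -p "$dest" || return 1
  out="$dest/host-config-$(date +%Y%m%d-%H%M%S).tar.gz"
  # GNU tar strips the leading "/" from member names and notes this on stderr.
  tar -czf "$out" "$@" || return 1
  echo "$out"
}

# Usage (/mnt/usb and the pool name "tank" are placeholders):
#   backup_host_config /mnt/usb
#   zpool export tank    # before reinstalling
#   zpool import tank    # after reinstalling; -f may be needed if the pool
#                        # was not exported cleanly
```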

butt3rballbeats · Mar 3, 2023

I ended up nuking it and reinstalling after copying the data over. My zpool was still intact, and I was able to rebuild a few VMs from the disks that were still in the pool. I had to start from scratch on the others, oh well! I'll use this as a learning moment to get a UPS and start scheduling daily backups.
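For the backup plan, guest backups are made with vzdump (the GUI schedules it under Datacenter > Backup). The cluster configuration database is tiny, so keeping dated copies of it as well is cheap. A minimal sketch, assuming the sqlite3 CLI is installed and using a hypothetical helper name and default destination; sqlite3's .backup command is used instead of cp because it takes a consistent copy even while pmxcfs has the database open.

```shell
#!/bin/sh
# Sketch: keep dated copies of the cluster config database and prune old ones.
#   $1  source database  (default /var/lib/pve-cluster/config.db)
#   $2  backup directory (default /root/config-db-backups, a hypothetical path)
#   $3  number of copies to keep (default 14)
# sqlite3's .backup command takes a consistent copy even while pmxcfs has the
# database open, which a plain cp of a live WAL-mode database does not guarantee.
backup_config_db() {
  src=${1:-/var/lib/pve-cluster/config.db}
  dest=${2:-/root/config-db-backups}
  keep=${3:-14}
  # Refuse a missing source: sqlite3 would otherwise create an empty database.
  [ -f "$src" ] || return 1
  mkdir -p "$dest" || return 1
  out="$dest/config-$(date +%Y%m%d-%H%M%S).db"
  sqlite3 "$src" ".backup '$out'" || return 1
  # Timestamped names sort chronologically: delete all but the newest $keep.
  ls -1 "$dest"/config-*.db | sort -r | tail -n +"$((keep + 1))" |
    while read -r old; do rm -f "$old"; done
  echo "$out"
}

# Usage, e.g. from a daily cron job or systemd timer:
#   backup_config_db
# Guest backups come from vzdump; the storage name below is a placeholder:
#   vzdump --all --mode snapshot --compress zstd --storage backup-nas
```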

mikeinnyc · Mar 4, 2023

All good now? Make sure you configure the time servers on each node.
