Help to fix Proxmox access issues after power cut

Rollux

New Member
Mar 7, 2026
Hi all, long time lurker, first time poster.
Please bear with me a bit: I am seeking assistance to regain access to a Proxmox VE instance. I am a noob as far as Linux and Proxmox go, just playing around with a home lab.

I have 2 old PCs repurposed as home servers (I only have 2 because I need more RAM). I played around with putting them in a cluster, successfully so I thought, and had everything running well for several months.
We had a power cut, and as I have no UPS yet, the obvious happened. However, something broke in the cluster setup. (FYI: I had rebooted both nodes previously without issue.)
What I now can/cannot do:
1. I cannot access Proxmox VE for the "main" server via SSH or web GUI - I get time out errors. However, all VMs and containers are accessible and working as expected.
2. I can access the "secondary" node via its web GUI and SSH; it simply shows the "main" as offline. However, to get its containers and VMs started I need to SSH in and run 'pvecm expected 1', so this node obviously still expects quorum.
3. I can ping the "main" server and get a response.

As my main server runs TrueNAS and Immich, I don't wish to just wipe it and start again. Recovery is preferable, but I am at a loss as to how to proceed, despite searching for a few weeks through various threads on here, Reddit and so on.

Any assistance/guidance is much appreciated.
 
Hi Rollux,
you could probably try a local login, either via a connected screen and keyboard or,
if your server hardware supports it, via out-of-band management (iLO/iDRAC/BMC ...).

What error message do you get when you try to connect via ssh -vvv?

You need to be very careful with changing the expected votes:
this can lead to data loss/split brain for your VM config files and probably also for disk data (depending on the setup of the storages),
because they might get overwritten on syncing back to the server that runs the VMs.
The currently running VM processes are not directly affected by that, so do not get confused by them still working.

A cluster with only two nodes is very fragile in terms of corosync votes,
because as soon as one node fails, the surviving node loses quorum (and, with HA enabled, would fence itself).
Since you also have the TrueNAS box available, I would strongly consider using it
to provide a third vote via a QDevice (https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support).
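For reference, the QDevice setup itself is only a few commands. A rough sketch, assuming a third Debian-based machine acts as the qnetd host (the IP 192.168.1.20 below is a placeholder, replace it with your own):

```shell
# On the third machine (the QDevice host): install the vote daemon.
apt install corosync-qnetd

# On BOTH Proxmox nodes: install the qdevice client side.
apt install corosync-qdevice

# On ONE Proxmox node: register the QDevice with the cluster
# (replace the IP with the address of your qnetd host).
pvecm qdevice setup 192.168.1.20

# Verify that the cluster now has three expected votes.
pvecm status
```

The commands come from the Cluster Manager wiki page linked above; the install steps are per-machine, the setup step only runs once.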

BR, Lucas
 
I don't have a “magic bullet” here. But if you can ping it, it seems that at least the network stack is running. To troubleshoot further, I'd say you would need to connect a monitor and keyboard and then see if it gets to the login prompt at startup.

If so, log in as root and check the system logs. If not, see where it stops during the boot process or if you get any errors in the startup logs (the messages that are displayed during the boot process).

EDIT: @bl1mp was faster.
 
Thanks for the replies.
So I connected monitor and keyboard - can log in ok. But I don't know what to look at or for now. (again, I'm new to this so please be patient)
I checked journalctl -b and the only red flags were a failed CIFS mount (for an LXC) and a Radeon secure display failure.
I also checked /var/log/pveproxy/ and several of the access logs; I can see the various attempts from the IP addresses of the devices on my network trying to connect.

What and where else do I need to look at?
 
Hi Rollux,
since one of your problems is broken SSH connectivity, my first approach would be to check the logs of the SSH service.
This might help to re-enable remote access as well as reveal indications of further problems.

Then you should check whether pveproxy and pvedaemon are running, as they provide access to the web UI.

You can check this either via
Code:
systemctl status {sshd,pvedaemon,pveproxy}

or by using
Code:
ss -tapen
(the successor of netstat); this should list the open network ports of the services, if they are available.
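To narrow that output down, you can filter for the three ports in question (assuming the defaults: 22 for sshd, 85 for pvedaemon, 8006 for pveproxy):

```shell
# Show only listening TCP sockets on the expected ports.
ss -tlnp | grep -E ':(22|85|8006)\b'
```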

You should also check which cluster state your node considers itself to be in, before you try fixing things.

BR, Lucas
 
Thanks Lucas
For the systemctl status command I get the service file paths, status lines and PIDs for the 3 services you have listed.
For ss -tapen I get the following:
sshd 0.0.0.0:22
pvedaemon 127.0.0.1:85
pveproxy *:8006

Using pvesh get /cluster/status I get a table:
Code:
id         name  type  ip            level  local  nodeid  nodes  online  quorate
node/dj3d  dj3d  node  192.168.1.15  2101

My SSH logs from journalctl -u ssh.service give an error:
pam_systemd(sshd:session): Failed to create session: Refusing activation, D-Bus is shutting down
I have that about a dozen times from various attempts, always after "Accepted password" and "session opened for user".

I appreciate the help and guidance so far - looks like this is a curly one as my googling and searching these forums is telling me it should be working.
 
I don't know your exact setup, but skimming this post you may be suffering a PVE root fill-up, which could cause the symptoms you describe.

Please post output for:
Code:
df -h /

(My thinking is that a failed CIFS/NFS mount or similar has caused a backup to be stored locally and thus filled up the root.)
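In case the root does turn out to be (nearly) full, this sketch helps to locate the culprit (the /var/lib/vz/dump path below assumes the default "local" storage layout):

```shell
# Largest top-level consumers on the root filesystem only (-x stays on one filesystem).
du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -n 10

# Backups written while a network mountpoint was missing typically
# end up on the root disk under the local storage directory.
du -sh /var/lib/vz/dump 2>/dev/null || true
```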
 
Hi Rollux,
in general your sshd instance seems to be OK/running.
Which user do you use for the login, and which login shell is listed for that user in /etc/passwd?
Do you remember any changes made before the last reboot?

You can of course increase the log level of the ssh server by adding
Code:
LogLevel DEBUG3
to /etc/ssh/sshd_config for troubleshooting. A restart of the ssh server is required afterwards.
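Before restarting, it is worth validating the changed config, since sshd refuses to start on syntax errors. A sketch:

```shell
# Validate the configuration after editing /etc/ssh/sshd_config.
sshd -t

# Apply the change (the unit is named ssh.service on Debian-based systems).
systemctl restart ssh.service

# Watch the verbose output live while retrying a login from another machine.
journalctl -f -u ssh.service
```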

You could also test whether the same problem occurs with another user's SSH login.

BR, Lucas
 
Seeing you have shown that you have plenty of disk space available in root, I would now check the available RAM on the node, as insufficient memory can also cause systemd-logind to fail.
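A minimal sketch of those checks (memory pressure plus the health of logind/D-Bus, which the earlier "Refusing activation" error points at):

```shell
# Check free memory and swap on the node.
free -h

# Look for OOM-killer activity in the kernel messages of the current boot.
journalctl -b -k | grep -iE 'out of memory|oom' || true

# Check whether D-Bus and the login manager are healthy; a wedged
# systemd-logind produces exactly the pam_systemd error seen earlier.
systemctl status dbus.service systemd-logind.service

# Restarting logind is usually safe; restarting dbus on a live system is not.
systemctl restart systemd-logind.service
```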