[SOLVED] Unable to get the web UI up on 3rd host that has been offline for 6 months

supersloth

Aug 25, 2022
I need some help please.

I am running my 3-host Proxmox cluster on some Intel i5-13600K CPUs, which had instability issues.
One host was shut off around August because it was just so broken.
I recently got the CPU replaced under RMA, and it's an i5-14600K this time. That shouldn't affect whether this works, but I mention it for context...

I am able to SSH into this 3rd node; there were no IP or other changes. The hosts file is OK, and the network is otherwise fine.
However, the pveproxy, pve-guests and pve-manager services will not start at all, and I cannot start them manually either: the command just sits there without erroring out.
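
A start command that sits there without erroring often means its job is queued behind another job that never finishes. `systemctl list-jobs` shows that queue; the helper below (`summarize_jobs` is a hypothetical name, and it assumes the standard `JOB UNIT TYPE STATE` column layout) just pulls out which units are running and which are waiting:

```shell
# Hypothetical helper: summarize `systemctl list-jobs` output read on stdin.
# Only data rows are kept (numeric JOB id, STATE "running" or "waiting"),
# so the header line and the trailing "N jobs listed." line are skipped.
summarize_jobs() {
    awk '$1 ~ /^[0-9]+$/ && ($4 == "running" || $4 == "waiting") {
        print $4 ": " $2
    }'
}
```

Usage sketch: while the start command hangs, run `systemctl list-jobs --no-pager | summarize_jobs` in a second SSH session. A "running" unit that never completes is the one to look at with `journalctl -b -u <unit>`.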

Bash:
root@scaramanga:/var/log/pve/tasks# systemctl status pveproxy
○ pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; preset: enabled)
     Active: inactive (dead)
root@scaramanga:/var/log/pve/tasks# systemctl status pve-manager
○ pve-guests.service - PVE guests
     Loaded: loaded (/lib/systemd/system/pve-guests.service; enabled; preset: enabled)
     Active: inactive (dead)
root@scaramanga:/var/log/pve/tasks# systemctl status pve-guests
○ pve-guests.service - PVE guests
     Loaded: loaded (/lib/systemd/system/pve-guests.service; enabled; preset: enabled)
     Active: inactive (dead)

The other 2 hosts are online and working. When I try to access this node through their web UI, I get error code 595, even though the node shows up green and corosync appears to be working.
I have a CIFS share and it is connected fine, and I have tried rebooting. All 3 hosts should be unchanged, since I deliberately did no maintenance, knowing I'd eventually get a chance to bring the last host back up.
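
In Proxmox VE, a 595 in the web UI is generally a connection error: the node you are logged into could not reach the other node's API proxy, which fits `pveproxy` being dead. A minimal reachability check, assuming bash's `/dev/tcp` redirection and coreutils `timeout` are available; `port_open` is a hypothetical helper name, and 8006 is the default `pveproxy` port:

```shell
# Hypothetical helper: return 0 if a TCP connection to host:port succeeds
# within the timeout (default 3 s), non-zero otherwise.
# The probe runs under bash because /dev/tcp is a bash-only feature.
port_open() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

Usage sketch, from one of the two working nodes: `port_open 192.168.7.23 8006 && echo "pveproxy reachable" || echo "pveproxy not reachable"`.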

Any help please: where do I go from here, and what can I try?
Thanks!

pveversion (same on all 3 hosts)
Code:
root@scaramanga:/var/log/pve/tasks# pveversion
pve-manager/8.2.4/faa83925c9641325 (running kernel: 6.8.8-4-pve)

journalctl output of pve-cluster, the only interesting bit of log information I could find:
Bash:
root@scaramanga:/var/log/pve/tasks# journalctl -b -u pve-cluster
Feb 17 20:00:07 scaramanga systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Feb 17 20:00:07 scaramanga pmxcfs[2117]: [main] notice: resolved node name 'scaramanga' to '192.168.7.23' for default node IP address
Feb 17 20:00:07 scaramanga pmxcfs[2117]: [main] notice: resolved node name 'scaramanga' to '192.168.7.23' for default node IP address
Feb 17 20:00:07 scaramanga pmxcfs[2142]: [quorum] crit: quorum_initialize failed: 2
Feb 17 20:00:07 scaramanga pmxcfs[2142]: [quorum] crit: can't initialize service
Feb 17 20:00:07 scaramanga pmxcfs[2142]: [confdb] crit: cmap_initialize failed: 2
Feb 17 20:00:07 scaramanga pmxcfs[2142]: [confdb] crit: can't initialize service
Feb 17 20:00:07 scaramanga pmxcfs[2142]: [dcdb] crit: cpg_initialize failed: 2
Feb 17 20:00:07 scaramanga pmxcfs[2142]: [dcdb] crit: can't initialize service
Feb 17 20:00:07 scaramanga pmxcfs[2142]: [status] crit: cpg_initialize failed: 2
Feb 17 20:00:07 scaramanga pmxcfs[2142]: [status] crit: can't initialize service
Feb 17 20:00:08 scaramanga systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
Feb 17 20:00:14 scaramanga pmxcfs[2142]: [status] notice: update cluster info (cluster name  spectre, version = 11)
Feb 17 20:00:14 scaramanga pmxcfs[2142]: [dcdb] notice: members: 3/2142
Feb 17 20:00:14 scaramanga pmxcfs[2142]: [dcdb] notice: all data is up to date
Feb 17 20:00:14 scaramanga pmxcfs[2142]: [status] notice: members: 3/2142
Feb 17 20:00:14 scaramanga pmxcfs[2142]: [status] notice: all data is up to date
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [dcdb] notice: members: 1/2043, 2/2092, 3/2142
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [dcdb] notice: starting data syncronisation
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [status] notice: members: 1/2043, 2/2092, 3/2142
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [status] notice: starting data syncronisation
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [status] notice: node has quorum
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [dcdb] notice: received sync request (epoch 1/2043/00000015)
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [status] notice: received sync request (epoch 1/2043/00000015)
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [dcdb] notice: received all states
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [dcdb] notice: leader is 1/2043
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [dcdb] notice: synced members: 1/2043, 2/2092
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [dcdb] notice: waiting for updates from leader
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [status] notice: received all states
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [status] notice: all data is up to date
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [dcdb] notice: update complete - trying to commit (got 3 inode updates)
Feb 17 20:00:15 scaramanga pmxcfs[2142]: [dcdb] notice: all data is up to date
Feb 17 20:00:40 scaramanga pmxcfs[2142]: [status] notice: received log
Feb 17 20:11:24 scaramanga pmxcfs[2142]: [dcdb] notice: data verification successful
Feb 17 20:12:44 scaramanga pmxcfs[2142]: [status] notice: received log
Feb 17 20:27:44 scaramanga pmxcfs[2142]: [status] notice: received log
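
In this excerpt, all `crit` lines appear at 20:00:07 and `node has quorum` follows at 20:00:15, which suggests pmxcfs started before corosync was ready and then recovered. To check that pattern quickly on any boot, a small awk filter can help; `pmxcfs_summary` is a hypothetical helper, not a Proxmox tool, and it assumes the journal line format shown above:

```shell
# Hypothetical helper: read pmxcfs journal lines on stdin, print each
# "crit:" message without the timestamp/host/PID prefix, then report
# whether a "node has quorum" notice appeared anywhere in the input.
pmxcfs_summary() {
    awk '
        / crit: / {
            line = $0
            sub(/^.*]: /, "", line)   # strip the "Feb 17 ... pmxcfs[PID]: " prefix
            print line
        }
        /node has quorum/ { quorate = 1 }
        END { print (quorate ? "quorum: yes" : "quorum: no") }
    '
}
```

Usage sketch: `journalctl -b -u pve-cluster | pmxcfs_summary`.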
 
Hello supersloth! Can you please attach the full output of `journalctl --boot > syslog.txt`? I'm wondering whether you have any other errors caused by either CPU or RAM instability.
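
To scan that file for typical signs of CPU or RAM instability, a grep is enough for a first pass. The pattern list in the hypothetical `hw_suspects` helper is an assumption covering common kernel messages (machine-check/MCE reports, segfaults, general protection faults); it is neither exhaustive nor proof of a hardware fault:

```shell
# Hypothetical helper: print log lines that commonly accompany CPU/RAM
# instability. Reads the files given as arguments, or stdin if none.
hw_suspects() {
    grep -iE 'mce:|machine check|hardware error|segfault at|general protection' "$@"
}
```

Usage sketch: `journalctl --boot > syslog.txt`, then `hw_suspects syslog.txt`. No output (grep exit status 1) only means none of these patterns matched.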

Anyway, as you are probably aware, the CPU is from a generation that had several issues. While this might not necessarily fix your current issues, for long-term stability and to prevent further CPU-related issues, you might want to consider:
  1. Updating the motherboard BIOS to the latest version (which will also contain CPU microcode updates).
  2. Enabling CPU microcode updates in Proxmox VE. This requires enabling the correct Debian repository component and installing the CPU-vendor-specific microcode package. It also ensures that you get the latest microcode updates even if the motherboard manufacturer does not release BIOS updates (or not right away). Don't forget to reboot after this step.
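
On Proxmox VE 8 (Debian Bookworm), the Intel microcode package lives in Debian's `non-free-firmware` component. The hypothetical helper below appends that component to Debian mirror lines in a one-line-style sources file and leaves other repositories (such as the Proxmox one) alone. It assumes GNU sed (`-i`, `-E`) and does not handle deb822 `.sources` files:

```shell
# Hypothetical helper: add the "non-free-firmware" component to every
# "deb" line pointing at a debian.org mirror in the given file.
# Lines that already list it are left unchanged, so it is safe to rerun.
add_nonfree_firmware() {
    sed -i -E '/^deb .*debian\.org/{/non-free-firmware/!s/$/ non-free-firmware/;}' "$1"
}
```

Usage sketch: back up first with `cp /etc/apt/sources.list /etc/apt/sources.list.bak`, run `add_nonfree_firmware /etc/apt/sources.list`, then `apt update && apt install intel-microcode`, and reboot. Afterwards, `journalctl -k -b | grep -i microcode` should show what the kernel reports about the microcode revision.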
 
Thanks l.leahu-vladucu, please find the file attached.

The BIOS is updated to the latest version available for the motherboard (Asus W680 IPMI), and the microcode is the latest from the factory, but I'm very much aware of the issue!
I will enable microcode updates, though; figuring out how has been on my to-do list.

Cheers!

Thanks for the syslog!

The PVE docs I linked to should help :) I'm not sure whether there is newer microcode than the one provided by your BIOS, but this could help in the future when there is.

Anyway, you are right: the only interesting part of the syslog is what you already posted.

Some things that you can try:
  1. Before continuing, could you please try to update your system using `apt update && apt dist-upgrade`?
  2. Afterwards, if the problem persists, it would be interesting to see whether starting the services manually (e.g. `systemctl start pveproxy`) produces any errors.
  3. Last but not least, you might want to run memtest86+ to check for RAM issues.
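
For step 2, one way to keep a start command from just sitting there is to give it a deadline with coreutils `timeout`, which exits with status 124 when time runs out. `run_with_deadline` is a hypothetical wrapper, not a Proxmox tool:

```shell
# Hypothetical helper: run a command with a deadline (in seconds) so a
# hanging "systemctl start" reports a timeout instead of blocking the shell.
# Returns the command's own exit status, or 124 if the deadline was hit.
run_with_deadline() {
    secs=$1; shift
    timeout "$secs" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${secs}s: $*" >&2
    fi
    return "$rc"
}
```

Usage sketch: `run_with_deadline 60 systemctl start pveproxy`, then `journalctl -b -u pveproxy`. Note that killing the `systemctl` client does not cancel the start job itself; it stays queued in systemd and still shows up in `systemctl list-jobs`.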
 
Thank you, l.leahu-vladucu! I ran `apt update && apt dist-upgrade`, and that seems to have done the trick somehow.
I'm so confused as to why!?
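
To see which packages the `dist-upgrade` actually changed, which may hint at why it helped, dpkg's own log can be filtered. The hypothetical `upgraded_pkgs` helper assumes the usual `/var/log/dpkg.log` line layout (`date time action package:arch old-version new-version`):

```shell
# Hypothetical helper: list "upgrade" actions from a dpkg log
# (default /var/log/dpkg.log) as "package:arch old -> new".
upgraded_pkgs() {
    awk '$3 == "upgrade" { print $4, $5, "->", $6 }' "${1:-/var/log/dpkg.log}"
}
```

Usage sketch: `upgraded_pkgs | grep -i pve` narrows the list to Proxmox-related packages.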

I'm now going to verify everything; hopefully I can access all of the VMs that were on here and move them off Linstor.
Sadly, I found Linstor too high-maintenance for my liking compared to Ceph, even though I like it much more from a functionality perspective.

Actually, I ran 5 passes of memtest86+ right after the first boot, as soon as I had confirmed the BIOS and microcode were the latest :D

Thank you so much for the help! I will monitor this for a little while before closing it out, but it certainly looks like I'm restored for now :)