Web GUI won't start - looks cluster related?

Chris Olive

New Member
Apr 8, 2020
Woke up this AM and my PVE Web GUI was down. It wouldn't start, failing with a "can't update certs" issue: "no quorum".

Okay, well, I just established a cluster yesterday and then brought a cluster member down -- and it's probably not coming back. I didn't see any way to delete the cluster member, so I just left it. Seems maybe that wasn't a good idea. I guess behind the scenes, certs for Web GUI on both are shared, and with the cluster member gone, I can't do that and my Web GUI (on my now lone cluster member) went down? (Seems kinda non-resilient to me, but...)

So now, I'm wondering if I just delete the cluster member, I'm going to be good.

The first command below is what service pveproxy restart seems to be failing on, and that's the error (i.e. "no quorum"):

Code:
root@pve:~# pvecm updatecerts
no quorum - unable to update files
root@pve:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve (local)
root@pve:~# pvecm delnode construct3
cluster not ready - no quorum?
root@pve:~# cd /etc/pve/nodes/
root@pve:/etc/pve/nodes# ll
total 0
dr-xr-xr-x 2 root www-data 0 May 17 20:25 construct3/
dr-xr-xr-x 2 root www-data 0 Apr  7 23:50 pve/

I found these instructions for deleting a cluster member, and I want to be absolutely sure if I do the last step here, I'm not going to bork something:

https://sysadmin-community.com/remove-node-from-cluster-proxmox/

Basically, as you can see from my output quoted above, cluster member "construct3" is now gone, and I'm thinking if I remove it permanently, then my service pveproxy restart is going to work?

Here's the service pveproxy restart and systemctl status pveproxy.service output as evidence for my theory. Not sure about the mkdir /etc/pve/ha error, but the rest seems to fit with pvecm updatecerts failing. Starting a cluster and then taking down a cluster member has to be what started this whole chain of events:

Code:
root@pve:/etc/pve/nodes# service pveproxy restart
Job for pveproxy.service failed because the control process exited with error code.
See "systemctl status pveproxy.service" and "journalctl -xe" for details.
root@pve:/etc/pve/nodes# systemctl status pveproxy.service | less
● pveproxy.service - PVE API Proxy Server
   Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
   Active: activating (start) since Tue 2020-05-19 12:20:30 CDT; 613ms ago
  Process: 21210 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=1/FAILURE)
Cntrl PID: 21215 (pveproxy)
    Tasks: 1 (limit: 4915)
   Memory: 48.4M
   CGroup: /system.slice/pveproxy.service
           └─21215 /usr/bin/perl -T /usr/bin/pveproxy start

May 19 12:20:30 pve systemd[1]: Starting PVE API Proxy Server...
May 19 12:20:31 pve pvecm[21210]: mkdir /etc/pve/ha: Permission denied at /usr/share/perl5/PVE/Cluster.pm line 88.

So, (1) permanently delete cluster member (using command line approach) and (2) Web GUI should come back up?
 
Your remaining node has no quorum. In a two-node cluster, quorum requires both nodes to be online.
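
The arithmetic behind this can be sketched in a few lines of shell. This is only an illustration: it assumes the usual majority rule (more than half of the expected votes) and one vote per node, as shown in the pvecm nodes output in the first post. The function names are made up for this example and are not part of pvecm.

```shell
#!/bin/sh
# Votes needed for quorum under a simple majority rule:
# floor(expected_votes / 2) + 1. Hypothetical helper, not part of pvecm.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) if online_votes reaches the majority of expected_votes.
has_quorum() {
    expected_votes=$1
    online_votes=$2
    [ "$online_votes" -ge "$(quorum_needed "$expected_votes")" ]
}

# Two-node cluster with one node gone: 1 vote online, 2 required.
if has_quorum 2 1; then
    echo "quorate"
else
    echo "no quorum: need $(quorum_needed 2) of 2 votes, have 1"
fi
```

With three nodes, has_quorum 3 2 succeeds, so a three-node cluster can lose one member and stay quorate, while a two-node cluster cannot.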



Yes, use the following guide: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_separate_node_without_reinstall

Well, this looks like a guide for executing on the node I took down without deleting the node from the cluster while it was still up and running. If it's easier to bring back "construct3" (as per my output in the OP), I can do that as a one-time thing to resolve on this.

I'll look into bringing it back and follow any guides I can find to properly remove it, and see if that works. Again, the guide above looks like what I would do on the now missing cluster member? No? Do the above on the lone existing member?
 
Ironically, the section above the one you reference speaks to removing a node from a cluster. It states:
  1. Take down the node (shutdown) and never bring it back up again.
  2. Then issue a pvecm delnode <oldnode> command.
That's essentially what happened to me. I took down the node without doing anything else. Yet when I issue a pvecm nodes command now (see OP), "construct3" isn't listed? All I did was shut down the node.
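
A possible explanation, stated as an assumption since nobody in the thread confirms it: pvecm nodes reports the nodes that are currently members, while the configured node list lives in /etc/pve/corosync.conf. A minimal sketch for listing the configured names follows. It assumes each nodelist entry carries a line of the form "name: <nodename>", and configured_nodes is a hypothetical helper, not a Proxmox command.

```shell
#!/bin/sh
# Print node names configured in a corosync.conf-style file.
# Assumes each nodelist entry has a line of the form "name: <nodename>".
# Lines such as "cluster_name: ..." are skipped because the first field
# must be exactly "name:".
configured_nodes() {
    awk '$1 == "name:" { print $2 }' "$1"
}

# Default path is the cluster-wide config on a Proxmox VE node.
conf=${1:-/etc/pve/corosync.conf}
if [ -r "$conf" ]; then
    configured_nodes "$conf"
else
    echo "cannot read $conf" >&2
fi
```

If "construct3" shows up here but not in pvecm nodes, the node is still configured and simply offline.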
 
Please actually read what I linked you; it works for your exact case. Follow the steps on the leftover, still online, node.
 
You can just leave out the pvecm delnode oldnode command, as that one would have to be executed from your gone node. Since that node is gone, as you say, it obviously makes no sense to run it.
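
For reference, a dry-run sketch of the remaining steps from the linked "separate a node without reinstalling" section, to be run on the surviving node. The command list is reproduced from memory of that guide, so treat it as an assumption and compare it with the linked page before running anything. The run and separate_node wrappers and the DRY_RUN variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch of the "separate a node without reinstalling" steps, for the
# surviving node. DRY_RUN defaults to 1, so commands are only printed.
# Set DRY_RUN=0 only after checking each step against the linked guide.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

separate_node() {
    run systemctl stop pve-cluster
    run systemctl stop corosync
    run pmxcfs -l                       # cluster filesystem in local mode
    run rm /etc/pve/corosync.conf       # drop the cluster-wide config
    run sh -c 'rm -r /etc/corosync/*'   # drop the local corosync config
    run killall pmxcfs
    run systemctl start pve-cluster     # restart as a standalone node
}

separate_node
```

Running it as-is only prints the seven commands, which makes it easy to review them before doing the real thing.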
 
Confirmed this did the trick (as you already knew). Not sure why I read your original directive as to perform on the deleted cluster member.
 
Glad to hear.

I mean, it is written for the case where both the parting node and the remaining cluster are still online and available; this is the most general case after all. So it's understandable that it could cause some confusion if one, reading it for the first time, takes it more literally.

In your situation one of those parts was gone (and actually it wouldn't have mattered much which part you used as POV), and thus the actions for that part could be omitted.
 
