Adding cluster getting stuck after Waiting for quorum...OK

sheshman · Aug 26, 2023

Hi,

Both nodes run Proxmox VE 8.0.3 and are fully updated. I'm trying to create a cluster through the CLI, but every time it gets stuck after "Waiting for quorum...OK".

I waited over two hours in case it just needed time to complete, but it never did.

When I tear the cluster down with:

Code:

systemctl stop pve-cluster
systemctl stop corosync
pmxcfs -l
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster
pvecm delnode oldnode
rm /var/lib/corosync/*

the stuck process exits (obviously).

-- All nodes can ping each other (both by IP and FQDN)
-- All nodes can SSH to each other (both by IP and FQDN)
-- Nodes are not behind NAT
-- Both nodes are defined in /etc/hosts
-- I tried creating the cluster with both IP and hostname; the result was the same
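
One thing the checklist above does not cover is which address each node name actually maps to. Proxmox VE generally expects the node's own hostname to resolve to a routable address rather than a loopback one. The sketch below assumes a standard hosts-format file; the function names are made up for illustration and are not part of any Proxmox tool.

```shell
# Print the first address that a hosts-format file maps to the given name.
# Usage: hosts_addr_for NAME [HOSTS_FILE]
hosts_addr_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }   # strip comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
}

# Return 0 if NAME maps to a non-loopback address, 1 if it maps to
# loopback or is not listed at all.
resolves_off_loopback() {
    addr=$(hosts_addr_for "$1" "${2:-/etc/hosts}")
    case $addr in
        ""|127.*|::1) return 1 ;;
        *) return 0 ;;
    esac
}
```

Running `resolves_off_loopback helsinki || echo "check /etc/hosts"` on each node, for each node name, would flag an entry that points at 127.x.x.x or is missing.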

corosync.conf is below:

Code:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: helsinki
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 65.21.27.202
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: egeclst012
  config_version: 1
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
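
Note that the nodelist above contains only helsinki. Depending on when the file was captured, that may mean the second node never made it into the cluster configuration. A small sketch for listing the node entries in a corosync.conf follows; the function name is hypothetical and the parser assumes the simple layout shown above (one key per line, `node {` and `}` on their own lines).

```shell
# Print "name address" for every node { } block in a corosync.conf.
# Usage: corosync_nodes [CONF_FILE]
corosync_nodes() {
    awk '
        $1 == "node" && $2 == "{"      { in_node = 1; name = ""; addr = ""; next }
        in_node && $1 == "name:"       { name = $2 }
        in_node && $1 == "ring0_addr:" { addr = $2 }
        in_node && $1 == "}"           { print name, addr; in_node = 0 }
    ' "${1:-/etc/pve/corosync.conf}"
}
```

Comparing the output of `corosync_nodes` on both machines shows whether they agree on the membership list.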

I created Syslog.txt with the command below:

Code:

journalctl --since "2023-08-25 05:30" --until "2023-08-26 09:00" > /tmp/Syslog.txt

and attached it to the post.
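
A full journal dump over more than a day is large, so narrowing it to the cluster-related daemons may make the relevant messages easier to find. The filter below is only a sketch: the function name is hypothetical, and it assumes journalctl's default short output format, where each line carries `identifier[pid]:` after the hostname.

```shell
# Keep only journal lines written by the cluster-related daemons.
# Reads a journalctl text dump on stdin, writes matching lines to stdout.
cluster_log_lines() {
    grep -E ' (corosync|pmxcfs|pveproxy|pvedaemon)\[[0-9]+\]: '
}
```

For example, `cluster_log_lines < /tmp/Syslog.txt`. Alternatively, journalctl can filter by unit at capture time by adding `-u corosync -u pve-cluster` to the command above.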

pveversion -v output is below:

Code:

proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

I've tried every solution I could find, but no luck so far. Any advice would be appreciated.

Anotheruser · Aug 26, 2023

Did you try adding the nodes via the web UI?
Just to mention it: if you have a cluster of two servers, each time one server goes offline the other goes into a no-quorum (protective) state, because it cannot tell whether it or the other node is the one that is offline. There are tons of posts on this topic.

You can try to override quorum with the command pvecm expected 1 or pvecm expected 2 on the node that's giving you the error; however, this can cause problems, so use it with caution.
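
Before overriding the expected vote count, it can help to confirm whether the node really is without quorum. A minimal sketch, assuming the `Quorate:` line that `pvecm status` prints in its quorum section; the function name is hypothetical.

```shell
# Return 0 if the `pvecm status` output on stdin reports the node as
# quorate, 1 otherwise (including when no Quorate: line is present).
is_quorate() {
    awk '$1 == "Quorate:" { found = ($2 == "Yes") } END { exit !found }'
}
```

Usage would look like `pvecm status | is_quorate || echo "node is not quorate"`.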

sheshman · Aug 26, 2023

Yes, I've already tried adding it through the web UI, and that gets stuck too (I forgot what the last message was before it got stuck). The node's web access goes down (I mean it no longer accepts the password). On the main cluster node you can see the new node as online, but when you click on it, it says:

Code:

'/etc/pve/nodes/merkez/pve-ssl.pem' does not exist!

When I check /etc/pve/nodes/merkez, there is no .pem file.

Honestly, I don't think it's a quorum issue, because it says OK and doesn't get stuck on waiting for quorum. But I'm a rookie after all, so maybe that is the problem.
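
To see exactly which per-node TLS files are absent, a check along these lines could be run on either node. It is a sketch only: the function name is hypothetical, and it assumes the usual pair of pve-ssl.pem and pve-ssl.key under /etc/pve/nodes/NODE/.

```shell
# Print the path of each per-node TLS file that is missing.
# Usage: missing_node_certs NODE [BASE_DIR]
missing_node_certs() {
    node=$1
    base=${2:-/etc/pve}
    for f in pve-ssl.pem pve-ssl.key; do
        [ -e "$base/nodes/$node/$f" ] || echo "$base/nodes/$node/$f"
    done
}
```

`missing_node_certs merkez` prints nothing when both files are present.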

Anotheruser · Aug 26, 2023

You can try to generate new certs with:
pvecm updatecerts --force

sheshman · Aug 26, 2023

I tried this method on both hosts, but it didn't help:

Code:

rm /etc/pve/priv/pve-root*
pvecm updatecerts --force
systemctl restart pveproxy.service

Is that OK, or should I just run the command you provided?

Anotheruser · Aug 26, 2023

OK, wait: on which node did you get the
'/etc/pve/nodes/merkez/pve-ssl.pem' does not exist! error,
on the node named merkez or on the other one?

sheshman · Aug 26, 2023

1st (main system): helsinki
2nd system: merkez

I just tried to add merkez to helsinki's cluster, and it got stuck on "Request addition of this node".

On helsinki, merkez shows as online. When I go to Datacenter -> Cluster it says "'/etc/pve/nodes/merkez/pve-ssl.pem' does not exist! (500)", and merkez's web GUI now returns ERR_TIMED_OUT. Screenshots are attached.

kikibase · May 17, 2024

Did you ever get this fixed? I've been trying to troubleshoot this for over a week, and I've reinstalled Proxmox on both servers multiple times at this point.

tomas.kuba · Apr 14, 2025

For anyone who might find it helpful: this is what helped me when a freshly installed node refused to join the cluster (or was refused by the cluster).

This is only a short version, which might work. I executed more commands than these, so it is possible that this will not be enough. Please take it only as a hint.

On the new node:

Bash:

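# Stop the cluster services and start pmxcfs in local mode
systemctl stop pve-cluster corosync
pmxcfs -l
# Regenerate this node's certificates
pvecm updatecerts --force
# Copy this node's certs to the cluster, then fetch the cluster-wide files
scp -r /etc/pve/local/* <oldnode>:/etc/pve/nodes/<newnode>/
scp -r <oldnode>:/etc/pve/auth* /etc/pve/
scp -r <oldnode>:/etc/pve/pve-* /etc/pve/
# Restart the affected services
systemctl restart pve-cluster corosync pveproxy pve-ha-crm pve-ha-lrm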

(replace <oldnode> and <newnode> with your own values, of course)

This stops corosync and forces local mode, regenerates the certs on the new node, transfers the new node's certificates to the cluster, transfers the cluster root certificates to the new node, and (re)starts the affected services.
It is important to restart pveproxy, as it serves the HTML frontend. I'm not sure whether restarting the pve-ha-crm and pve-ha-lrm services is needed (there were some "ha" errors in the log), but it doesn't hurt and is faster than rebooting the whole node.

Then, on the old node:

Bash:

systemctl restart pveproxy

One more thing that could help: if you are unable to start the corosync service, try stopping corosync and pve-cluster and running pmxcfs in the foreground:

Bash:

systemctl stop pve-cluster corosync
pmxcfs -f

If it runs without errors that way, stop it (^C), edit /lib/systemd/system/corosync.service and change
Type=notify
to
Type=simple
then run:

Bash:

systemctl daemon-reload
systemctl start pve-cluster corosync

It helped me in a situation where nothing else worked. After corosync started, I was able to copy the certs, and in the end I restored the original value Type=notify and it still worked. Maybe someone will figure out why it worked.
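
As a possible alternative to editing the packaged unit file under /lib/systemd/system (which a package upgrade may overwrite), the same temporary change can be expressed as a systemd drop-in. This is a sketch under that assumption, not what was done above; the function name and the override.conf file name are illustrative choices.

```shell
# Write a drop-in that overrides Type= for a unit, leaving the packaged
# unit file untouched.
# Usage: write_type_override UNIT TYPE [SYSTEMD_DIR]
write_type_override() {
    unit=$1
    type=$2
    dir=${3:-/etc/systemd/system}/$unit.d
    mkdir -p "$dir" &&
    printf '[Service]\nType=%s\n' "$type" > "$dir/override.conf"
}
```

After `write_type_override corosync.service simple`, run `systemctl daemon-reload` and start the services as above. To return to Type=notify, delete the override.conf file and run `systemctl daemon-reload` again.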

This helped me a lot to understand where the problem could be: the Proxmox Cluster File System (pmxcfs) documentation.
