[SOLVED] Failure to join cluster

mdub11

Oct 24, 2022

Hi,

I currently have three nodes:
  • PVE01: 192.168.53.222 (v7.2)
  • PVE02: 192.168.53.219 (v7.2)
  • PVE03: 192.168.1.254 (v6.4, because third-party software already installed on the machine requires Debian Buster. This doesn't matter much, since the node only exists for quorum.)
PVE01 and PVE03 have been in a cluster for weeks and had run without issue until now.

The problem started when I tried to add PVE02 to the cluster via the web UI: the operation got stuck while calling the cluster API (no error was shown). I eventually stopped it after waiting 10 minutes. Since then, I can't log in to PVE02 via the web UI (login failed), and I get lost-communication errors in the web UI on PVE01 and PVE03.

I can still connect to all three via ssh though.

I have VMs running on PVE01 that still appear to run correctly, though I can't verify this, because any command I run on PVE01 via ssh produces no output (it just hangs).

On PVE02, I ran "pvecm status":
Code:
Cluster information
-------------------
Name:             clustername
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Oct 24 09:15:06 2022
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000003
Ring ID:          3.15dc
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 192.168.53.219 (local)
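
The "Activity blocked" line follows from votequorum's majority rule: with one vote per node, quorum is half the expected votes (integer division) plus one. A minimal Python sketch of that arithmetic, assuming the default one-vote-per-node setup with no special votequorum options (such as two_node or wait_for_all) in play:

```python
def votequorum_quorum(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently present reach the quorum threshold."""
    return total_votes >= votequorum_quorum(expected_votes)


# Values taken from the "pvecm status" output on PVE02 above.
expected, total = 3, 1
print(f"Quorum: {votequorum_quorum(expected)}, quorate: {is_quorate(total, expected)}")
# prints "Quorum: 2, quorate: False"
```

PVE02 only sees its own vote (1) but needs 2, which is consistent with /etc/pve being read-only there and with pvescheduler reporting "no quorum!" in the logs.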

And the contents of /etc/pve/corosync.conf on PVE02:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.53.222
  }
  node {
    name: pve02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.53.219
  }
  node {
    name: sbc
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.254
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: clustername
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
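
To see at a glance which nodes share a network, the nodelist can be parsed and grouped by ring0_addr. This is a rough Python sketch, not a full corosync.conf parser: it assumes flat node { ... } blocks (no nesting) and a /24 netmask, which the thread does not state. Note that the node called "sbc" in the config is PVE03 (192.168.1.254).

```python
import ipaddress
import re

# Nodelist section copied from /etc/pve/corosync.conf above.
COROSYNC_CONF = """
nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.53.222
  }
  node {
    name: pve02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.53.219
  }
  node {
    name: sbc
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.254
  }
}
"""


def parse_nodes(conf: str) -> list[dict[str, str]]:
    """Return one dict of 'key: value' pairs per flat 'node { ... }' block."""
    nodes = []
    # \bnode\s*\{ matches "node {" but not "nodelist {".
    for body in re.findall(r"\bnode\s*\{([^}]*)\}", conf):
        pairs = re.findall(r"^\s*(\w+):\s*(\S+)\s*$", body, re.MULTILINE)
        nodes.append(dict(pairs))
    return nodes


def group_by_subnet(nodes, prefix: int = 24) -> dict[str, list[str]]:
    """Group node names by ring0 network; the /prefix mask is an assumption."""
    groups: dict[str, list[str]] = {}
    for node in nodes:
        net = ipaddress.ip_interface(f"{node['ring0_addr']}/{prefix}").network
        groups.setdefault(str(net), []).append(node["name"])
    return groups


print(group_by_subnet(parse_nodes(COROSYNC_CONF)))
# prints {'192.168.53.0/24': ['pve01', 'pve02'], '192.168.1.0/24': ['sbc']}
```

If the /24 assumption holds, PVE03 sits on a different subnet from the other two, so its corosync, SSH, and API traffic has to cross whatever routing and firewalling sits between those networks.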

Here are the system logs on PVE02:

Code:
Oct 24 09:17:01 pve02 cron[2196]: (*system*vzdump) CAN'T OPEN SYMLINK (/etc/cron.d/vzdump)
Oct 24 09:17:01 pve02 CRON[42481]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 24 09:17:01 pve02 CRON[42482]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Oct 24 09:17:01 pve02 CRON[42481]: pam_unix(cron:session): session closed for user root
Oct 24 09:17:01 pve02 pvescheduler[42485]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Oct 24 09:17:01 pve02 pvescheduler[42484]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Oct 24 09:17:03 pve02 corosync[12423]:   [QUORUM] Sync members[1]: 3
Oct 24 09:17:03 pve02 corosync[12423]:   [TOTEM ] A new membership (3.1684) was formed. Members
Oct 24 09:17:03 pve02 corosync[12423]:   [QUORUM] Members[1]: 3
Oct 24 09:17:03 pve02 corosync[12423]:   [MAIN  ] Completed service synchronization, ready to provide service.
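
When a node keeps re-forming memberships, it helps to pull the member node IDs out of the "[QUORUM] Members[n]:" lines. A small Python sketch, assuming the space-separated ID format seen in the log lines above:

```python
import re

# Corosync journal lines copied from PVE02 above.
LOG = """\
Oct 24 09:17:03 pve02 corosync[12423]:   [QUORUM] Sync members[1]: 3
Oct 24 09:17:03 pve02 corosync[12423]:   [TOTEM ] A new membership (3.1684) was formed. Members
Oct 24 09:17:03 pve02 corosync[12423]:   [QUORUM] Members[1]: 3
"""

# Matches "[QUORUM] Members[1]: 3" but not the "Sync members" variant;
# [ \t]+ keeps the ID list from running onto the next log line.
MEMBERS_RE = re.compile(r"\[QUORUM\] Members\[\d+\]:((?:[ \t]+\d+)*)")


def quorum_members(log: str) -> list[list[int]]:
    """Return the node-ID list from each '[QUORUM] Members[n]:' line."""
    return [[int(x) for x in ids.split()] for ids in MEMBERS_RE.findall(log)]


print(quorum_members(LOG))  # prints [[3]]
```

Here every membership PVE02 forms contains only node ID 3 (itself), never node 1 (pve01) or node 2 (sbc/PVE03).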

And here is the output of "pvesh get /cluster/config/join --output-format json-pretty" on PVE02:

Code:
'/etc/pve/nodes/pve01/pve-ssl.pem' does not exist!

I realized that /etc/pve/nodes doesn't exist on PVE02 and that /etc/pve is write-protected.

What can I do to fix this?
 
Problem solved!

I managed to take PVE02 out of the cluster by running "pvecm expected 2" and then "pvecm delnode pve02" on PVE01.

Then I reinstalled PVE02, since it still considered itself part of the cluster. After that, I rejoined it to the cluster from the command line with "pvecm add PVE01", and it worked.

The root cause was missing firewall rules: traffic between PVE02 and PVE03 was not allowed on ports 8006/TCP, 22/TCP, and 5405/UDP (corosync).
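
Before joining a node, it may be worth checking the TCP ports from every node to every other node. A minimal Python sketch: the NODES mapping is a hypothetical list built from the addresses in this thread, and only TCP 22 and 8006 are tested. UDP 5405 is connectionless, so a connect-style check cannot confirm it; corosync's own link status (for example "corosync-cfgtool -s") or the firewall logs are better suited for that.

```python
import socket

# Hypothetical host list: the node addresses from this thread.
NODES = {
    "PVE01": "192.168.53.222",
    "PVE02": "192.168.53.219",
    "PVE03": "192.168.1.254",
}
TCP_PORTS = (22, 8006)  # SSH and the Proxmox web UI/API


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_nodes(nodes=NODES, ports=TCP_PORTS) -> None:
    """Print TCP reachability from this machine to every node/port pair."""
    for name, addr in nodes.items():
        for port in ports:
            state = "open" if tcp_port_open(addr, port) else "closed/blocked"
            print(f"{name} {addr}:{port}/tcp -> {state}")


# Run check_nodes() on each node in turn, e.g. first on PVE02, then on PVE03.
```

Running it from both ends matters here, since the missing rules only affected traffic between PVE02 and PVE03 while PVE01 and PVE03 could still talk.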
 
