Trying to join cluster puts system into weird state

Discussion in 'Proxmox VE: Installation and configuration' started by moeffju, Apr 22, 2019.

  1. moeffju

    moeffju New Member

    Situation:

    I have one Proxmox VE 5.4-3 host with a few VMs ("old") and another, freshly installed Proxmox VE 5.4-3 host ("new").
    I created a cluster on "old" (glade) and wanted "new" (starsong) to join it.
    The moment I trigger the join action on "new" (pasting the Join Information and confirming), the new node shows up in the cluster view on "old". About a second later, "old" switches to "Standalone node - no cluster defined": the Create/Join buttons become active again and Join Information is greyed out, yet both cluster nodes are still visible in the node list.
    From that point on, PAM login on "new" stops working.

    On new, I see this in the logs:

    Code:
    Apr 21 23:53:58 starsong pveproxy[12440]: worker exit
    Apr 21 23:53:58 starsong pveproxy[4571]: worker 12440 finished
    Apr 21 23:53:58 starsong pveproxy[4571]: starting 1 worker(s)
    Apr 21 23:53:58 starsong pveproxy[4571]: worker 12467 started
    Apr 21 23:53:58 starsong pveproxy[12467]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1683.
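
    From what I have read, this pve-ssl.key error usually means that /etc/pve (the pmxcfs cluster filesystem) is read-only or not fully populated, e.g. because the node has no quorum. A rough checklist for inspecting the state on "new" would be something like:

    Code:
    # are the cluster services running?
    systemctl status pve-cluster corosync

    # does the node think it is in a cluster, and is it quorate?
    pvecm status

    # is /etc/pve mounted and are the node certificates present?
    mount | grep /etc/pve
    ls -l /etc/pve/local/

    # recent cluster-related log messages
    journalctl -b -u pve-cluster -u corosync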
    
    Old shows only:

    Code:
    Apr 22 01:43:41 glade pvedaemon[2121]: <root@pam> adding node starsong to cluster
    Apr 22 01:43:41 glade pmxcfs[6967]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 6)
    Apr 22 01:43:41 glade corosync[6992]: notice  [CFG   ] Config reload requested by node 1
    Apr 22 01:43:41 glade corosync[6992]:  [CFG   ] Config reload requested by node 1
    Apr 22 01:43:41 glade corosync[6992]: notice  [QUORUM] This node is within the non-primary component and will NOT provid
    Apr 22 01:43:41 glade corosync[6992]: notice  [QUORUM] Members[1]: 1
    Apr 22 01:43:41 glade corosync[6992]:  [QUORUM] This node is within the non-primary component and will NOT provide any s
    Apr 22 01:43:41 glade corosync[6992]:  [QUORUM] Members[1]: 1
    Apr 22 01:43:41 glade pmxcfs[6967]: [status] notice: node lost quorum
    Apr 22 01:43:41 glade pmxcfs[6967]: [status] notice: update cluster info (cluster name  galaxy, version = 6)
    Apr 22 01:43:41 glade pve-ha-lrm[2199]: unable to write lrm status file - unable to open file '/etc/pve/nodes/glade/lrm_
    Apr 22 01:44:00 glade systemd[1]: Starting Proxmox VE replication runner...
    Apr 22 01:44:01 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:02 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:03 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:04 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:05 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:06 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:07 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:08 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:09 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:10 glade pvesr[12104]: error with cfs lock 'file-replication_cfg': no quorum!
    Apr 22 01:44:10 glade systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
    Apr 22 01:44:10 glade systemd[1]: Failed to start Proxmox VE replication runner.
    Apr 22 01:44:10 glade systemd[1]: pvesr.service: Unit entered failed state.
    Apr 22 01:44:10 glade systemd[1]: pvesr.service: Failed with result 'exit-code'.
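
    If I read these messages correctly, "old" loses quorum because the corosync config now expects two votes while "new" never actually finishes joining. As far as I know, the expected vote count can be lowered temporarily, purely as a recovery measure, so that /etc/pve on "old" becomes writable again:

    Code:
    # on "old" (glade): temporarily accept a single vote as quorate
    pvecm expected 1

    # verify that the node is quorate again
    pvecm status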
    
    I'm at a loss how to debug this further; searching the forum didn't turn up a resolution. I have tried deleting the cluster config on "new", deleting the new node from "old", and reinstalling "new", but nothing changed. I feel like I'm missing some obvious step, but I can't figure out what. Any pointers would be appreciated.
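
    In case it helps with reproducing: by "deleting the cluster config on new" I mean roughly the procedure from the Proxmox cluster documentation for separating a node without reinstalling (use with care; paths as on PVE 5.4):

    Code:
    # on "new" (starsong): stop the cluster services
    systemctl stop pve-cluster corosync

    # start pmxcfs in local mode so /etc/pve is writable without quorum
    pmxcfs -l

    # remove the corosync configuration
    rm /etc/pve/corosync.conf
    rm -rf /etc/corosync/*

    # stop the local-mode pmxcfs again and restart the normal service
    killall pmxcfs
    systemctl start pve-cluster

    # on "old" (glade): remove the stale node entry
    pvecm delnode starsong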
     
  2. moeffju

    moeffju New Member

    This has been resolved by cancelling the machine at OVH and moving to Hetzner.
     