Situation:
I have one Proxmox 5.4-3 with a few VMs ("old"). I have another Proxmox 5.4-3 freshly set up ("new").
I created a cluster on "old" (glade) and wanted "new" (starsong) to join the cluster.
The moment I trigger the join action on "new" (pasting the Join information and confirming), the new node shows up in the cluster on old. A second or so later, old changes to "Standalone node - no cluster defined": the Create/Join buttons become active and Join information is greyed out, yet both cluster nodes are still visible in the list.
At that point, login via PAM stops working on "new".
On new, I see this in the logs:
Code:
Apr 21 23:53:58 starsong pveproxy[12440]: worker exit
Apr 21 23:53:58 starsong pveproxy[4571]: worker 12440 finished
Apr 21 23:53:58 starsong pveproxy[4571]: starting 1 worker(s)
Apr 21 23:53:58 starsong pveproxy[4571]: worker 12467 started
Apr 21 23:53:58 starsong pveproxy[12467]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1683.
Old shows only:
Code:
Apr 22 01:43:41 glade pvedaemon[2121]: <root@pam> adding node starsong to cluster
Apr 22 01:43:41 glade pmxcfs[6967]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 6)
Apr 22 01:43:41 glade corosync[6992]: notice [CFG ] Config reload requested by node 1
Apr 22 01:43:41 glade corosync[6992]: [CFG ] Config reload requested by node 1
Apr 22 01:43:41 glade corosync[6992]: notice [QUORUM] This node is within the non-primary component and will NOT provid
Apr 22 01:43:41 glade corosync[6992]: notice [QUORUM] Members[1]: 1
Apr 22 01:43:41 glade corosync[6992]: [QUORUM] This node is within the non-primary component and will NOT provide any s
Apr 22 01:43:41 glade corosync[6992]: [QUORUM] Members[1]: 1
Apr 22 01:43:41 glade pmxcfs[6967]: [status] notice: node lost quorum
Apr 22 01:43:41 glade pmxcfs[6967]: [status] notice: update cluster info (cluster name galaxy, version = 6)
Apr 22 01:43:41 glade pve-ha-lrm[2199]: unable to write lrm status file - unable to open file '/etc/pve/nodes/glade/lrm_
Apr 22 01:44:00 glade systemd[1]: Starting Proxmox VE replication runner...
Apr 22 01:44:01 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 01:44:02 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 01:44:03 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 01:44:04 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 01:44:05 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 01:44:06 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 01:44:07 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 01:44:08 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 01:44:09 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 01:44:10 glade pvesr[12104]: error with cfs lock 'file-replication_cfg': no quorum!
Apr 22 01:44:10 glade systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Apr 22 01:44:10 glade systemd[1]: Failed to start Proxmox VE replication runner.
Apr 22 01:44:10 glade systemd[1]: pvesr.service: Unit entered failed state.
Apr 22 01:44:10 glade systemd[1]: pvesr.service: Failed with result 'exit-code'.
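To narrow down what I'm looking at, here is a rough sketch of how I filter these logs for the relevant lines and sanity-check the quorum message. The function names are my own (hypothetical), the marker strings are copied from the excerpts above, and the majority rule (more than half of the expected votes, one vote per node) is an assumption about the default corosync quorum behaviour, not something I have confirmed for my setup.

```python
import re

# Substrings copied from the log excerpts above; not an exhaustive list.
MARKERS = (
    "lost quorum",
    "no quorum",
    "non-primary component",
    "failed to load local private key",
)


def majority(expected_votes: int) -> int:
    """Votes needed for quorum, ASSUMING a simple-majority rule."""
    return expected_votes // 2 + 1


def members_seen(line: str):
    """Return N from a corosync '[QUORUM] Members[N]' line, else None."""
    match = re.search(r"\[QUORUM\]\s+Members\[(\d+)\]", line)
    return int(match.group(1)) if match else None


def interesting(lines):
    """Keep only the lines that contain one of the MARKERS substrings."""
    return [line for line in lines if any(m in line for m in MARKERS)]


if __name__ == "__main__":
    sample = [
        "Apr 22 01:43:41 glade corosync[6992]: notice [QUORUM] Members[1]: 1",
        "Apr 22 01:43:41 glade pmxcfs[6967]: [status] notice: node lost quorum",
        "Apr 22 01:44:00 glade systemd[1]: Starting Proxmox VE replication runner...",
    ]
    for line in interesting(sample):
        print(line)
    seen = members_seen(sample[0])
    # Two nodes are expected once starsong has been added to the config.
    print("members:", seen, "needed:", majority(2))
```

If that assumption holds, a two-node cluster needs both votes, so glade seeing only itself in `Members[1]` would be consistent with the "node lost quorum" line right after the config reload.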
I'm at a loss as to how to debug this; searching the forum didn't yield a resolution. I tried deleting the cluster config on new, deleting the new node from old, and reinstalling new, all with no change. I feel like I'm missing some obvious step but I can't figure it out. Any pointers would be appreciated.