Trying to join cluster puts system into weird state

Discussion in 'Proxmox VE: Installation and configuration' started by moeffju, Apr 22, 2019.

  1. moeffju

    moeffju New Member

    Situation:

    I have one Proxmox VE 5.4-3 host with a few VMs ("old") and another, freshly installed Proxmox VE 5.4-3 host ("new").
    I created a cluster on "old" (glade) and wanted "new" (starsong) to join it.
    The moment I trigger the join action on "new" (pasting the Join Information and confirming), the new node shows up in the cluster view on "old". About a second later, "old" switches to "Standalone node - no cluster defined": the Create/Join buttons become active again and Join Information is greyed out, yet both cluster nodes are still visible in the node list.
    From that point on, PAM login on "new" stops working.

    On new, I see this in the logs:

    Code:
    Apr 21 23:53:58 starsong pveproxy[12440]: worker exit
    Apr 21 23:53:58 starsong pveproxy[4571]: worker 12440 finished
    Apr 21 23:53:58 starsong pveproxy[4571]: starting 1 worker(s)
    Apr 21 23:53:58 starsong pveproxy[4571]: worker 12467 started
    Apr 21 23:53:58 starsong pveproxy[12467]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1683.
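
    From what I have read, this pve-ssl.key error usually means that /etc/pve (the pmxcfs cluster filesystem) is read-only or not fully populated, e.g. because the node has no quorum. A rough checklist for inspecting the state on "new" would be something like:

    Code:
    # are the cluster services running?
    systemctl status pve-cluster corosync

    # does the node think it is in a cluster, and is it quorate?
    pvecm status

    # is /etc/pve mounted and are the node certificates present?
    mount | grep /etc/pve
    ls -l /etc/pve/local/

    # recent cluster-related log messages
    journalctl -b -u pve-cluster -u corosync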
    
    Old shows only:

    Code:
    Apr 22 01:43:41 glade pvedaemon[2121]: <root@pam> adding node starsong to cluster
    Apr 22 01:43:41 glade pmxcfs[6967]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 6)
    Apr 22 01:43:41 glade corosync[6992]: notice  [CFG   ] Config reload requested by node 1
    Apr 22 01:43:41 glade corosync[6992]:  [CFG   ] Config reload requested by node 1
    Apr 22 01:43:41 glade corosync[6992]: notice  [QUORUM] This node is within the non-primary component and will NOT provid
    Apr 22 01:43:41 glade corosync[6992]: notice  [QUORUM] Members[1]: 1
    Apr 22 01:43:41 glade corosync[6992]:  [QUORUM] This node is within the non-primary component and will NOT provide any s
    Apr 22 01:43:41 glade corosync[6992]:  [QUORUM] Members[1]: 1
    Apr 22 01:43:41 glade pmxcfs[6967]: [status] notice: node lost quorum
    Apr 22 01:43:41 glade pmxcfs[6967]: [status] notice: update cluster info (cluster name  galaxy, version = 6)
    Apr 22 01:43:41 glade pve-ha-lrm[2199]: unable to write lrm status file - unable to open file '/etc/pve/nodes/glade/lrm_
    Apr 22 01:44:00 glade systemd[1]: Starting Proxmox VE replication runner...
    Apr 22 01:44:01 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:02 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:03 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:04 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:05 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:06 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:07 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:08 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:09 glade pvesr[12104]: trying to acquire cfs lock 'file-replication_cfg' ...
    Apr 22 01:44:10 glade pvesr[12104]: error with cfs lock 'file-replication_cfg': no quorum!
    Apr 22 01:44:10 glade systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
    Apr 22 01:44:10 glade systemd[1]: Failed to start Proxmox VE replication runner.
    Apr 22 01:44:10 glade systemd[1]: pvesr.service: Unit entered failed state.
    Apr 22 01:44:10 glade systemd[1]: pvesr.service: Failed with result 'exit-code'.
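
    If I read these messages correctly, "old" loses quorum because the corosync config now expects two votes while "new" never actually finishes joining. As far as I know, the expected vote count can be lowered temporarily, purely as a recovery measure, so that /etc/pve on "old" becomes writable again:

    Code:
    # on "old" (glade): temporarily accept a single vote as quorate
    pvecm expected 1

    # verify that the node is quorate again
    pvecm status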
    
    I'm at a loss how to debug this further; searching the forum didn't turn up a resolution. I have tried deleting the cluster config on "new", deleting the new node from "old", and reinstalling "new", but nothing changed. I feel like I'm missing some obvious step, but I can't figure out what. Any pointers would be appreciated.
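
    In case it helps with reproducing: by "deleting the cluster config on new" I mean roughly the procedure from the Proxmox cluster documentation for separating a node without reinstalling (use with care; paths as on PVE 5.4):

    Code:
    # on "new" (starsong): stop the cluster services
    systemctl stop pve-cluster corosync

    # start pmxcfs in local mode so /etc/pve is writable without quorum
    pmxcfs -l

    # remove the corosync configuration
    rm /etc/pve/corosync.conf
    rm -rf /etc/corosync/*

    # stop the local-mode pmxcfs again and restart the normal service
    killall pmxcfs
    systemctl start pve-cluster

    # on "old" (glade): remove the stale node entry
    pvecm delnode starsong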
     
  2. moeffju

    moeffju New Member

    This has been resolved by cancelling the machine at OVH and moving to Hetzner.
     