Adding node to cluster failed... sort of.

Hello, I have two PCs with Proxmox installed; the GUI shows version 7.1-7. On one node I used the GUI to create a cluster, then I joined the cluster from the 2nd node.

The 2nd node showed up in the GUI pretty quickly but never turned green. I have rebooted both nodes and still no luck (it does sometimes turn green, but then it times out whenever I click on anything related to that node).

Poking around, it seems that all actions, like doing an `ls` on the /etc/pve filesystem, can take up to 30 minutes to complete. I've searched the forum and documentation and can't find any clues on how to fix this or what the underlying issue might be.
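
Since /etc/pve is the pmxcfs FUSE mount provided by the pve-cluster service, a rough way to confirm that it's the cluster filesystem (and not the disk) that is slow would be something like the following sketch (run on either node; this is not output I have captured):
Code:
# time a simple read on the cluster filesystem vs. a normal directory
time ls /etc/pve
time ls /etc

# check the services backing /etc/pve and the current cluster membership
systemctl status pve-cluster corosync
pvecm status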

Here's my corosync.conf:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve00
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.76.1.0
  }
  node {
    name: pve01
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.76.1.1
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: home-cluster
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

The IP 10.76.1.0 might look odd, but I (perhaps foolishly) set up my home lab as a /16 long ago and haven't gotten around to doing a better job. I have a 3rd system I want to add to the cluster, but I'm waiting until this is resolved. I have already reinstalled Proxmox on both of these nodes and got the same result when joining the cluster.

Any help would be appreciated.
 
Here's my corosync.conf:
Does it look the same on both nodes? I.e. the /etc/corosync/corosync.conf?
I have already reinstalled Proxmox on both of these nodes and got the same result when joining the cluster.
And the cluster join task did not contain any error/warning whatsoever?

A few commands would help to get a better overview:
Code:
pvecm status
ip addr
systemctl status corosync pve-cluster pveproxy pvedaemon pvestatd
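
If those don't show anything obvious, the corosync and pmxcfs journals are usually the next place to look, e.g. (just a suggestion, not required output):
Code:
journalctl -b -u corosync -u pve-cluster --no-pager | tail -n 200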
 
Hello and thank you for replying.

/etc/corosync/corosync.conf looks the same on both hosts.

The cluster join task gave the following message:
Code:
Dec 26 21:58:41 - Dec 26 22:05:40  pve01  root@pam  Join Cluster
Error: unable to create directory '/etc/pve/nodes' - Permission denied

Looking at the permissions on both nodes, they seem to match what the documentation says they should be.

Code:
root@pve00:~# ls -ld /etc/pve
drwxr-xr-x 2 root www-data 0 Dec 31  1969 /etc/pve
root@pve00:~# ls -l /etc/pve/
total 4
-rw-r----- 1 root www-data  451 Dec 26 21:25 authkey.pub
-rw-r----- 1 root www-data  444 Dec 26 21:58 corosync.conf
-rw-r----- 1 root www-data   16 Dec 26 21:24 datacenter.cfg
drwxr-xr-x 2 root www-data    0 Dec 26 21:25 ha
lrwxr-xr-x 1 root www-data    0 Dec 31  1969 local -> nodes/pve00
lrwxr-xr-x 1 root www-data    0 Dec 31  1969 lxc -> nodes/pve00/lxc
drwxr-xr-x 2 root www-data    0 Dec 26 21:25 nodes
lrwxr-xr-x 1 root www-data    0 Dec 31  1969 openvz -> nodes/pve00/openvz
drwx------ 2 root www-data    0 Dec 26 21:25 priv
-rw-r----- 1 root www-data 2074 Dec 26 21:25 pve-root-ca.pem
-rw-r----- 1 root www-data 1675 Dec 26 21:25 pve-www.key
lrwxr-xr-x 1 root www-data    0 Dec 31  1969 qemu-server -> nodes/pve00/qemu-server
drwxr-xr-x 2 root www-data    0 Dec 26 21:25 sdn
-rw-r----- 1 root www-data  127 Dec 26 21:24 storage.cfg
-rw-r----- 1 root www-data   39 Dec 26 21:24 user.cfg
drwxr-xr-x 2 root www-data    0 Dec 26 21:25 virtual-guest
-rw-r----- 1 root www-data  119 Dec 26 21:25 vzdump.cron


Code:
root@pve01:~# ls -ld /etc/pve
drwxr-xr-x 2 root www-data 0 Dec 31  1969 /etc/pve
root@pve01:~# ls -l /etc/pve/
total 1
-rw-r----- 1 root www-data 444 Dec 26 21:58 corosync.conf
drwxr-xr-x 2 root www-data   0 Dec 26 22:41 ha
lrwxr-xr-x 1 root www-data   0 Dec 31  1969 local -> nodes/pve01
lrwxr-xr-x 1 root www-data   0 Dec 31  1969 lxc -> nodes/pve01/lxc
drwxr-xr-x 2 root www-data   0 Dec 26 22:34 nodes
lrwxr-xr-x 1 root www-data   0 Dec 31  1969 openvz -> nodes/pve01/openvz
drwx------ 2 root www-data   0 Dec 26 21:58 priv
lrwxr-xr-x 1 root www-data   0 Dec 31  1969 qemu-server -> nodes/pve01/qemu-server
drwxr-xr-x 2 root www-data   0 Dec 26 22:05 virtual-guest
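
Since /etc/pve is the pmxcfs FUSE mount (the Unix permissions shown above are largely synthetic), a quick way to check whether the filesystem is actually writable and quorate, rather than this being a classic permission problem, would be something like this sketch on each node (the test filename is arbitrary):
Code:
# does a trivial write go through, and how long does it take?
time touch /etc/pve/writetest && rm /etc/pve/writetest

# quorum and membership as corosync sees it
pvecm status
corosync-quorumtool -s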

Here's the output of the requested commands from both nodes:
(The forum complained that my message was too long so I'm breaking each nodes output into separate posts)
 
Requested command output for the first node: pve00
Code:
root@pve00:~# pvecm status
Cluster information
-------------------
Name:             home-cluster
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Dec 27 08:05:12 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.1c9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.76.1.0 (local)
0x00000002          1 10.76.1.1
root@pve00:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp40s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2c:f0:5d:a9:ba:f6 brd ff:ff:ff:ff:ff:ff
3: enp42s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2c:f0:5d:a9:ba:f5 brd ff:ff:ff:ff:ff:ff
4: enp33s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether f0:2f:74:70:0b:22 brd ff:ff:ff:ff:ff:ff
5: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f0:2f:74:70:0b:22 brd ff:ff:ff:ff:ff:ff
    inet 10.76.1.0/16 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::f22f:74ff:fe70:b22/64 scope link
       valid_lft forever preferred_lft forever
root@pve00:~# systemctl status corosync pve-cluster pveproxy pvedaemon pvestatd
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 23:16:48 CST; 8h ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 1150 (corosync)
      Tasks: 9 (limit: 154421)
     Memory: 1.2G
        CPU: 5min 12.918s
     CGroup: /system.slice/corosync.service
             └─1150 /usr/sbin/corosync -f

Dec 27 08:05:53 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f f0
Dec 27 08:05:53 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f f0
Dec 27 08:05:53 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f f0
Dec 27 08:05:53 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f f0
Dec 27 08:05:54 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f f0
Dec 27 08:05:54 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f f0
Dec 27 08:05:54 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f f0
Dec 27 08:05:54 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f
Dec 27 08:05:55 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f
Dec 27 08:05:55 pve00 corosync[1150]:   [TOTEM ] Retransmit List: c d 12 15 16 17 19 1a 8f

● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 23:16:48 CST; 8h ago
    Process: 1020 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 1040 (pmxcfs)
      Tasks: 9 (limit: 154421)
     Memory: 64.6M
        CPU: 11.211s
     CGroup: /system.slice/pve-cluster.service
             └─1040 /usr/bin/pmxcfs

Dec 27 08:01:07 pve00 pmxcfs[1040]: [dcdb] notice: received sync request (epoch 1/1040/00000060)
Dec 27 08:01:07 pve00 pmxcfs[1040]: [status] notice: received sync request (epoch 1/1040/0000005B)
Dec 27 08:01:07 pve00 pmxcfs[1040]: [dcdb] notice: received all states
Dec 27 08:01:07 pve00 pmxcfs[1040]: [dcdb] notice: leader is 1/1040
Dec 27 08:01:07 pve00 pmxcfs[1040]: [dcdb] notice: synced members: 1/1040
Dec 27 08:01:07 pve00 pmxcfs[1040]: [dcdb] notice: start sending inode updates
Dec 27 08:01:07 pve00 pmxcfs[1040]: [dcdb] notice: sent all (38) updates
Dec 27 08:01:07 pve00 pmxcfs[1040]: [dcdb] notice: all data is up to date
Dec 27 08:01:07 pve00 pmxcfs[1040]: [status] notice: received all states
Dec 27 08:01:07 pve00 pmxcfs[1040]: [status] notice: all data is up to date

● pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 23:16:50 CST; 8h ago
    Process: 1206 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
    Process: 1208 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
   Main PID: 1210 (pveproxy)
      Tasks: 4 (limit: 154421)
     Memory: 155.2M
        CPU: 11.666s
     CGroup: /system.slice/pveproxy.service
             ├─ 1210 pveproxy
             ├─ 7058 pveproxy worker
             ├─16007 pveproxy worker
             └─70912 pveproxy worker

Dec 27 01:04:12 pve00 pveproxy[1210]: worker 1213 finished
Dec 27 01:04:12 pve00 pveproxy[1210]: starting 1 worker(s)
Dec 27 01:04:12 pve00 pveproxy[1210]: worker 16007 started
Dec 27 07:53:07 pve00 pveproxy[10955]: Can't call method "push_write" on an undefined value at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 364.
Dec 27 07:53:12 pve00 pveproxy[10955]: worker exit
Dec 27 07:53:12 pve00 pveproxy[1210]: worker 10955 finished
Dec 27 07:53:12 pve00 pveproxy[1210]: starting 1 worker(s)
Dec 27 07:53:12 pve00 pveproxy[1210]: worker 70912 started
Dec 27 07:53:27 pve00 pveproxy[16007]: '/etc/pve/nodes/pve01/pve-ssl.pem' does not exist!
Dec 27 07:53:27 pve00 pveproxy[7058]: '/etc/pve/nodes/pve01/pve-ssl.pem' does not exist!

● pvedaemon.service - PVE API Daemon
     Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 23:16:49 CST; 8h ago
    Process: 1171 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
   Main PID: 1201 (pvedaemon)
      Tasks: 4 (limit: 154421)
     Memory: 154.4M
        CPU: 5.661s
     CGroup: /system.slice/pvedaemon.service
             ├─1201 pvedaemon
             ├─1202 pvedaemon worker
             ├─1203 pvedaemon worker
             └─1204 pvedaemon worker

Dec 26 23:25:46 pve00 pvedaemon[1202]: <root@pam> successful auth for user 'root@pam'
Dec 26 23:40:46 pve00 pvedaemon[1203]: <root@pam> successful auth for user 'root@pam'
Dec 26 23:55:46 pve00 pvedaemon[1203]: <root@pam> successful auth for user 'root@pam'
Dec 27 00:10:46 pve00 pvedaemon[1202]: <root@pam> successful auth for user 'root@pam'
Dec 27 00:25:46 pve00 pvedaemon[1203]: <root@pam> successful auth for user 'root@pam'
Dec 27 00:40:46 pve00 pvedaemon[1204]: <root@pam> successful auth for user 'root@pam'
Dec 27 00:55:46 pve00 pvedaemon[1203]: <root@pam> successful auth for user 'root@pam'
Dec 27 07:52:52 pve00 IPCC.xs[1204]: pam_unix(proxmox-ve-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=  user=root
Dec 27 07:52:54 pve00 pvedaemon[1204]: authentication failure; rhost=::ffff:10.76.150.43 user=root@pam msg=Authentication failure
Dec 27 07:53:12 pve00 pvedaemon[1202]: <root@pam> successful auth for user 'root@pam'

● pvestatd.service - PVE Status Daemon
     Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 23:16:48 CST; 8h ago
    Process: 1154 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
   Main PID: 1170 (pvestatd)
      Tasks: 1 (limit: 154421)
     Memory: 107.1M
        CPU: 1min 45.449s
     CGroup: /system.slice/pvestatd.service
             └─1170 pvestatd

Dec 26 23:16:48 pve00 systemd[1]: Starting PVE Status Daemon...
Dec 26 23:16:48 pve00 pvestatd[1170]: starting server
Dec 26 23:16:48 pve00 systemd[1]: Started PVE Status Daemon.
 
Requested command output for the second node: pve01
Code:
root@pve01:~# pvecm status
Cluster information
-------------------
Name:             home-cluster
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Dec 27 08:07:02 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.1c9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.76.1.0
0x00000002          1 10.76.1.1 (local)
root@pve01:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether d8:bb:c1:42:6a:65 brd ff:ff:ff:ff:ff:ff
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d8:bb:c1:42:6a:65 brd ff:ff:ff:ff:ff:ff
    inet 10.76.1.1/16 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::dabb:c1ff:fe42:6a65/64 scope link
       valid_lft forever preferred_lft forever
root@pve01:~# systemctl status corosync pve-cluster pveproxy pvedaemon pvestatd
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 23:23:36 CST; 8h ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 1152 (corosync)
      Tasks: 9 (limit: 38274)
     Memory: 1.1G
        CPU: 3min 7.379s
     CGroup: /system.slice/corosync.service
             └─1152 /usr/sbin/corosync -f

Dec 27 08:01:07 pve01 corosync[1152]:   [QUORUM] Sync joined[1]: 1
Dec 27 08:01:07 pve01 corosync[1152]:   [TOTEM ] A new membership (1.1c9) was formed. Members joined: 1
Dec 27 08:01:07 pve01 corosync[1152]:   [QUORUM] This node is within the primary component and will provide service.
Dec 27 08:01:07 pve01 corosync[1152]:   [QUORUM] Members[2]: 1 2
Dec 27 08:01:07 pve01 corosync[1152]:   [MAIN ] Completed service synchronization, ready to provide service.
Dec 27 08:04:29 pve01 corosync[1152]:   [KNET ] link: host: 1 link: 0 is down
Dec 27 08:04:29 pve01 corosync[1152]:   [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Dec 27 08:04:29 pve01 corosync[1152]:   [KNET ] host: host: 1 has no active links
Dec 27 08:04:30 pve01 corosync[1152]:   [KNET ] rx: host: 1 link: 0 is up
Dec 27 08:04:30 pve01 corosync[1152]:   [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)

● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2021-12-27 00:24:23 CST; 7h ago
    Process: 8691 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 8693 (pmxcfs)
      Tasks: 10 (limit: 38274)
     Memory: 22.1M
        CPU: 3.387s
     CGroup: /system.slice/pve-cluster.service
             └─8693 /usr/bin/pmxcfs

Dec 27 08:01:07 pve01 pmxcfs[8693]: [status] notice: node lost quorum
Dec 27 08:01:07 pve01 pmxcfs[8693]: [status] notice: members: 2/8693
Dec 27 08:01:07 pve01 pmxcfs[8693]: [status] notice: all data is up to date
Dec 27 08:01:07 pve01 pmxcfs[8693]: [dcdb] notice: members: 1/1040, 2/8693
Dec 27 08:01:07 pve01 pmxcfs[8693]: [dcdb] notice: starting data syncronisation
Dec 27 08:01:07 pve01 pmxcfs[8693]: [status] notice: members: 1/1040, 2/8693
Dec 27 08:01:07 pve01 pmxcfs[8693]: [status] notice: starting data syncronisation
Dec 27 08:01:07 pve01 pmxcfs[8693]: [status] notice: node has quorum
Dec 27 08:01:07 pve01 pmxcfs[8693]: [dcdb] notice: received sync request (epoch 1/1040/00000060)
Dec 27 08:01:07 pve01 pmxcfs[8693]: [status] notice: received sync request (epoch 1/1040/0000005B)

● pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 23:23:37 CST; 8h ago
    Process: 1208 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=1/FAILURE)
    Process: 1210 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
   Main PID: 1212 (pveproxy)
      Tasks: 4 (limit: 38274)
     Memory: 133.1M
        CPU: 1min 17.870s
     CGroup: /system.slice/pveproxy.service
             ├─ 1212 pveproxy
             ├─61173 pveproxy worker
             ├─61174 pveproxy worker
             └─61175 pveproxy worker

Dec 27 08:01:12 pve01 pveproxy[59734]: worker exit
Dec 27 08:01:12 pve01 pveproxy[59732]: worker exit
Dec 27 08:01:12 pve01 pveproxy[59735]: worker exit
Dec 27 08:01:12 pve01 pveproxy[1212]: worker 59734 finished
Dec 27 08:01:12 pve01 pveproxy[1212]: worker 59732 finished
Dec 27 08:01:12 pve01 pveproxy[1212]: worker 59735 finished
Dec 27 08:01:12 pve01 pveproxy[1212]: starting 3 worker(s)
Dec 27 08:01:12 pve01 pveproxy[1212]: worker 61173 started
Dec 27 08:01:12 pve01 pveproxy[1212]: worker 61174 started
Dec 27 08:01:12 pve01 pveproxy[1212]: worker 61175 started

● pvedaemon.service - PVE API Daemon
     Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 23:23:36 CST; 8h ago
    Process: 1174 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
   Main PID: 1203 (pvedaemon)
      Tasks: 4 (limit: 38274)
     Memory: 136.3M
        CPU: 1.675s
     CGroup: /system.slice/pvedaemon.service
             ├─1203 pvedaemon
             ├─1204 pvedaemon worker
             ├─1205 pvedaemon worker
             └─1206 pvedaemon worker

Dec 26 23:23:36 pve01 systemd[1]: Starting PVE API Daemon...
Dec 26 23:23:36 pve01 pvedaemon[1203]: starting server
Dec 26 23:23:36 pve01 pvedaemon[1203]: starting 3 worker(s)
Dec 26 23:23:36 pve01 pvedaemon[1203]: worker 1204 started
Dec 26 23:23:36 pve01 pvedaemon[1203]: worker 1205 started
Dec 26 23:23:36 pve01 pvedaemon[1203]: worker 1206 started
Dec 26 23:23:36 pve01 systemd[1]: Started PVE API Daemon.

● pvestatd.service - PVE Status Daemon
     Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 23:23:36 CST; 8h ago
    Process: 1156 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
   Main PID: 1178 (pvestatd)
      Tasks: 1 (limit: 38274)
     Memory: 110.8M
        CPU: 773ms
     CGroup: /system.slice/pvestatd.service
             └─1178 pvestatd

Dec 27 06:03:06 pve01 pvestatd[1178]: authkey rotation error: cfs-lock 'authkey' error: got lock request timeout
Dec 27 06:03:06 pve01 pvestatd[1178]: status update time (1393.048 seconds)
Dec 27 06:27:33 pve01 pvestatd[1178]: authkey rotation error: cfs-lock 'authkey' error: no quorum!
Dec 27 06:27:33 pve01 pvestatd[1178]: status update time (1467.116 seconds)
Dec 27 07:00:54 pve01 pvestatd[1178]: authkey rotation error: cfs-lock 'authkey' error: got lock request timeout
Dec 27 07:00:54 pve01 pvestatd[1178]: status update time (1991.190 seconds)
Dec 27 07:48:16 pve01 pvestatd[1178]: authkey rotation error: cfs-lock 'authkey' error: got lock request timeout
Dec 27 07:48:16 pve01 pvestatd[1178]: status update time (2831.996 seconds)
Dec 27 08:01:07 pve01 pvestatd[1178]: authkey rotation error: cfs-lock 'authkey' error: got lock request timeout
Dec 27 08:01:07 pve01 pvestatd[1178]: status update time (770.310 seconds)
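
For what it's worth, the KNET "link: host: 1 link: 0 is down" messages here and the TOTEM retransmit lists on pve00 suggest checking the link between the two nodes itself; one way to do that (a sketch, not output I have captured) would be:
Code:
# per-link status as corosync/knet sees it
corosync-cfgtool -s

# sustained latency/loss test against the other node's cluster address
ping -c 100 -i 0.2 10.76.1.0   # from pve01; use 10.76.1.1 when run from pve00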
 
I also notice that the joining node doesn't have both nodes in /etc/pve/nodes, whereas the first node does:
Code:
root@pve00:~# ls -l /etc/pve/nodes/
total 0
drwxr-xr-x 2 root www-data 0 Dec 26 21:25 pve00
drwxr-xr-x 2 root www-data 0 Dec 26 21:58 pve01
Code:
root@pve01:~# ls -l /etc/pve/nodes/
total 0
drwxr-xr-x 2 root www-data 0 Dec 26 23:15 pve01
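
A related check for whether changes propagate between the two pmxcfs instances at all would be something like this sketch (the test filename is arbitrary):
Code:
# on pve00: create a small test file in the cluster filesystem
echo test > /etc/pve/synctest

# on pve01: see whether (and how quickly) it shows up
ls -l /etc/pve/synctest

# clean up afterwards (from either node)
rm /etc/pve/synctest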
 
I have a nearly identical problem when adding a new node to my cluster via the GUI.

The new node doesn't join the existing cluster properly and the "nodes" directory doesn't get fully populated on existing cluster nodes. I also get a GUI error on the existing cluster of "pve-ssl.pem does not exist!" related to the new/failed node.

Meanwhile, the new node which failed to join the cluster no longer has an accessible web interface and reports "Activity blocked" in pvecm status.

I have tried the join from a fresh install multiple times. I have ensured all nodes are the same version (7.4-x) and are up to date via the same sources.list configuration. I also confirmed that I can SSH between all nodes before attempting the join.

I was ultimately able to fix the problem by deleting the new node from the existing cluster (pvecm delnode xxx), then clearing the cluster configuration from the new node (by following these steps), then manually joining the new node to the cluster via the CLI (pvecm add x.x.x.x). This threw some errors but worked, and even GUI access was restored on the new node.

Code:
root@host:/etc/pve# pvecm add x.x.x.x
Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PVE/CLI/pvecm.pm line 354, <DATA> line 960.
Please enter superuser (root) password for 'x.x.x.x': *************
Establishing API connection with host 'x.x.x.x'
The authenticity of host 'x.x.x.x' can't be established.
X509 SHA256 key fingerprint is XX:XX:XX:XX:XX:XX:XX.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP 'x.x.x.x'
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1701429953.sql.gz'
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'host' to cluster.
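
For reference, the cluster-configuration cleanup on the new node was essentially the standard "separate a node without reinstalling" sequence from the Proxmox documentation; roughly the following sketch (double-check it against the current docs before running, since it deletes the node's cluster configuration):
Code:
# on the node being reset
systemctl stop pve-cluster corosync

# start pmxcfs in local mode so /etc/pve is writable without quorum
pmxcfs -l

# remove the corosync configuration
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*

# stop the local-mode pmxcfs and start the service normally again
killall pmxcfs
systemctl start pve-cluster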
 
I have a nearly identical problem. The new node doesn't join the existing cluster properly and the "nodes" directory doesn't get fully populated on existing cluster nodes. [...] Meanwhile, the new node which failed to join the cluster no longer has an accessible web interface and reports "Activity blocked" in pvecm status. [...] I have tried the join from a fresh install multiple times.


The new node likely does not have quorum. Can you post the same output as asked for in post #2 above?

The one thing that's worrying is that you tried to join "from a fresh install multiple times"... are you sure those attempts did not get through to the rest of the cluster, which now thinks there is a bunch of nodes out there, all offline and waiting for quorum perpetually?
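
To check for leftovers from earlier join attempts, something like this on one of the existing cluster nodes should show whether there are stale entries (a sketch; <nodename> is a placeholder):
Code:
# list the nodes as corosync/pvecm see them
pvecm nodes
pvecm status

# any unexpected directories left over from failed joins?
ls -l /etc/pve/nodes/

# a stale node can be removed from a quorate node
# (only do this for nodes that are gone for good)
pvecm delnode <nodename>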
 
