Hi all,
I've run into an issue trying to add a new node to the cluster. When added, the new node fails to populate /etc/pve with the necessary files, ends up with pve-ssl.key errors (file not found, etc.), and does not join the cluster successfully. The new node has been reinstalled several times.
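For reference, this is roughly how I've been checking the new node after a failed join (<newnode> is a placeholder for the new node's hostname):
Code:
# list the per-node files the join should have generated (pve-ssl.key, pve-ssl.pem, ...)
ls -l /etc/pve/nodes/<newnode>/
# once the node actually has quorum, the node certificates can be regenerated with
pvecm updatecerts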
The cluster currently consists of 3 nodes.
I believe the issue may be related to a nodeid 4 that existed in the cluster at one point. That node is gone now, and I can't find any reference to nodeid 4 in the files on the main node. As far as I can recall, I went through the full pvecm delnode procedure for it.
I've double-checked /etc/corosync/corosync.conf and /etc/pve/corosync.conf, as well as /etc/pve/nodes; only the existing nodes appear there, so nothing is left over from the failed join attempts.
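For completeness, these are the kinds of checks I ran while looking for stale nodeid 4 entries:
Code:
# compare the two corosync.conf copies and look for leftover node blocks
diff /etc/corosync/corosync.conf /etc/pve/corosync.conf
grep -n nodeid /etc/corosync/corosync.conf
# list the nodes the cluster currently knows about
pvecm nodes
ls /etc/pve/nodes/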
All nodes have the correct hostnames set, and /etc/hosts on the new host is valid.
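To verify name resolution on the new host, I used something along these lines (again with <newnode> as a placeholder):
Code:
# the hostname must resolve to the node's cluster-network IP, not 127.0.0.1
getent hosts <newnode>
# each existing cluster node should also resolve correctly from here
getent hosts proxalpha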
Which files should I post, and how can I get this resolved?
Edit: The last time I ran pvecm add from the new node, it got as far as 'waiting for quorum' and the primary node started logging these entries:
Code:
Jul 13 14:26:19 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 70
Jul 13 14:26:19 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 10
Jul 13 14:26:19 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:19 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retry 20
Jul 13 14:26:20 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 80
Jul 13 14:26:20 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 20
Jul 13 14:26:20 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:20 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retry 30
Jul 13 14:26:21 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 90
Jul 13 14:26:21 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 30
Jul 13 14:26:21 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:21 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retry 40
Jul 13 14:26:22 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 100
Jul 13 14:26:22 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retried 100 times
Jul 13 14:26:22 proxalpha pmxcfs[1323458]: [dcdb] crit: cpg_send_message failed: 6
Jul 13 14:26:22 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 40
Jul 13 14:26:22 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:22 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retry 50
Jul 13 14:26:23 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:23 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 10
Jul 13 14:26:23 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 50
Jul 13 14:26:23 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:23 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retry 60
Jul 13 14:26:24 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 20
Jul 13 14:26:24 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 60
Jul 13 14:26:24 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:24 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retry 70
Jul 13 14:26:25 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 30
Jul 13 14:26:25 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 70
Jul 13 14:26:25 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:25 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retry 80
Jul 13 14:26:26 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 40
Jul 13 14:26:26 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 80
Jul 13 14:26:26 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:26 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retry 90
Jul 13 14:26:27 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 50
Jul 13 14:26:27 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:27 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 90
Jul 13 14:26:27 proxalpha corosync[1236324]: [TOTEM ] Retransmit List: a b c
Jul 13 14:26:27 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retry 100
Jul 13 14:26:27 proxalpha pmxcfs[1323458]: [status] notice: cpg_send_message retried 100 times
Jul 13 14:26:27 proxalpha pmxcfs[1323458]: [status] crit: cpg_send_message failed: 6
Jul 13 14:26:28 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 60
Jul 13 14:26:28 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retry 100
Jul 13 14:26:28 proxalpha pmxcfs[1323458]: [dcdb] notice: cpg_send_message retried 100 times
Jul 13 14:26:28 proxalpha pmxcfs[1323458]: [dcdb] crit: failed to send SYNC_START message
Then the cluster broke. After shutting the new node down, I deleted the /etc/pve/nodes/ directory for the new node, edited corosync.conf again, and restarted pve-cluster on the existing 3 nodes; the cluster is now back.
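In case anyone hits the same thing, here is a rough sketch of the recovery steps (the pmxcfs local-mode step is only needed if the remaining nodes have lost quorum; <newnode> is again a placeholder):
Code:
# stop the cluster filesystem and corosync
systemctl stop pve-cluster corosync
# start pmxcfs in local mode so /etc/pve is writable without quorum
pmxcfs -l
# remove the failed node's directory
rm -rf /etc/pve/nodes/<newnode>
# edit /etc/corosync/corosync.conf here and bump config_version,
# then stop the local-mode instance and bring the services back
killall pmxcfs
systemctl start corosync pve-cluster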
Thanks!