[SOLVED] Problems adding new nodes to a cluster

MasterTH · Feb 22, 2025

Hi,

I have a Proxmox cluster that currently has 5 nodes.
I have been running this cluster since PVE 6 and have already joined and removed nodes several times. Everything was fine until now.
I'm switching to a new datacenter and have to move the VMs from the machines running in the old DC to the new one. I'd love to move them via live migration, which is why I decided to buy new hardware, install it in the new DC, and then move the VMs.
But now I'm struggling to add the new nodes to the cluster. I tried it multiple times with each node, and every time the join stalls at a point where I cannot find any error.

Here is the log of one of the joins:

Code:

Feb 22 11:34:01 root25 corosync[7875]:   [QUORUM] Members[6]: 1 3 4 5 6 7
Feb 22 11:34:01 root25 corosync[7875]:   [MAIN  ] Completed service synchronization, ready to provide service.
Feb 22 11:34:01 root25 pmxcfs[7880]: [status] notice: cpg_send_message retried 1 times
Feb 22 11:34:01 root25 pmxcfs[7880]: [status] notice: dfsm_deliver_queue: queue length 3
Feb 22 11:34:01 root25 pmxcfs[7880]: [status] notice: members: 1/1269, 3/1201, 4/2706682, 5/1481356, 6/2876296, 7/7880
Feb 22 11:34:01 root25 pmxcfs[7880]: [status] notice: starting data syncronisation
Feb 22 11:34:01 root25 pmxcfs[7880]: [dcdb] notice: cpg_send_message retried 1 times
Feb 22 11:34:01 root25 pmxcfs[7880]: [dcdb] notice: received sync request (epoch 1/1269/00000018)
Feb 22 11:34:01 root25 pmxcfs[7880]: [dcdb] notice: received sync request (epoch 1/1269/00000019)
Feb 22 11:34:01 root25 pmxcfs[7880]: [status] notice: received sync request (epoch 1/1269/00000018)
Feb 22 11:34:01 root25 pmxcfs[7880]: [status] notice: received sync request (epoch 1/1269/00000019)
Feb 22 11:34:01 root25 corosync[7875]:   [TOTEM ] Retransmit List: 3c 3f
Feb 22 11:34:05 root25 corosync[7875]:   [KNET  ] link: host: 1 link: 0 is down
Feb 22 11:34:05 root25 corosync[7875]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 22 11:34:05 root25 corosync[7875]:   [KNET  ] host: host: 1 has no active links
Feb 22 11:34:06 root25 pve-ha-lrm[1676]: loop take too long (150 seconds)
Feb 22 11:34:06 root25 pve-ha-lrm[1676]: unable to write lrm status file - unable to open file '/etc/pve/nodes/root25/lrm_status.tmp.1676' - No such file or directory
Feb 22 11:34:08 root25 corosync[7875]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Feb 22 11:34:08 root25 corosync[7875]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 22 11:34:08 root25 corosync[7875]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Feb 22 11:34:11 root25 pve-ha-lrm[1676]: unable to write lrm status file - unable to open file '/etc/pve/nodes/root25/lrm_status.tmp.1676' - No such file or directory
Feb 22 11:34:15 root25 corosync[7875]:   [TOTEM ] Retransmit List: 98
Feb 22 11:34:16 root25 pve-ha-lrm[1676]: unable to write lrm status file - unable to open file '/etc/pve/nodes/root25/lrm_status.tmp.1676' - No such file or directory
Feb 22 11:34:21 root25 pve-ha-lrm[1676]: unable to write lrm status file - unable to open file '/etc/pve/nodes/root25/lrm_status.tmp.1676' - No such file or directory
Feb 22 11:34:26 root25 pve-ha-lrm[1676]: unable to write lrm status file - unable to open file '/etc/pve/nodes/root25/lrm_status.tmp.1676' - No such file or directory
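A minimal sketch for summarizing a journal excerpt like the one above, assuming it has been saved to a plain-text file. The function name `tally_corosync_log` is hypothetical and the patterns are simply the message strings seen in this log; it only counts symptoms and does not diagnose the cause.

```shell
# Count three symptoms in a saved journal excerpt (hypothetical helper).
# Usage: tally_corosync_log FILE
# The body runs in a subshell so its variables do not leak into the caller.
tally_corosync_log() (
    file="$1"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    retransmits=$(grep -c 'Retransmit List' "$file" || true)
    link_down=$(grep -c 'link: [0-9]* is down' "$file" || true)
    lrm_errors=$(grep -c 'unable to write lrm status file' "$file" || true)
    printf 'retransmits=%s link_down=%s lrm_write_errors=%s\n' \
        "$retransmits" "$link_down" "$lrm_errors"
)
```

Run against the excerpt above, this would report 2 retransmit lists, 1 link-down event, and 5 failed LRM status writes within about 25 seconds of the join.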

When I look into /etc/pve, it is pretty empty:

Code:

root@root25:~# ls -l /etc/pve/
total 1
-rw-r----- 1 root www-data 923 Feb 22 11:31 corosync.conf
lrwxr-xr-x 1 root www-data   0 Jan  1  1970 local -> nodes/root25
lrwxr-xr-x 1 root www-data   0 Jan  1  1970 lxc -> nodes/root25/lxc
lrwxr-xr-x 1 root www-data   0 Jan  1  1970 openvz -> nodes/root25/openvz
drwx------ 2 root www-data   0 Feb 22 11:31 priv
lrwxr-xr-x 1 root www-data   0 Jan  1  1970 qemu-server -> nodes/root25/qemu-server

What else can I do?
Multicast between the DCs is enabled; the link is a 10G dark-fiber connection. It has nothing to do with the network, because I have another cluster with the same setup and it worked out just fine.

Any help is appreciated.
Thank you

MasterTH · Feb 22, 2025

After cleaning up the node, the status looks like this:

Code:

root@root19:/etc/pve/nodes# pvecm status
Cluster information
-------------------
Name:             pve6
Config Version:   19
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sat Feb 22 11:52:43 2025
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000003
Ring ID:          1.33899
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   6
Highest expected: 6
Total votes:      5
Quorum:           4
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.30.17
0x00000003          1 192.168.30.19 (local)
0x00000004          1 192.168.30.20
0x00000005          1 192.168.30.21
0x00000006          1 192.168.30.22
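For reference, the Quorum value in the output above is consistent with a simple majority of the expected votes. The sketch below assumes that default majority rule with no special votequorum options in effect; the helper names `quorum_for` and `is_quorate` are hypothetical.

```shell
# Majority threshold for a given number of expected votes (integer division).
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# Exit status 0 when the votes present reach the threshold.
# Usage: is_quorate TOTAL_VOTES EXPECTED_VOTES
is_quorate() (
    total_votes="$1"
    expected_votes="$2"
    [ "$total_votes" -ge "$(quorum_for "$expected_votes")" ]
)

quorum_for 6                        # prints 4, matching "Quorum: 4"
is_quorate 5 6 && echo "quorate"    # 5 of 6 expected votes is enough
```

Note that Expected votes is still 6 while only 5 nodes are listed as members, so the cluster remains quorate but one expected vote is unaccounted for.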

MasterTH · Feb 22, 2025

I found this error in the logs

MasterTH · Feb 23, 2025

I changed my plans: I built a new cluster from scratch and am moving the VMs via backup/restore.
