Problem joining cluster

bferrell

Well-Known Member
Nov 16, 2018
I tried to add a 5th node to my cluster, and it appears in the running cluster but shows as offline. I see the correct corosync config on the cluster, but the new node did not get an updated corosync.conf in /etc/pve (it did in /etc/corosync), and it will not let me log in to the UI (though I can ssh in). I tried the restart commands from this thread to no effect. The errors from starting corosync are below. Should I remove it from the cluster and try again?

Code:
root@svr-00:/var/run# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2020-05-21 16:29:30 EDT; 806ms ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
  Process: 5515 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=8)
 Main PID: 5515 (code=exited, status=8)

May 21 16:29:30 svr-00 systemd[1]: Starting Corosync Cluster Engine...
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] Corosync Cluster Engine 3.0.3 starting up
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp p
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] Please migrate config file to nodelist.
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] parse error in config: No multicast port specified
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1386.
May 21 16:29:30 svr-00 systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
May 21 16:29:30 svr-00 systemd[1]: corosync.service: Failed with result 'exit-code'.
May 21 16:29:30 svr-00 systemd[1]: Failed to start Corosync Cluster Engine.
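
For reference, the sort of checks and restarts that apply here would be roughly the following (a sketch, assuming the stock Proxmox paths; adjust to your own setup):

Code:
# compare the cluster-wide config (pmxcfs) with the local copy corosync actually reads
diff /etc/pve/corosync.conf /etc/corosync/corosync.conf

# check whether the cluster filesystem is up and whether the node sees quorum
systemctl status pve-cluster
pvecm status

# after fixing the config, restart the cluster filesystem and corosync, then re-check the log
systemctl restart pve-cluster corosync
journalctl -u corosync -b --no-pager | tail -n 50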
 

Attachments

  • corosync.conf.txt
  • hosts.txt
  • interfaces.txt
I think my issue is that I joined the cluster from the command line (the UI wouldn't let me pick the network interfaces, because it kept defaulting them to CIDR notation), and I may have only entered the ring0 information. I missed the part about needing to specify all 3 rings in the command.

And if I were to delete it from the cluster and re-add it, based on my corosync.conf above, would this be the correct syntax?


i.e. pvecm add <address-of-an-existing-cluster-node> followed by the new host's link addresses:
Code:
pvecm add 192.168.100.11 -link0 192.168.100.10 -link1 192.168.101.10 -link2 192.168.102.10
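
For the record, the general shape of the delete-and-rejoin (a sketch based on the stock pvecm tool; the node name and addresses below are placeholders, not my actual values) would be:

Code:
# on a node that is still a cluster member, remove the broken node first
pvecm delnode <nodename>

# then, from a shell on the reinstalled node, join and declare every link explicitly
pvecm add <ip-of-existing-cluster-node> --link0 <this-nodes-ring0-ip> --link1 <this-nodes-ring1-ip> --link2 <this-nodes-ring2-ip>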
 
So, I deleted the node and reinstalled. I'm trying to join from the UI, and I think this is a bug: if I enter the join information, the drop-down for Link0 lists the 3 network interfaces in CIDR notation only (I can't edit them), but when I pick one it complains that it doesn't look like an IP address.
 

Attachments

  • CIDR_only.jpg
  • error.jpg
I'm a bit of a Debian novice, so it's probably my fault, but after several tries of deleting and reinstalling I couldn't get my new node into the cluster except by removing the ring1 and ring2 config from the running cluster, adding the node via the UI, and then adding the ring1/2 configs back in. I think there might be a bug here, but maybe it's just my lack of imagination/understanding.
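
In case it saves someone else time, the workaround boiled down to something like this (a rough sketch; it assumes the cluster-wide config lives at /etc/pve/corosync.conf and that config_version is bumped on every edit, as the docs require):

Code:
# on an existing cluster node: back up, then temporarily strip the ring1/ring2 (link 1/2) entries
cp /etc/pve/corosync.conf /root/corosync.conf.bak
nano /etc/pve/corosync.conf     # remove the extra link entries and increment config_version

# join the new node from its web UI (Datacenter -> Cluster -> Join Cluster)

# once the node is in, edit /etc/pve/corosync.conf again to put the ring1/ring2 entries back,
# incrementing config_version once more; the change propagates and corosync reloads it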
 
