Problem joining cluster

bferrell

Well-Known Member
Nov 16, 2018
I tried to add a 5th node to my cluster, and it appears in the running cluster but shows as offline. I see the correct corosync config on the cluster, but the new node did not get an updated corosync.conf in /etc/pve (it did in /etc/corosync), and it will not let me log in to the UI (though I can ssh in). I tried the restart commands from this thread to no effect. The errors from starting corosync are below. Should I remove it from the cluster and try again?

Code:
root@svr-00:/var/run# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2020-05-21 16:29:30 EDT; 806ms ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
  Process: 5515 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=8)
 Main PID: 5515 (code=exited, status=8)

May 21 16:29:30 svr-00 systemd[1]: Starting Corosync Cluster Engine...
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] Corosync Cluster Engine 3.0.3 starting up
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp p
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] Please migrate config file to nodelist.
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] parse error in config: No multicast port specified
May 21 16:29:30 svr-00 corosync[5515]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1386.
May 21 16:29:30 svr-00 systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
May 21 16:29:30 svr-00 systemd[1]: corosync.service: Failed with result 'exit-code'.
May 21 16:29:30 svr-00 systemd[1]: Failed to start Corosync Cluster Engine.
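
For reference, the sort of checks and restarts that apply here would be roughly the following (a sketch, assuming the stock Proxmox paths; adjust to your own setup):

Code:
# compare the cluster-wide config (pmxcfs) with the local copy corosync actually reads
diff /etc/pve/corosync.conf /etc/corosync/corosync.conf

# check whether the cluster filesystem is up and whether the node sees quorum
systemctl status pve-cluster
pvecm status

# after fixing the config, restart the cluster filesystem and corosync, then re-check the log
systemctl restart pve-cluster corosync
journalctl -u corosync -b --no-pager | tail -n 50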
 

Attachments

  • corosync.conf.txt
  • hosts.txt
  • interfaces.txt
I think my issue is that I joined the cluster from the command line (the UI wouldn't let me pick the network interfaces, because it kept defaulting them to CIDR notation), and I may have only entered the ring0 information. I missed the part about needing to specify all 3 rings in the command.

And if I were to delete it from the cluster and re-add it, based on my corosync.conf above, would this be the correct syntax?


i.e. pvecm add <address-of-an-existing-cluster-node> followed by the new host's link addresses:
Code:
pvecm add 192.168.100.11 -link0 192.168.100.10 -link1 192.168.101.10 -link2 192.168.102.10
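
For the record, the general shape of the delete-and-rejoin (a sketch based on the stock pvecm tool; the node name and addresses below are placeholders, not my actual values) would be:

Code:
# on a node that is still a cluster member, remove the broken node first
pvecm delnode <nodename>

# then, from a shell on the reinstalled node, join and declare every link explicitly
pvecm add <ip-of-existing-cluster-node> --link0 <this-nodes-ring0-ip> --link1 <this-nodes-ring1-ip> --link2 <this-nodes-ring2-ip>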
 
So, I deleted the node and reinstalled. I'm trying to join from the UI, and I think this is a bug: if I enter the join information, the drop-down for Link0 lists the 3 network interfaces in CIDR notation only (I can't edit them), but when I pick one it complains that it doesn't look like an IP address.
 

Attachments

  • CIDR_only.jpg
  • error.jpg
I'm a bit of a Debian novice, so it's probably my fault, but after several tries of deleting and reinstalling I couldn't get my new node into the cluster except by removing the ring1 and ring2 config from the running cluster, adding the node via the UI, and then adding the ring1/2 configs back in. I think there might be a bug here, but maybe it's just my lack of imagination/understanding.
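
In case it saves someone else time, the workaround boiled down to something like this (a rough sketch; it assumes the cluster-wide config lives at /etc/pve/corosync.conf and that config_version is bumped on every edit, as the docs require):

Code:
# on an existing cluster node: back up, then temporarily strip the ring1/ring2 (link 1/2) entries
cp /etc/pve/corosync.conf /root/corosync.conf.bak
nano /etc/pve/corosync.conf     # remove the extra link entries and increment config_version

# join the new node from its web UI (Datacenter -> Cluster -> Join Cluster)

# once the node is in, edit /etc/pve/corosync.conf again to put the ring1/ring2 entries back,
# incrementing config_version once more; the change propagates and corosync reloads it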
 
