Moving cluster nodes to a different network - missing ':' after key 'interface' (500)

joshbgosh10592

So, I was following the directions to separate the cluster network, and things seemed to be going alright with the first node.
However, when I rebooted the node, Datacenter view > Cluster says missing ':' after key 'interface' (500). I then checked /etc/pve/corosync.conf and noticed I had missed a space after bindnetaddr:, which makes the syntax invalid. So I edited the file and fixed that issue. Now the corosync.conf files in /etc/pve and /etc/corosync are identical, but the GUI still shows that error.
pvecm status returns:
Cannot initialize CMAP service.

PVE 5.4-13 (I'll be upgrading the nodes to PVE 6 shortly though)
/etc/pve/corosync.conf:
Code:
root@PVE-1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: PVE-1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.9.220.11
    ring1_addr: 172.16.0.11
  }
  node {
    name: PVE-2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.9.220.12
    ring1_addr: 172.16.0.12
  }
  node {
    name: PVE-Witness
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.9.220.49
    ring1_addr: 172.16.0.49
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: PVECluster
  config_version: 11
  interface {
    bindnetaddr: 10.9.220.11
    ringnumber: 0
  }
  interface [
    bindnetaddr: 172.16.0.12
    ringnumber: 1
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

systemctl status pve-cluster:
Code:
root@PVE-1:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset:
   Active: active (running) since Mon 2020-01-06 22:48:07 EST; 2min 48s ago
  Process: 4480 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, stat
  Process: 4312 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 4373 (pmxcfs)
    Tasks: 6 (limit: 7372)
   Memory: 53.6M
      CPU: 1.139s
   CGroup: /system.slice/pve-cluster.service
           └─4373 /usr/bin/pmxcfs

Jan 06 22:50:42 PVE-1 pmxcfs[4373]: [dcdb] crit: cpg_initialize failed: 2
Jan 06 22:50:42 PVE-1 pmxcfs[4373]: [status] crit: cpg_initialize failed: 2
Jan 06 22:50:48 PVE-1 pmxcfs[4373]: [quorum] crit: quorum_initialize failed: 2
Jan 06 22:50:48 PVE-1 pmxcfs[4373]: [confdb] crit: cmap_initialize failed: 2
Jan 06 22:50:48 PVE-1 pmxcfs[4373]: [dcdb] crit: cpg_initialize failed: 2
Jan 06 22:50:48 PVE-1 pmxcfs[4373]: [status] crit: cpg_initialize failed: 2
Jan 06 22:50:54 PVE-1 pmxcfs[4373]: [quorum] crit: quorum_initialize failed: 2
Jan 06 22:50:54 PVE-1 pmxcfs[4373]: [confdb] crit: cmap_initialize failed: 2
Jan 06 22:50:54 PVE-1 pmxcfs[4373]: [dcdb] crit: cpg_initialize failed: 2
Jan 06 22:50:54 PVE-1 pmxcfs[4373]: [status] crit: cpg_initialize failed: 2

I'm not sure what's going on...
I couldn't have brought the cluster up, because the IP address of node1 is now the network's gateway.
I should also note that I added ring1 during this change, as I originally only had ring0 set up.
 
Have you tried to stop the pve-cluster service, then restart the corosync service before you start pve-cluster again?
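That is, something along these lines on the affected node (corosync has to come up cleanly before pmxcfs can reconnect to it):
Code:
# stop the cluster filesystem, bring corosync back up, then start pve-cluster again
systemctl stop pve-cluster
systemctl restart corosync
systemctl start pve-cluster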

I couldn't have brought the cluster up, because the IP address of node1 is now the network's gateway.
What do you mean by that? Can you give us more information?
 
Have you tried to stop the pve-cluster service, then restart the corosync service before you start pve-cluster again?


What do you mean by that? Can you give us more information?
I'll give that a shot today, thank you. So far I've only tried rebooting.

As for the IP address of node1 being the network's gateway:
I just split the network from one massive /16 into multiple smaller /24 networks. PVE-1 had an IP address ending in .1, which is now that subnet's default gateway. I mention it because most people who describe doing this say to keep the nodes on the same network so they can stay in sync during the switch-over, and that wasn't possible for me.
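Roughly, the before/after looks like this (new addresses are the ring addresses from the corosync.conf above):
Code:
Before: one flat 10.9.0.0/16, PVE-1 at x.x.x.1 (just another host)
After:  10.9.220.0/24 -> 10.9.220.1  is the subnet's default gateway (PVE-1's old address)
                         10.9.220.11 is PVE-1 (ring0)
        172.16.0.x    -> 172.16.0.11 is PVE-1 (ring1, the new cluster network)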
 
Have you tried to stop the pve-cluster service, then restart the corosync service before you start pve-cluster again?
I ran systemctl stop pve-cluster, systemctl stop corosync, then systemctl start corosync.
I received:
Code:
Last login: Mon Jan  6 22:50:45 2020 from 10.9.127.66
root@PVE-1:~# systemctl stop pve-cluster
root@PVE-1:~# systemctl stop corosync
root@PVE-1:~# systemctl start corosync
Job for corosync.service failed because the control process exited with error code.
See "systemctl status corosync.service" and "journalctl -xe" for details.
root@PVE-1:~# journalctl -xe
Jan 07 17:54:11 PVE-1 corosync[213862]: parser error: Unexpected closing brace
Jan 07 17:54:11 PVE-1 systemd[1]: corosync.service: Main process exited, code=ex
Jan 07 17:54:11 PVE-1 systemd[1]: Failed to start Corosync Cluster Engine.
-- Subject: Unit corosync.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit corosync.service has failed.
--
-- The result is failed.
Jan 07 17:54:11 PVE-1 systemd[1]: corosync.service: Unit entered failed state.
Jan 07 17:54:11 PVE-1 systemd[1]: corosync.service: Failed with result 'exit-cod
Jan 07 17:54:12 PVE-1 pvestatd[4689]: ipcc_send_rec[1] failed: Connection refuse
Jan 07 17:54:12 PVE-1 pvestatd[4689]: ipcc_send_rec[2] failed: Connection refuse
Jan 07 17:54:12 PVE-1 pvestatd[4689]: ipcc_send_rec[3] failed: Connection refuse
Jan 07 17:54:12 PVE-1 pvestatd[4689]: ipcc_send_rec[4] failed: Connection refuse
Jan 07 17:54:12 PVE-1 pvestatd[4689]: status update error: Connection refused
Jan 07 17:54:19 PVE-1 pve-firewall[4685]: status update error: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: ipcc_send_rec[1] failed: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: ipcc_send_rec[2] failed: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: ipcc_send_rec[3] failed: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: ipcc_send_rec[4] failed: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: status update error: Connection refused
lines 1544-1566/1566 (END)

At this point I was unable to open the web UI (I'm not sure whether I could earlier today, before I stopped the services); Chrome greets me with an error saying:
10.9.220.11 is currently unable to handle this request. HTTP 501 error
So I rebooted, which restored the web UI, but I'm right back where I was with the missing ':' after key 'interface' (500).
I tried your recommendation again (because maybe the server was just tired after millions of failed attempts overnight).
Same thing: can't start the corosync service. journalctl -xe:
Code:
Jan 07 18:13:37 PVE-1 pveproxy[4604]: ipcc_send_rec[1] failed: Connection refused
Jan 07 18:13:37 PVE-1 pveproxy[4604]: ipcc_send_rec[2] failed: Connection refused
Jan 07 18:13:37 PVE-1 pveproxy[4604]: ipcc_send_rec[3] failed: Connection refused
Jan 07 18:13:41 PVE-1 pve-firewall[4535]: status update error: Connection refused
Jan 07 18:13:41 PVE-1 pve-firewall[4535]: firewall update time (10.026 seconds)
Jan 07 18:13:41 PVE-1 pve-firewall[4535]: status update error: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[1] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[2] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[3] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[4] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: status update error: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: status update time (10.032 seconds)
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[1] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[2] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[3] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[4] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: status update error: Connection refused
Jan 07 18:13:51 PVE-1 pve-firewall[4535]: status update error: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: ipcc_send_rec[1] failed: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: ipcc_send_rec[2] failed: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: ipcc_send_rec[3] failed: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: ipcc_send_rec[4] failed: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: status update error: Connection refused
 
So your situation is that node1 does not work properly because its IP address is also the gateway address for the other nodes?
Do you have a gateway configured? If so, you will have the same IP used on two different devices.

Does the cluster work on the other nodes?

Can you post the output of pvecm status of all nodes and the /etc/network/interfaces files so we can get a better overview?
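One more thing worth checking: in the corosync.conf you posted, the second interface block in totem opens with '[' instead of '{', which would explain the "parser error: Unexpected closing brace" from corosync. With matching braces that section would look roughly like this (addresses left as you posted them; remember to bump config_version whenever you change the file, and I would double-check the ring1 bindnetaddr, since on PVE-1 I would expect the local or network address of that ring rather than .12):
Code:
totem {
  cluster_name: PVECluster
  config_version: 12
  interface {
    bindnetaddr: 10.9.220.11
    ringnumber: 0
  }
  interface {
    bindnetaddr: 172.16.0.12
    ringnumber: 1
  }
  ip_version: ipv4
  secauth: on
  version: 2
}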
 
I actually ended up backing up the VMs I wanted and reinstalling. I had previously blown up a Ceph install, as well as a service that doesn't work with the new version of Proxmox (the iDRAC Service Module), so I figured it was better to have a clean slate anyway.
Thank you though!
 
