Moving cluster nodes to a different network - missing ':' after key 'interface' (500)

joshbgosh10592

So, I was following the directions to separate the cluster network, and things seemed to be going alright with the first node.
However, when I rebooted the node, Datacenter view > Cluster says missing ':' after key 'interface' (500). I then checked /etc/pve/corosync.conf and noticed I had missed a space after bindnetaddr:, which makes the syntax invalid. So I edited the file and fixed that issue. Now the corosync.conf files in /etc/pve and /etc/corosync are identical, but the GUI still shows that error.
pvecm status returns:
Cannot initialize CMAP service.

PVE 5.4-13 (I'll be upgrading the nodes to PVE 6 shortly though)
/etc/pve/corosync.conf:
Code:
root@PVE-1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: PVE-1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.9.220.11
    ring1_addr: 172.16.0.11
  }
  node {
    name: PVE-2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.9.220.12
    ring1_addr: 172.16.0.12
  }
  node {
    name: PVE-Witness
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.9.220.49
    ring1_addr: 172.16.0.49
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: PVECluster
  config_version: 11
  interface {
    bindnetaddr: 10.9.220.11
    ringnumber: 0
  }
  interface [
    bindnetaddr: 172.16.0.12
    ringnumber: 1
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

systemctl status pve-cluster:
Code:
root@PVE-1:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset:
   Active: active (running) since Mon 2020-01-06 22:48:07 EST; 2min 48s ago
  Process: 4480 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, stat
  Process: 4312 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 4373 (pmxcfs)
    Tasks: 6 (limit: 7372)
   Memory: 53.6M
      CPU: 1.139s
   CGroup: /system.slice/pve-cluster.service
           └─4373 /usr/bin/pmxcfs

Jan 06 22:50:42 PVE-1 pmxcfs[4373]: [dcdb] crit: cpg_initialize failed: 2
Jan 06 22:50:42 PVE-1 pmxcfs[4373]: [status] crit: cpg_initialize failed: 2
Jan 06 22:50:48 PVE-1 pmxcfs[4373]: [quorum] crit: quorum_initialize failed: 2
Jan 06 22:50:48 PVE-1 pmxcfs[4373]: [confdb] crit: cmap_initialize failed: 2
Jan 06 22:50:48 PVE-1 pmxcfs[4373]: [dcdb] crit: cpg_initialize failed: 2
Jan 06 22:50:48 PVE-1 pmxcfs[4373]: [status] crit: cpg_initialize failed: 2
Jan 06 22:50:54 PVE-1 pmxcfs[4373]: [quorum] crit: quorum_initialize failed: 2
Jan 06 22:50:54 PVE-1 pmxcfs[4373]: [confdb] crit: cmap_initialize failed: 2
Jan 06 22:50:54 PVE-1 pmxcfs[4373]: [dcdb] crit: cpg_initialize failed: 2
Jan 06 22:50:54 PVE-1 pmxcfs[4373]: [status] crit: cpg_initialize failed: 2

I'm not sure what's going on...
I couldn't have brought the cluster up, because the IP address of node1 is now the network's gateway.
I should also note that I added ring1 during this change, as I originally only had ring0 set up.
 
Have you tried to stop the pve-cluster service, then restart the corosync service before you start pve-cluster again?
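That is, something along these lines on the affected node (corosync has to come up cleanly before pmxcfs can reconnect to it):
Code:
# stop the cluster filesystem, bring corosync back up, then start pve-cluster again
systemctl stop pve-cluster
systemctl restart corosync
systemctl start pve-cluster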

I couldn't have brought the cluster up, because the IP address of node1 is now the network's gateway.
What do you mean by that? Can you give us more information?
 
Have you tried to stop the pve-cluster service, then restart the corosync service before you start pve-cluster again?


What do you mean by that? Can you give us more information?
I'll give that a shot today, thank you. So far I've only tried rebooting.

As for the IP address of node1 being the network's gateway:
I just split the network from one massive /16 into multiple smaller /24 networks. PVE-1 had an IP address ending in .1, which is now that subnet's default gateway. I mention it because most people who describe doing this say to keep the nodes on the same network so they can stay in sync during the switch-over, and that wasn't possible for me.
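Roughly, the before/after looks like this (new addresses are the ring addresses from the corosync.conf above):
Code:
Before: one flat 10.9.0.0/16, PVE-1 at x.x.x.1 (just another host)
After:  10.9.220.0/24 -> 10.9.220.1  is the subnet's default gateway (PVE-1's old address)
                         10.9.220.11 is PVE-1 (ring0)
        172.16.0.x    -> 172.16.0.11 is PVE-1 (ring1, the new cluster network)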
 
Have you tried to stop the pve-cluster service, then restart the corosync service before you start pve-cluster again?
I ran systemctl stop pve-cluster, systemctl stop corosync, then systemctl start corosync.
I received:
Code:
Last login: Mon Jan  6 22:50:45 2020 from 10.9.127.66
root@PVE-1:~# systemctl stop pve-cluster
root@PVE-1:~# systemctl stop corosync
root@PVE-1:~# systemctl start corosync
Job for corosync.service failed because the control process exited with error code.
See "systemctl status corosync.service" and "journalctl -xe" for details.
root@PVE-1:~# journalctl -xe
Jan 07 17:54:11 PVE-1 corosync[213862]: parser error: Unexpected closing brace
Jan 07 17:54:11 PVE-1 systemd[1]: corosync.service: Main process exited, code=ex
Jan 07 17:54:11 PVE-1 systemd[1]: Failed to start Corosync Cluster Engine.
-- Subject: Unit corosync.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit corosync.service has failed.
--
-- The result is failed.
Jan 07 17:54:11 PVE-1 systemd[1]: corosync.service: Unit entered failed state.
Jan 07 17:54:11 PVE-1 systemd[1]: corosync.service: Failed with result 'exit-cod
Jan 07 17:54:12 PVE-1 pvestatd[4689]: ipcc_send_rec[1] failed: Connection refuse
Jan 07 17:54:12 PVE-1 pvestatd[4689]: ipcc_send_rec[2] failed: Connection refuse
Jan 07 17:54:12 PVE-1 pvestatd[4689]: ipcc_send_rec[3] failed: Connection refuse
Jan 07 17:54:12 PVE-1 pvestatd[4689]: ipcc_send_rec[4] failed: Connection refuse
Jan 07 17:54:12 PVE-1 pvestatd[4689]: status update error: Connection refused
Jan 07 17:54:19 PVE-1 pve-firewall[4685]: status update error: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: ipcc_send_rec[1] failed: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: ipcc_send_rec[2] failed: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: ipcc_send_rec[3] failed: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: ipcc_send_rec[4] failed: Connection refuse
Jan 07 17:54:22 PVE-1 pvestatd[4689]: status update error: Connection refused
lines 1544-1566/1566 (END)

At this point I was unable to open the web UI (I'm not sure whether I could earlier today, before I stopped the services); Chrome greets me with an error saying:
10.9.220.11 is currently unable to handle this request. HTTP 501 error
So I rebooted, which restored the web UI, but I'm right back where I was with the missing ':' after key 'interface' (500).
I tried your recommendation again (because maybe the server was just tired after millions of failed attempts overnight).
Same thing: can't start the corosync service. journalctl -xe:
Code:
Jan 07 18:13:37 PVE-1 pveproxy[4604]: ipcc_send_rec[1] failed: Connection refused
Jan 07 18:13:37 PVE-1 pveproxy[4604]: ipcc_send_rec[2] failed: Connection refused
Jan 07 18:13:37 PVE-1 pveproxy[4604]: ipcc_send_rec[3] failed: Connection refused
Jan 07 18:13:41 PVE-1 pve-firewall[4535]: status update error: Connection refused
Jan 07 18:13:41 PVE-1 pve-firewall[4535]: firewall update time (10.026 seconds)
Jan 07 18:13:41 PVE-1 pve-firewall[4535]: status update error: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[1] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[2] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[3] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[4] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: status update error: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: status update time (10.032 seconds)
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[1] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[2] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[3] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: ipcc_send_rec[4] failed: Connection refused
Jan 07 18:13:41 PVE-1 pvestatd[4528]: status update error: Connection refused
Jan 07 18:13:51 PVE-1 pve-firewall[4535]: status update error: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: ipcc_send_rec[1] failed: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: ipcc_send_rec[2] failed: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: ipcc_send_rec[3] failed: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: ipcc_send_rec[4] failed: Connection refused
Jan 07 18:13:51 PVE-1 pvestatd[4528]: status update error: Connection refused
 
So your situation is that node1 does not work properly because its IP address is also the gateway address for the other nodes?
Do you have a gateway configured? If so, you will have the same IP used on two different devices.

Does the cluster work on the other nodes?

Can you post the output of pvecm status of all nodes and the /etc/network/interfaces files so we can get a better overview?
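One more thing worth checking: in the corosync.conf you posted, the second interface block in totem opens with '[' instead of '{', which would explain the "parser error: Unexpected closing brace" from corosync. With matching braces that section would look roughly like this (addresses left as you posted them; remember to bump config_version whenever you change the file, and I would double-check the ring1 bindnetaddr, since on PVE-1 I would expect the local or network address of that ring rather than .12):
Code:
totem {
  cluster_name: PVECluster
  config_version: 12
  interface {
    bindnetaddr: 10.9.220.11
    ringnumber: 0
  }
  interface {
    bindnetaddr: 172.16.0.12
    ringnumber: 1
  }
  ip_version: ipv4
  secauth: on
  version: 2
}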
 
I actually ended up backing up the VMs I wanted and reinstalling. I had previously blown up a Ceph install, as well as a service that doesn't work with the new version of Proxmox (the iDRAC Service Module), so I figured it was better to have a clean slate anyway.
Thank you though!
 
