Corosync config update failed.

Dec 8, 2022
Yesterday I set out to update my corosync config to add a third NIC. That third NIC per machine is to be the main corosync interface. Here's my original config before updating:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.2.8
    ring1_addr: 192.168.1.22
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.2.7
    ring1_addr: 192.168.1.250
  }
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.2.6
    ring1_addr: 192.168.1.8
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Pantheon
  config_version: 4
  interface {
    linknumber: 0
    knet_link_priority: 2
  }
  interface {
    linknumber: 1
    knet_link_priority: 1
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

I updated the config as follows:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.3.8
    ring1_addr: 192.168.2.8
    ring2_addr: 192.168.1.22
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.3.7
    ring1_addr: 192.168.2.7
    ring2_addr: 192.168.1.250
  }
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.3.6
    ring1_addr: 192.168.2.6
    ring2_addr: 192.168.1.8
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Pantheon
  config_version: 5
  interface {
    linknumber: 0
    knet_link_priority: 3
  }
  interface {
    linknumber: 1
    knet_link_priority: 2
  }
  interface {
    linknumber: 2
    knet_link_priority: 1
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

I got the following error in my logs:

Code:
Oct 25 19:45:47 zeus pmxcfs[4154]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 5)
Oct 25 19:45:48 zeus corosync[4257]:   [CFG   ] Config reload requested by node 1
Oct 25 19:45:48 zeus corosync[4257]:   [TOTEM ] new config has different address for link 0 (addr changed from 192.168.2.8 to 192.168.3.8). Internal value was NOT changed.
Oct 25 19:45:48 zeus corosync[4257]:   [TOTEM ] new config has different address for link 0 (addr changed from 192.168.2.7 to 192.168.3.7). Internal value was NOT changed.
Oct 25 19:45:48 zeus corosync[4257]:   [TOTEM ] new config has different address for link 0 (addr changed from 192.168.2.6 to 192.168.3.6). Internal value was NOT changed.
Oct 25 19:45:48 zeus corosync[4257]:   [TOTEM ] new config has different address for link 1 (addr changed from 192.168.1.22 to 192.168.2.8). Internal value was NOT changed.
Oct 25 19:45:48 zeus corosync[4257]:   [TOTEM ] new config has different address for link 1 (addr changed from 192.168.1.250 to 192.168.2.7). Internal value was NOT changed.
Oct 25 19:45:48 zeus corosync[4257]:   [TOTEM ] new config has different address for link 1 (addr changed from 192.168.1.8 to 192.168.2.6). Internal value was NOT changed.
Oct 25 19:45:48 zeus corosync[4257]:   [CFG   ] Cannot configure new interface definitions: To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
Oct 25 19:45:48 zeus corosync[4257]:   [KNET  ] pmtud: MTU manually set to: 0
Oct 25 19:45:48 zeus pmxcfs[4154]: [dcdb] crit: corosync-cfgtool -R failed with exit code 7#010

What did I do wrong here? I ended up restoring the corosync backup, and then learned that I still had to bump config_version to 6; otherwise it didn't work once I rebooted a node to test. Everything is now working as it was with just the two NICs, but I'd like to figure out my mistake before tackling this again.
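In case it helps anyone restoring a backup, the version bump can be scripted. This is only a minimal Python sketch: `bump_config_version` is a hypothetical helper name, and it only transforms the text of a corosync.conf. It does not write any file or reload corosync.

```python
import re

# Matches the totem "config_version: N" line; a plain "version: 2" line is not matched.
CONFIG_VERSION = re.compile(r"^([ \t]*config_version:[ \t]*)(\d+)[ \t]*$", re.MULTILINE)


def bump_config_version(conf_text: str) -> str:
    """Return conf_text with config_version incremented by one."""

    def increment(match: re.Match) -> str:
        return f"{match.group(1)}{int(match.group(2)) + 1}"

    new_text, replaced = CONFIG_VERSION.subn(increment, conf_text, count=1)
    if replaced == 0:
        raise ValueError("no config_version line found")
    return new_text
```

Given a config containing `config_version: 5`, it returns the same text with `config_version: 6` and leaves every other line alone.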
 
I found this thread, which gives me some clues, but it's not quite my scenario, as I also want to reorder the ring priorities: https://forum.proxmox.com/threads/corosync-redundancy-corosync-conf.109369/

I'm wondering whether this would work if I wrote my config as follows. Note that in this new setup, ring2 is the new NIC that I want to be the primary link, so I gave it the highest knet_link_priority. Would this function as I expect and deploy without error?
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.2.8
    ring1_addr: 192.168.1.22
    ring2_addr: 192.168.3.8
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.2.7
    ring1_addr: 192.168.1.250
    ring2_addr: 192.168.3.7
  }
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.2.6
    ring1_addr: 192.168.1.8
    ring2_addr: 192.168.3.6
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Pantheon
  config_version: 7
  interface {
    linknumber: 0
    knet_link_priority: 2
  }
  interface {
    linknumber: 1
    knet_link_priority: 1
  }
  interface {
    linknumber: 2
    knet_link_priority: 3
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
 
This is because you have changed the subnets for each link.

The correct way is to do it in three steps, moving one subnet at a time (you always need one working subnet).

To fix it, the simplest way is to copy the config manually:

cp /etc/pve/corosync.conf /etc/corosync/corosync.conf on each node, then restart the corosync service on each node.

(Be careful not to have HA VMs enabled.)
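To see which links a new config touches before writing it, the ringX_addr entries of the old and new files can be compared. The following is a rough Python sketch, not corosync's own validation: `link_addresses` and `compare_links` are hypothetical helper names, and the parser assumes the simple node { ... } layout used in this thread.

```python
import re

NODE_BLOCK = re.compile(r"node\s*\{([^{}]*)\}")
NAME = re.compile(r"^\s*name:\s*(\S+)", re.MULTILINE)
RING = re.compile(r"^\s*ring(\d+)_addr:\s*(\S+)", re.MULTILINE)


def link_addresses(conf_text):
    """Map link number -> {node name: address} from the nodelist."""
    links = {}
    for block in NODE_BLOCK.findall(conf_text):
        name = NAME.search(block)
        if name is None:
            raise ValueError("node block without a name")
        for number, addr in RING.findall(block):
            links.setdefault(int(number), {})[name.group(1)] = addr
    return links


def compare_links(old_conf, new_conf):
    """Classify links by what the new config does to them."""
    old, new = link_addresses(old_conf), link_addresses(new_conf)
    return {
        # same link number, same address on every node
        "unchanged": sorted(n for n in old if new.get(n) == old[n]),
        # same link number, different address (changed in place)
        "changed": sorted(n for n in old if n in new and new[n] != old[n]),
        "removed": sorted(n for n in old if n not in new),
        "added": sorted(n for n in new if n not in old),
    }
```

With the original config and the failed one from the first post, this reports links 0 and 1 as changed in place and none as unchanged, which lines up with the "new config has different address for link ..." messages in the log.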
 

So is the "correct" way for me to, starting from my current config:

1. Remove ring0. Write the change.
2. Add the new ring0, remove ring1. Write the change.
3. Add the new ring1 and ring2. Write the change.
4. Done?
 
No, you always need one working network to replicate the config (step 2 will fail).

I think:
1. Move ring1 (192.168.1.x) to ring2.
2. Move ring0 (192.168.2.x) to ring1.
3. Add the new subnet (192.168.3.x) as ring0.
 
I understand what you're saying. That makes sense why it wouldn't work. Thank you.
 
For anyone who may stumble onto this in the future, I just want to confirm that the key was to slowly add/remove the connections. I personally went about it as follows:

1. Removed ring0.
2. Added new ring0.
3. Removed ring1.
4. Added new ring1 and ring2.

I also incremented config_version at each step. Now all three NICs are up and running.
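The ordering rule discussed in this thread can be written down as a small check. This is a sketch of the rule as I understand it from the replies and the log message, not something corosync provides: `check_steps` is a hypothetical name, and each step is just a dict mapping link number to a subnet label.

```python
def check_steps(steps):
    """Return (step number, problem) pairs for a sequence of link layouts.

    Two rules are checked for each transition:
      * no existing link may change its address in place
        (it has to be removed in one step and re-added in a later one);
      * at least one link must keep its address, so the new config
        can still be replicated to the other nodes.
    """
    problems = []
    for number, (before, after) in enumerate(zip(steps, steps[1:]), start=1):
        kept = [n for n in before if after.get(n) == before[n]]
        changed = [n for n in before if n in after and after[n] != before[n]]
        if changed:
            problems.append((number, f"links {changed} changed in place"))
        if not kept:
            problems.append((number, "no link kept its address"))
    return problems


start = {0: "192.168.2.x", 1: "192.168.1.x"}
final = {0: "192.168.3.x", 1: "192.168.2.x", 2: "192.168.1.x"}

# The four steps listed above: remove ring0, add new ring0,
# remove ring1, add new ring1 and ring2.
four_steps = [
    start,
    {1: "192.168.1.x"},
    {0: "192.168.3.x", 1: "192.168.1.x"},
    {0: "192.168.3.x"},
    final,
]

print(check_steps(four_steps))
print(check_steps([start, final]))
```

Run against the four-step sequence it reports no problems, while the one-shot change from the first post is flagged under both rules.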
 
