Corosync Redundancy not working

Aug 10, 2021
Hello!
I have a problem with corosync redundancy on a 3-node test cluster running PVE 7.1-4. I set everything up and it looks fine, but as soon as I pull the ring0 cable, the connection to the cluster is lost. The strange thing is that if I take the port down via
Code:
# ip link set dev eno1 down
everything works as expected.

Code:
# corosync-cfgtool -s
Local node ID 1, transport knet
LINK ID 0 udp
    addr    = xxx.xxx.xxx.10
    status:
        nodeid:          1:    localhost
        nodeid:          2:    connected
        nodeid:          3:    connected
LINK ID 1 udp
    addr    = xxx.xxx.xxx.11
    status:
        nodeid:          1:    localhost
        nodeid:          2:    connected
        nodeid:          3:    connected
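Reading this output by eye is fine for three nodes. As a minimal sketch, assuming the exact layout shown above (other corosync versions may format it differently), the per-link node status can also be extracted with Python. `parse_cfgtool_status` and `unhealthy` are hypothetical helper names, not corosync tools. An all-`connected` result only says that knet currently sees every peer on every link; it says nothing about whether the two links are actually independent of each other.

```python
import re

def parse_cfgtool_status(text):
    """Map each knet link id to {node_id: status} from `corosync-cfgtool -s` output."""
    links = {}
    current = None
    for line in text.splitlines():
        # A "LINK ID <n> ..." header starts a new link section.
        header = re.match(r"\s*LINK ID (\d+)", line)
        if header:
            current = int(header.group(1))
            links[current] = {}
            continue
        # "nodeid: <id>: <status>" lines belong to the current link section.
        node = re.match(r"\s*nodeid:\s*(\d+):\s*(\S+)", line)
        if node and current is not None:
            links[current][int(node.group(1))] = node.group(2)
    return links

def unhealthy(links):
    """(link, node) pairs whose status is neither 'connected' nor 'localhost'."""
    return [(link, node)
            for link, nodes in sorted(links.items())
            for node, status in sorted(nodes.items())
            if status not in ("connected", "localhost")]
```

Fed the output above, `unhealthy(parse_cfgtool_status(output))` returns an empty list.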

Where is my mistake?

Code:
# cat /etc/pve/corosync.conf 
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox-2
    nodeid: 3
    quorum_votes: 1
    ring0_addr: xxx.xxx.xxx.22
    ring1_addr: xxx.xxx.xxx.23
  }
  node {
    name: pve-1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: xxx.xxx.xxx.10
    ring1_addr: xxx.xxx.xxx.11
  }
  node {
    name: pve-2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: xxx.xxx.xxx.12
    ring1_addr: xxx.xxx.xxx.13
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Test
  config_version: 3
  interface {
    linknumber: 0
    knet_link_priority: 1
  }
  interface {
    linknumber: 1
    knet_link_priority: 10
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
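With `link_mode: passive`, knet sends to each host over a single link: the highest-priority link that is currently up. The logs later in this thread are consistent with that (`best link: 1 (pri: 10)` while both links are up, `best link: 0 (pri: 1)` while only link 0 has come back). The sketch below models only that selection rule, assuming a higher `knet_link_priority` value wins as those log lines show; `pick_best_link` is a hypothetical name and this is not knet's actual code.

```python
def pick_best_link(priorities, up_links):
    """Return the id of the highest-priority link that is up, or None.

    priorities: {link_id: knet_link_priority}, e.g. {0: 1, 1: 10}
    up_links:   set of link ids currently reported as up
    Ties are broken in favour of the lower link id (an assumption of this sketch).
    """
    candidates = [link for link in priorities if link in up_links]
    if not candidates:
        return None  # corresponds to "host: ... has no active links"
    return max(candidates, key=lambda link: (priorities[link], -link))
```

With the priorities from this config, `pick_best_link({0: 1, 1: 10}, {0, 1})` gives 1, so link 1 normally carries the traffic and link 0 is only used when link 1 is reported down. That fallback only helps if the two links cannot fail together.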


Thanks a lot!
 
What do you mean by 'connection to the cluster is lost'? Please post any relevant errors/logs. Also, the /etc/network/interfaces from each node would be good to include.
 
Hello Fabian,

Thank you for your answer. If I pull the ring0 cable from node 2, then that node is isolated from nodes 1 and 3.

Nov 17 15:58:34 proxmox-2 corosync[2951]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 17 15:58:35 proxmox-2 corosync[2951]: [KNET ] rx: host: 1 link: 0 is up
Nov 17 15:58:35 proxmox-2 corosync[2951]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 10)
Nov 17 16:00:11 proxmox-2 corosync[2951]: [KNET ] link: host: 2 link: 0 is down
Nov 17 16:00:11 proxmox-2 corosync[2951]: [KNET ] link: host: 2 link: 1 is down
Nov 17 16:00:11 proxmox-2 corosync[2951]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 10)
Nov 17 16:00:11 proxmox-2 corosync[2951]: [KNET ] host: host: 2 has no active links
Nov 17 16:00:11 proxmox-2 corosync[2951]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 10)
Nov 17 16:00:11 proxmox-2 corosync[2951]: [KNET ] host: host: 2 has no active links
Nov 17 16:00:12 proxmox-2 corosync[2951]: [TOTEM ] Token has not been received in 2737 ms
Nov 17 16:00:13 proxmox-2 corosync[2951]: [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Nov 17 16:00:17 proxmox-2 corosync[2951]: [QUORUM] Sync members[2]: 1 3
Nov 17 16:00:17 proxmox-2 corosync[2951]: [QUORUM] Sync left[1]: 2
Nov 17 16:00:17 proxmox-2 corosync[2951]: [TOTEM ] A new membership (1.281) was formed. Members left: 2
Nov 17 16:00:17 proxmox-2 corosync[2951]: [TOTEM ] Failed to receive the leave message. failed: 2
Nov 17 16:00:17 proxmox-2 corosync[2951]: [QUORUM] Members[2]: 1 3
Nov 17 16:00:17 proxmox-2 corosync[2951]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 17 16:01:58 proxmox-2 corosync[2951]: [KNET ] rx: host: 2 link: 0 is up
Nov 17 16:01:58 proxmox-2 corosync[2951]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 17 16:01:58 proxmox-2 corosync[2951]: [KNET ] rx: host: 2 link: 1 is up
Nov 17 16:01:58 proxmox-2 corosync[2951]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 10)
Nov 17 16:01:58 proxmox-2 corosync[2951]: [QUORUM] Sync members[3]: 1 2 3
Nov 17 16:01:58 proxmox-2 corosync[2951]: [QUORUM] Sync joined[1]: 2
Nov 17 16:01:58 proxmox-2 corosync[2951]: [TOTEM ] A new membership (1.285) was formed. Members joined: 2
Nov 17 16:01:58 proxmox-2 corosync[2951]: [QUORUM] Members[3]: 1 2 3
Nov 17 16:01:58 proxmox-2 corosync[2951]: [MAIN ] Completed service synchronization, ready to provide service.
 
It seems pulling that cable makes both links go down. I suspect a network configuration mistake ;)
 
The ring1 link was still up according to:
Code:
# ip link | grep eno

But no pings were getting through. On the switch side, everything looks fine. As a test, I also replaced the data center switch for ring1, but the result was the same.


The network configuration looks straightforward to me!?
/etc/network/interfaces:
Code:
....
auto eno1
iface eno1 inet static
    address xxx.xxx.xxx.22/24
#ProxSync

auto eno2
iface eno2 inet static
    address xxx.xxx.xxx.23/24
#ProxSync
....
 
Is xxx.xxx.xxx the same for both interfaces?
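The question can be answered mechanically from the `address` lines: with a /24 mask, two addresses share a subnet exactly when their first three octets match. Since the real prefix is masked as `xxx.xxx.xxx` in this thread, the sketch below uses the documentation ranges 192.0.2.0/24 and 198.51.100.0/24 purely as placeholders; `share_subnet` is a hypothetical helper built on Python's standard `ipaddress` module.

```python
import ipaddress

def share_subnet(*cidrs):
    """True if all given interface addresses (CIDR notation) lie in one network."""
    networks = {ipaddress.ip_interface(c).network for c in cidrs}
    return len(networks) == 1
```

For example, `share_subnet("192.0.2.22/24", "192.0.2.23/24")` is `True`, which is the situation on eno1/eno2 here if the masked prefixes are identical.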
 
Well, yes: you only have a single subnet, and the link over which that whole subnet is routed goes down.
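To spell out the reasoning: with both interfaces in the same /24, the kernel has two connected routes for the same prefix, and traffic to every peer address, including the peers' ring1 addresses, follows whichever route it prefers (typically the one added first, here eno1). As far as I know, pulling the cable leaves that route installed (Linux flags it `linkdown` but, with default settings, still uses it), so ring1 traffic keeps going out of the dead eno1. `ip link set dev eno1 down` deletes the route instead, which would explain why that test appeared to work. The Python sketch below is a deliberately simplified model of the lookup (longest prefix wins, first listed route wins on ties); it is not the kernel's algorithm, and `Route`, `egress_dev` and the 192.0.2.x/198.51.100.x addresses are hypothetical placeholders.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str             # destination network, e.g. "192.0.2.0/24" (placeholder)
    dev: str                # outgoing interface
    installed: bool = True  # False once the route is deleted, e.g. by 'ip link set dev ... down'

def egress_dev(routes, dst):
    """Pick the outgoing interface for dst in this simplified model.

    Longest prefix wins; on a tie, the route listed first wins. A route whose
    link only lost carrier is still installed, so it can still be chosen.
    """
    addr = ipaddress.ip_address(dst)
    best, best_len = None, -1
    for route in routes:
        net = ipaddress.ip_network(route.prefix)
        if route.installed and addr in net and net.prefixlen > best_len:
            best, best_len = route, net.prefixlen
    return best.dev if best else None
```

With one shared /24, a peer's ring1 address (say 192.0.2.13) resolves to eno1 until that route is removed. With ring0 and ring1 in separate subnets, the peer's ring1 address only matches the eno2 route, so losing eno1 no longer takes ring1 down with it.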
 
