Redundant Ring Protocol (RRP) is not working

Mobin

Member
Mar 15, 2019
I have created a five-node cluster, and the configuration is as follows:

bond0 created using two physical interfaces on each node
bond1 created using two physical interfaces on each node
Created Linux bridges as follows:

Created vmbr0 using bond0 and vmbr1 using bond1. Each node's primary IP (cluster IP) is on vmbr0.
Created vmbr2 using bond0, so each virtual bridge interface now has a dedicated IP.

Edited corosync.conf as per the RRP documentation, i.e. added ring0 and ring1 to corosync.conf, then restarted networking and corosync.

For testing, I brought down vmbr0 on one of the nodes, but that node went offline (it did not fail over to ring1 as defined in corosync.conf). Can anyone help me pinpoint the issue?
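For reference, a rough sketch of the failover test I am describing (the bridge name is from my setup; the comments describe what I would expect to see when RRP is working, not what I actually got):

Code:
# before the test: confirm both rings are listed and healthy
corosync-cfgtool -s

# simulate a ring0 failure by taking the ring0 bridge down
ifdown vmbr0

# check again: ring0 should be marked faulty while ring1 stays active
corosync-cfgtool -s

# bring the interface back up afterwards
ifup vmbr0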
 
Could you post your /etc/pve/corosync.conf and /etc/network/interfaces files?
 
Dear Stefan,

Thanks for looking into this post.
Kindly find the details below and let me know what needs to change in the redundant ring configuration.

root@E3B1:~# cat /etc/pve/corosync.conf
totem {
cluster_name: WebApp-StagClus
config_version: 12
ip_version: ipv4
secauth: on
rrp_mode: passive
version: 2
interface {
bindnetaddr: 172.x.y.11
ringnumber: 0
}
interface {
bindaddress: 172.x.z.11
ringnumber: 1
}
}
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: E3B1
nodeid: 1
quorum_votes: 1
ring0_addr: 172.x.y.11
ring1_addr: 172.x.z.11
}
node {
name: E3B2
nodeid: 2
quorum_votes: 1
ring0_addr: 172.x.y.12
ring1_addr: 172.x.z.12
}
node {
name: E3B3
nodeid: 3
quorum_votes: 1
ring0_addr: 172.x.y.13
ring1_addr: 172.x.z.13
}
node {
name: E4B1
nodeid: 4
quorum_votes: 1
ring0_addr: 172.x.y.21
ring1_addr: 172.x.z.21
}
node {
name: E4B2
nodeid: 5
quorum_votes: 1
ring0_addr: 172.x.y.22
ring1_addr: 172.x.z.22
}
}

quorum {
provider: corosync_votequorum
}
root@E3B1:~#
root@E3B1:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage part of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno49 inet manual

iface ens2f0 inet manual

iface ens2f1 inet manual

iface ens2f2 inet manual

iface ens2f3 inet manual

iface eno50 inet manual

auto bond0
iface bond0 inet manual
slaves eno49 eno50
bond_miimon 100
bond_mode active-backup

auto bond1
iface bond1 inet manual
slaves ens2f0 ens2f1
bond_miimon 100
bond_mode active-backup
#1 Gig bonded interface

iface bond0.y inet manual

iface bond0.z inet manual

auto vmbr0
iface vmbr0 inet static
address 172.x.y.11
netmask 255.255.255.0
gateway 172.x.y.7
bridge_ports bond0.y
bridge_stp off
bridge_fd 0
bridge_vlan_aware yes

auto vmbr1
iface vmbr1 inet static
address 172.x.a.11
netmask 255.255.255.0
bridge_ports bond1
bridge_stp off
bridge_fd 0
#1 Gig

auto vmbr2
iface vmbr2 inet static
address 172.x.z.11
netmask 255.255.255.0
bridge_ports bond0.z
bridge_vlan_aware yes
bridge_stp off
bridge_fd 0

root@E3B1:~#
 
interface {
bindnetaddr: 172.x.y.11
ringnumber: 0
}
interface {
bindaddress: 172.x.z.11
ringnumber: 1
}

bindnetaddr vs. bindaddress here? That shouldn't work at all; 'bindaddress' is not a valid corosync option.

Check your 'journalctl -e' after saving the config (and incrementing the config_version). Also, just to make sure, which version of PVE/corosync are you using? ('pveversion -v')
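For reference, a sketch of how the totem section could look with both rings declared via bindnetaddr (the addresses are placeholders; bindnetaddr normally takes the network address of each ring's subnet, e.g. the .0 address for a /24, and the config_version has to be bumped for the nodes to apply the change):

Code:
totem {
  cluster_name: WebApp-StagClus
  config_version: 13          # incremented from 12 so the nodes pick up the change
  ip_version: ipv4
  secauth: on
  rrp_mode: passive
  version: 2
  interface {
    ringnumber: 0
    bindnetaddr: 172.x.y.0    # network address of the ring0 subnet
  }
  interface {
    ringnumber: 1
    bindnetaddr: 172.x.z.0    # network address of the ring1 subnet
  }
}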
 
root@E3B1:~# cat /etc/pve/corosync.conf
totem {
cluster_name: WebApp-StagClus
config_version: 12
ip_version: ipv4
secauth: on
rrp_mode: passive
version: 2
interface {
bindnetaddr: 172.x.y.11
ringnumber: 0
}
interface {
bindnetaddr: 172.x.z.11
ringnumber: 1
}
}
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: E3B1
nodeid: 1
quorum_votes: 1
ring0_addr: 172.x.y.11
ring1_addr: 172.x.z.11
}
node {
name: E3B2
nodeid: 2
quorum_votes: 1
ring0_addr: 172.x.y.12
ring1_addr: 172.x.z.12
}
node {
name: E3B3
nodeid: 3
quorum_votes: 1
ring0_addr: 172.x.y.13
ring1_addr: 172.x.z.13
}
node {
name: E4B1
nodeid: 4
quorum_votes: 1
ring0_addr: 172.x.y.21
ring1_addr: 172.x.z.21
}
node {
name: E4B2
nodeid: 5
quorum_votes: 1
ring0_addr: 172.x.y.22
ring1_addr: 172.x.z.22
}
}

quorum {
provider: corosync_votequorum
}
root@E3B1:~# cat /etc/corosync/corosync.conf
totem {
cluster_name: WebApp-StagClus
config_version: 12
ip_version: ipv4
secauth: on
rrp_mode: passive
version: 2
interface {
bindnetaddr: 172.x.y.11
ringnumber: 0
}
interface {
bindnetaddr: 172.x.z.11
ringnumber: 1
}
}
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: E3B1
nodeid: 1
quorum_votes: 1
ring0_addr: 172.x.y.11
ring1_addr: 172.x.z.11
}
node {
name: E3B2
nodeid: 2
quorum_votes: 1
ring0_addr: 172.x.y.12
ring1_addr: 172.x.z.12
}
node {
name: E3B3
nodeid: 3
quorum_votes: 1
ring0_addr: 172.x.y.13
ring1_addr: 172.x.z.13
}
node {
name: E4B1
nodeid: 4
quorum_votes: 1
ring0_addr: 172.x.y.21
ring1_addr: 172.x.z.21
}
node {
name: E4B2
nodeid: 5
quorum_votes: 1
ring0_addr: 172.x.y.22
ring1_addr: 172.x.z.22
}
}

quorum {
provider: corosync_votequorum
}
root@E3B1:~#
root@E3B1:~# journalctl -e
Dec 20 07:53:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:53:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:53:50 E3B1 sshd[36991]: Connection closed by 172.x.y.241 port 39418 [preauth]
Dec 20 07:54:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:54:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:54:50 E3B1 sshd[37119]: Connection closed by 172.x.y.241 port 40160 [preauth]
Dec 20 07:55:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:55:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:55:10 E3B1 pveproxy[2648]: worker 34227 finished
Dec 20 07:55:10 E3B1 pveproxy[2648]: starting 1 worker(s)
Dec 20 07:55:10 E3B1 pveproxy[2648]: worker 37180 started
Dec 20 07:55:12 E3B1 pveproxy[37179]: got inotify poll request in wrong process - disabling inotify
Dec 20 07:55:48 E3B1 pvedaemon[35832]: <root@pam> successful auth for user 'root@pam'
Dec 20 07:55:50 E3B1 sshd[37251]: Connection closed by 172.x.y.241 port 40916 [preauth]
Dec 20 07:56:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:56:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:56:50 E3B1 sshd[37382]: Connection closed by 172.x.y.241 port 41664 [preauth]
Dec 20 07:57:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:57:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:57:23 E3B1 pvedaemon[32818]: <root@pam> end task UPID:E3B1:00008ABC:2497E2DB:5DFC7B5E:vncproxy:108:root@pam: OK
Dec 20 07:57:23 E3B1 pvedaemon[37458]: starting vnc proxy UPID:E3B1:00009252:249942DB:5DFC7EE3:vncproxy:108:root@pam:
Dec 20 07:57:23 E3B1 pvedaemon[27264]: <root@pam> starting task UPID:E3B1:00009252:249942DB:5DFC7EE3:vncproxy:108:root@pam:
Dec 20 07:57:24 E3B1 pveproxy[37179]: worker exit
Dec 20 07:57:50 E3B1 sshd[37513]: Connection closed by 172.x.y.241 port 42410 [preauth]
Dec 20 07:58:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:58:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:58:50 E3B1 sshd[37641]: Connection closed by 172.x.y.241 port 43166 [preauth]
Dec 20 07:59:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:59:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:59:50 E3B1 sshd[37769]: Connection closed by 172.x.y.241 port 43928 [preauth]
Dec 20 08:00:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:00:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:00:50 E3B1 sshd[37902]: Connection closed by 172.x.y.241 port 44664 [preauth]
Dec 20 08:01:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:01:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:01:50 E3B1 sshd[38032]: Connection closed by 172.x.y.241 port 45406 [preauth]
Dec 20 08:02:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:02:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:02:50 E3B1 sshd[38160]: Connection closed by 172.x.y.241 port 46106 [preauth]
Dec 20 08:03:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:03:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:03:50 E3B1 sshd[38291]: Connection closed by 172.x.y.241 port 46818 [preauth]
Dec 20 08:04:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:04:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:04:01 E3B1 pvedaemon[27264]: <root@pam> end task UPID:E3B1:00009252:249942DB:5DFC7EE3:vncproxy:108:root@pam: OK
Dec 20 08:04:08 E3B1 pvedaemon[27264]: <root@pam> update VM 108: -net0 e1000=CE:79:18:F4:A2:77,bridge=vmbr0,link_down=1
Dec 20 08:04:50 E3B1 sshd[38426]: Connection closed by 172.x.y.241 port 47520 [preauth]
Dec 20 08:04:52 E3B1 pvedaemon[27264]: <root@pam> update VM 108: -net0 e1000=CE:79:18:F4:A2:77,bridge=vmbr0
Dec 20 08:04:55 E3B1 sshd[38428]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=172.x.68.103 user=root
Dec 20 08:04:57 E3B1 sshd[38428]: Failed password for root from 172.x.68.103 port 37840 ssh2
Dec 20 08:04:57 E3B1 sshd[38428]: Connection closed by 172.x.68.103 port 37840 [preauth]
Dec 20 08:04:58 E3B1 sshd[38431]: Accepted password for root from 172.x.68.103 port 37842 ssh2
Dec 20 08:04:58 E3B1 sshd[38431]: pam_unix(sshd:session): session opened for user root by (uid=0)
Dec 20 08:04:58 E3B1 systemd[1]: Created slice User Slice of root.
Dec 20 08:04:58 E3B1 systemd[1]: Starting User Manager for UID 0...
Dec 20 08:04:58 E3B1 systemd-logind[1418]: New session 1915 of user root.
Dec 20 08:04:58 E3B1 systemd[1]: Started Session 1915 of user root.
Dec 20 08:04:58 E3B1 systemd[38438]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
Dec 20 08:04:58 E3B1 systemd[38438]: Listening on GnuPG cryptographic agent and passphrase cache.
Dec 20 08:04:58 E3B1 systemd[38438]: Listening on GnuPG cryptographic agent (access for web browsers).
Dec 20 08:04:58 E3B1 systemd[38438]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Dec 20 08:04:58 E3B1 systemd[38438]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Sockets.
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Paths.
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Timers.
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Basic System.
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Default.
Dec 20 08:04:58 E3B1 systemd[38438]: Startup finished in 11ms.
Dec 20 08:04:58 E3B1 systemd[1]: Started User Manager for UID 0.
Dec 20 08:05:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:05:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
root@E3B1:~#
root@E3B1:~# pveversion
pve-manager/5.2-1/0fcd7879 (running kernel: 4.15.17-1-pve)
root@E3B1:~#
root@E3B1:~# corosync -v
Corosync Cluster Engine, version '2.4.2-dirty'
Copyright (c) 2006-2009 Red Hat, Inc.
root@E3B1:~#
 
I changed the network configuration, but it is still not working even after the change.

When we bring the main interface down, the node drops out of the cluster.
 
root@E3B1:~# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 172.x.y.11
status = ring 0 active with no faults
root@E3B1:~#
 
As the output shows, there is no ring1.
Which PVE version are you running? I ask because on 6.1 (which ships corosync 3 / kronosnet) the option is "link_mode" rather than "rrp_mode":

Code:
totem {
  cluster_name: some_cluster
  config_version: some_number
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
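For what it's worth, the same check applies after adjusting the config: once the config_version has been bumped and the file has synced to all nodes, the ring/link status can be re-checked with the standard corosync tools (a sketch, not output from this cluster):

Code:
corosync-cfgtool -s      # should now list both ring/link IDs for the local node
corosync-quorumtool -s   # quorum state and current membership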
 
What do 'systemctl status corosync' and 'journalctl -b -u corosync | tail -n 200' output? Are '/etc/pve/corosync.conf' and '/etc/corosync/corosync.conf' identical on all cluster nodes?
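For example, something along these lines run from one node would show at a glance whether the files match (a sketch; the node names are the ones from the nodelist above, and it assumes the usual root SSH access between cluster nodes):

Code:
for n in E3B1 E3B2 E3B3 E4B1 E4B2; do
    echo "== $n =="
    ssh root@"$n" md5sum /etc/pve/corosync.conf /etc/corosync/corosync.conf
done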
 
Dear Fabian,

I have verified '/etc/pve/corosync.conf' and '/etc/corosync/corosync.conf' on all nodes, and they are identical; there is no difference.

The outputs you asked for are attached here.
 