Redundant Ring Protocol (RRP) is not working

Mobin

Member
Mar 15, 2019
I have created a five-node cluster, and the configuration is as follows:

bond0 created using two physical interfaces on each node
bond1 created using two physical interfaces on each node
Created Linux bridges as follows:

Created vmbr0 using bond0 and vmbr1 using bond1. Each node's primary IP (cluster IP) is on vmbr0.
Created vmbr2 using bond0, so each virtual bridge interface now has a dedicated IP.

Edited corosync.conf as per the RRP documentation, i.e. added ring0 and ring1 to corosync.conf, then restarted networking and corosync.

For testing, I brought down vmbr0 on one of the nodes, but that node went offline (it did not fail over to ring1 as defined in corosync.conf). Can anyone help me pinpoint the issue?
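For reference, a rough sketch of the failover test I am describing (the bridge name is from my setup; the comments describe what I would expect to see when RRP is working, not what I actually got):

Code:
# before the test: confirm both rings are listed and healthy
corosync-cfgtool -s

# simulate a ring0 failure by taking the ring0 bridge down
ifdown vmbr0

# check again: ring0 should be marked faulty while ring1 stays active
corosync-cfgtool -s

# bring the interface back up afterwards
ifup vmbr0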
 
Could you post your /etc/pve/corosync.conf and /etc/network/interfaces files?
 
Dear Stefan,

Thanks for looking into this post.
Kindly find the details below and let me know what needs to change in the redundant ring configuration.

root@E3B1:~# cat /etc/pve/corosync.conf
totem {
cluster_name: WebApp-StagClus
config_version: 12
ip_version: ipv4
secauth: on
rrp_mode: passive
version: 2
interface {
bindnetaddr: 172.x.y.11
ringnumber: 0
}
interface {
bindaddress: 172.x.z.11
ringnumber: 1
}
}
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: E3B1
nodeid: 1
quorum_votes: 1
ring0_addr: 172.x.y.11
ring1_addr: 172.x.z.11
}
node {
name: E3B2
nodeid: 2
quorum_votes: 1
ring0_addr: 172.x.y.12
ring1_addr: 172.x.z.12
}
node {
name: E3B3
nodeid: 3
quorum_votes: 1
ring0_addr: 172.x.y.13
ring1_addr: 172.x.z.13
}
node {
name: E4B1
nodeid: 4
quorum_votes: 1
ring0_addr: 172.x.y.21
ring1_addr: 172.x.z.21
}
node {
name: E4B2
nodeid: 5
quorum_votes: 1
ring0_addr: 172.x.y.22
ring1_addr: 172.x.z.22
}
}

quorum {
provider: corosync_votequorum
}
root@E3B1:~#
root@E3B1:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage part of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno49 inet manual

iface ens2f0 inet manual

iface ens2f1 inet manual

iface ens2f2 inet manual

iface ens2f3 inet manual

iface eno50 inet manual

auto bond0
iface bond0 inet manual
slaves eno49 eno50
bond_miimon 100
bond_mode active-backup

auto bond1
iface bond1 inet manual
slaves ens2f0 ens2f1
bond_miimon 100
bond_mode active-backup
#1 Gig bonded interface

iface bond0.y inet manual

iface bond0.z inet manual

auto vmbr0
iface vmbr0 inet static
address 172.x.y.11
netmask 255.255.255.0
gateway 172.x.y.7
bridge_ports bond0.y
bridge_stp off
bridge_fd 0
bridge_vlan_aware yes

auto vmbr1
iface vmbr1 inet static
address 172.x.a.11
netmask 255.255.255.0
bridge_ports bond1
bridge_stp off
bridge_fd 0
#1 Gig

auto vmbr2
iface vmbr2 inet static
address 172.x.z.11
netmask 255.255.255.0
bridge_ports bond0.z
bridge_vlan_aware yes
bridge_stp off
bridge_fd 0

root@E3B1:~#
 
interface {
bindnetaddr: 172.x.y.11
ringnumber: 0
}
interface {
bindaddress: 172.x.z.11
ringnumber: 1
}

bindnetaddr vs. bindaddress here? That shouldn't work at all; 'bindaddress' is not a valid corosync option.

Check your 'journalctl -e' after saving the config (and incrementing the config_version). Also, just to make sure, which version of PVE/corosync are you using? ('pveversion -v')
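For reference, a sketch of how the totem section could look with both rings declared via bindnetaddr (the addresses are placeholders; bindnetaddr normally takes the network address of each ring's subnet, e.g. the .0 address for a /24, and the config_version has to be bumped for the nodes to apply the change):

Code:
totem {
  cluster_name: WebApp-StagClus
  config_version: 13          # incremented from 12 so the nodes pick up the change
  ip_version: ipv4
  secauth: on
  rrp_mode: passive
  version: 2
  interface {
    ringnumber: 0
    bindnetaddr: 172.x.y.0    # network address of the ring0 subnet
  }
  interface {
    ringnumber: 1
    bindnetaddr: 172.x.z.0    # network address of the ring1 subnet
  }
}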
 
root@E3B1:~# cat /etc/pve/corosync.conf
totem {
cluster_name: WebApp-StagClus
config_version: 12
ip_version: ipv4
secauth: on
rrp_mode: passive
version: 2
interface {
bindnetaddr: 172.x.y.11
ringnumber: 0
}
interface {
bindnetaddr: 172.x.z.11
ringnumber: 1
}
}
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: E3B1
nodeid: 1
quorum_votes: 1
ring0_addr: 172.x.y.11
ring1_addr: 172.x.z.11
}
node {
name: E3B2
nodeid: 2
quorum_votes: 1
ring0_addr: 172.x.y.12
ring1_addr: 172.x.z.12
}
node {
name: E3B3
nodeid: 3
quorum_votes: 1
ring0_addr: 172.x.y.13
ring1_addr: 172.x.z.13
}
node {
name: E4B1
nodeid: 4
quorum_votes: 1
ring0_addr: 172.x.y.21
ring1_addr: 172.x.z.21
}
node {
name: E4B2
nodeid: 5
quorum_votes: 1
ring0_addr: 172.x.y.22
ring1_addr: 172.x.z.22
}
}

quorum {
provider: corosync_votequorum
}
root@E3B1:~# cat /etc/corosync/corosync.conf
totem {
cluster_name: WebApp-StagClus
config_version: 12
ip_version: ipv4
secauth: on
rrp_mode: passive
version: 2
interface {
bindnetaddr: 172.x.y.11
ringnumber: 0
}
interface {
bindnetaddr: 172.x.z.11
ringnumber: 1
}
}
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: E3B1
nodeid: 1
quorum_votes: 1
ring0_addr: 172.x.y.11
ring1_addr: 172.x.z.11
}
node {
name: E3B2
nodeid: 2
quorum_votes: 1
ring0_addr: 172.x.y.12
ring1_addr: 172.x.z.12
}
node {
name: E3B3
nodeid: 3
quorum_votes: 1
ring0_addr: 172.x.y.13
ring1_addr: 172.x.z.13
}
node {
name: E4B1
nodeid: 4
quorum_votes: 1
ring0_addr: 172.x.y.21
ring1_addr: 172.x.z.21
}
node {
name: E4B2
nodeid: 5
quorum_votes: 1
ring0_addr: 172.x.y.22
ring1_addr: 172.x.z.22
}
}

quorum {
provider: corosync_votequorum
}
root@E3B1:~#
root@E3B1:~# journalctl -e
Dec 20 07:53:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:53:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:53:50 E3B1 sshd[36991]: Connection closed by 172.x.y.241 port 39418 [preauth]
Dec 20 07:54:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:54:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:54:50 E3B1 sshd[37119]: Connection closed by 172.x.y.241 port 40160 [preauth]
Dec 20 07:55:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:55:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:55:10 E3B1 pveproxy[2648]: worker 34227 finished
Dec 20 07:55:10 E3B1 pveproxy[2648]: starting 1 worker(s)
Dec 20 07:55:10 E3B1 pveproxy[2648]: worker 37180 started
Dec 20 07:55:12 E3B1 pveproxy[37179]: got inotify poll request in wrong process - disabling inotify
Dec 20 07:55:48 E3B1 pvedaemon[35832]: <root@pam> successful auth for user 'root@pam'
Dec 20 07:55:50 E3B1 sshd[37251]: Connection closed by 172.x.y.241 port 40916 [preauth]
Dec 20 07:56:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:56:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:56:50 E3B1 sshd[37382]: Connection closed by 172.x.y.241 port 41664 [preauth]
Dec 20 07:57:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:57:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:57:23 E3B1 pvedaemon[32818]: <root@pam> end task UPID:E3B1:00008ABC:2497E2DB:5DFC7B5E:vncproxy:108:root@pam: OK
Dec 20 07:57:23 E3B1 pvedaemon[37458]: starting vnc proxy UPID:E3B1:00009252:249942DB:5DFC7EE3:vncproxy:108:root@pam:
Dec 20 07:57:23 E3B1 pvedaemon[27264]: <root@pam> starting task UPID:E3B1:00009252:249942DB:5DFC7EE3:vncproxy:108:root@pam:
Dec 20 07:57:24 E3B1 pveproxy[37179]: worker exit
Dec 20 07:57:50 E3B1 sshd[37513]: Connection closed by 172.x.y.241 port 42410 [preauth]
Dec 20 07:58:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:58:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:58:50 E3B1 sshd[37641]: Connection closed by 172.x.y.241 port 43166 [preauth]
Dec 20 07:59:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 07:59:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 07:59:50 E3B1 sshd[37769]: Connection closed by 172.x.y.241 port 43928 [preauth]
Dec 20 08:00:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:00:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:00:50 E3B1 sshd[37902]: Connection closed by 172.x.y.241 port 44664 [preauth]
Dec 20 08:01:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:01:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:01:50 E3B1 sshd[38032]: Connection closed by 172.x.y.241 port 45406 [preauth]
Dec 20 08:02:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:02:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:02:50 E3B1 sshd[38160]: Connection closed by 172.x.y.241 port 46106 [preauth]
Dec 20 08:03:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:03:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:03:50 E3B1 sshd[38291]: Connection closed by 172.x.y.241 port 46818 [preauth]
Dec 20 08:04:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:04:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
Dec 20 08:04:01 E3B1 pvedaemon[27264]: <root@pam> end task UPID:E3B1:00009252:249942DB:5DFC7EE3:vncproxy:108:root@pam: OK
Dec 20 08:04:08 E3B1 pvedaemon[27264]: <root@pam> update VM 108: -net0 e1000=CE:79:18:F4:A2:77,bridge=vmbr0,link_down=1
Dec 20 08:04:50 E3B1 sshd[38426]: Connection closed by 172.x.y.241 port 47520 [preauth]
Dec 20 08:04:52 E3B1 pvedaemon[27264]: <root@pam> update VM 108: -net0 e1000=CE:79:18:F4:A2:77,bridge=vmbr0
Dec 20 08:04:55 E3B1 sshd[38428]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=172.x.68.103 user=root
Dec 20 08:04:57 E3B1 sshd[38428]: Failed password for root from 172.x.68.103 port 37840 ssh2
Dec 20 08:04:57 E3B1 sshd[38428]: Connection closed by 172.x.68.103 port 37840 [preauth]
Dec 20 08:04:58 E3B1 sshd[38431]: Accepted password for root from 172.x.68.103 port 37842 ssh2
Dec 20 08:04:58 E3B1 sshd[38431]: pam_unix(sshd:session): session opened for user root by (uid=0)
Dec 20 08:04:58 E3B1 systemd[1]: Created slice User Slice of root.
Dec 20 08:04:58 E3B1 systemd[1]: Starting User Manager for UID 0...
Dec 20 08:04:58 E3B1 systemd-logind[1418]: New session 1915 of user root.
Dec 20 08:04:58 E3B1 systemd[1]: Started Session 1915 of user root.
Dec 20 08:04:58 E3B1 systemd[38438]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
Dec 20 08:04:58 E3B1 systemd[38438]: Listening on GnuPG cryptographic agent and passphrase cache.
Dec 20 08:04:58 E3B1 systemd[38438]: Listening on GnuPG cryptographic agent (access for web browsers).
Dec 20 08:04:58 E3B1 systemd[38438]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Dec 20 08:04:58 E3B1 systemd[38438]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Sockets.
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Paths.
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Timers.
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Basic System.
Dec 20 08:04:58 E3B1 systemd[38438]: Reached target Default.
Dec 20 08:04:58 E3B1 systemd[38438]: Startup finished in 11ms.
Dec 20 08:04:58 E3B1 systemd[1]: Started User Manager for UID 0.
Dec 20 08:05:00 E3B1 systemd[1]: Starting Proxmox VE replication runner...
Dec 20 08:05:01 E3B1 systemd[1]: Started Proxmox VE replication runner.
root@E3B1:~#
root@E3B1:~# pveversion
pve-manager/5.2-1/0fcd7879 (running kernel: 4.15.17-1-pve)
root@E3B1:~#
root@E3B1:~# corosync -v
Corosync Cluster Engine, version '2.4.2-dirty'
Copyright (c) 2006-2009 Red Hat, Inc.
root@E3B1:~#
 
I changed the network configuration, but it is still not working even after the change.

When we bring the main interface down, the node drops out of the cluster.
 
root@E3B1:~# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 172.x.y.11
status = ring 0 active with no faults
root@E3B1:~#
 
As the output shows, there is no ring1.
Which PVE version are you running? I ask because on 6.1 (which ships corosync 3 / kronosnet) the option is "link_mode" rather than "rrp_mode":

Code:
totem {
  cluster_name: some_cluster
  config_version: some_number
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
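For what it's worth, the same check applies after adjusting the config: once the config_version has been bumped and the file has synced to all nodes, the ring/link status can be re-checked with the standard corosync tools (a sketch, not output from this cluster):

Code:
corosync-cfgtool -s      # should now list both ring/link IDs for the local node
corosync-quorumtool -s   # quorum state and current membership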
 
What do 'systemctl status corosync' and 'journalctl -b -u corosync | tail -n 200' output? Are '/etc/pve/corosync.conf' and '/etc/corosync/corosync.conf' identical on all cluster nodes?
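For example, something along these lines run from one node would show at a glance whether the files match (a sketch; the node names are the ones from the nodelist above, and it assumes the usual root SSH access between cluster nodes):

Code:
for n in E3B1 E3B2 E3B3 E4B1 E4B2; do
    echo "== $n =="
    ssh root@"$n" md5sum /etc/pve/corosync.conf /etc/corosync/corosync.conf
done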
 
Dear Fabian,

I have verified '/etc/pve/corosync.conf' and '/etc/corosync/corosync.conf' on all nodes, and they are identical; there is no difference.

The outputs you asked for are attached here.
 