Hi,
I have strange corosync behavior: it doesn't mark a specific interface as failed when I set that interface down.
pveversion -v:
Code:
proxmox-ve: 4.4-86 (running kernel: 4.4.49-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.40-1-pve: 4.4.40-82
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-49
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-97
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
openvswitch-switch: 2.6.0-2
/etc/network/interfaces (relevant parts):
Code:
auto eth3
iface eth3 inet static
    address 10.0.40.41
    netmask 255.255.255.0
#pve-02 node coro0

allow-vmbr1 bond1
iface bond1 inet manual
    ovs_bridge vmbr1
    ovs_type OVSBond
    ovs_bonds eth4 eth5
    ovs_options lacp=active bond-mode=balance-tcp
    pre-up ( /sbin/ip link set eth4 mtu 9000 && /sbin/ip link set eth5 mtu 9000 )
    up /sbin/ip link set mtu 9000 bond1

auto vmbr1
iface vmbr1 inet manual
    ovs_type OVSBridge
    ovs_ports bond1 pve_01_nfs pve_01_coro1
    mtu 9000

allow-vmbr1 pve_01_coro1
iface pve_01_coro1 inet static
    address 10.0.50.41
    netmask 255.255.255.0
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    ovs_options tag=143
    mtu 9000
#pve-01 node coro1
/etc/corosync/corosync.conf:
Code:
totem {
  version: 2
  secauth: on
  cluster_name: pve-0
  config_version: 2
  ip_version: ipv4
  rrp_mode: passive
  interface {
    ringnumber: 0
    bindnetaddr: 10.0.40.41
  }
  interface {
    ringnumber: 1
    bindnetaddr: 10.0.50.41
  }
}
nodelist {
  node {
    ring0_addr: pve-01-coro0
    ring1_addr: pve-01-coro1
    name: pve-01
    nodeid: 1
    quorum_votes: 1
  }
}
quorum {
  provider: corosync_votequorum
}
logging {
  to_syslog: yes
  debug: off
}
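On the mcast question: when no `mcastaddr` is set, corosync derives a multicast address itself, but pinning one per ring explicitly would look roughly like this (the 239.192.x.x address and port below are made-up examples, not values from my setup):

```
interface {
  ringnumber: 0
  bindnetaddr: 10.0.40.41
  mcastaddr: 239.192.40.1   # example address, not from my config
  mcastport: 5405           # default totem port
}
```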
omping results:
Code:
10.0.40.42 : joined (S,G) = (*, 232.43.211.234), pinging
10.0.40.42 : unicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev = 0.122/0.154/0.177/0.023
10.0.40.42 : multicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev = 0.129/0.163/0.195/0.025
10.0.50.42 : joined (S,G) = (*, 232.43.211.234), pinging
10.0.50.42 : unicast, xmt/rcv/%loss = 68/68/0%, min/avg/max/std-dev = 0.187/0.217/0.278/0.025
10.0.50.42 : multicast, xmt/rcv/%loss = 68/68/0%, min/avg/max/std-dev = 0.167/0.234/0.466/0.035
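For reference, the numbers above come from omping runs along these lines (the exact addresses passed are assumptions on my part, matching the ring subnets):

```shell
# run the same command on every node at roughly the same time;
# it reports unicast and multicast loss between the listed peers
omping -c 600 -i 1 -q 10.0.40.41 10.0.40.42
omping -c 600 -i 1 -q 10.0.50.41 10.0.50.42
```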
Setup:
1] eth3 - physical, primary corosync ring0, connected to cisco 3560g
2] pve_01_coro1 - openvswitch, secondary corosync ring1, connected to huawei S6720
Both 1] and 2] up (corosync-cfgtool -s):
Code:
Local node ID 1
RING ID 0
id = 10.0.40.41
status = ring 0 active with no faults
RING ID 1
id = 10.0.50.41
status = ring 1 active with no faults
Case A: ip link set eth3 down:
Code:
RING ID 0
id = 127.0.0.1
status = ring 0 active with no faults
RING ID 1
id = 10.0.50.41
status = ring 1 active with no faults
Case B: ip link set pve_01_coro1 down:
Code:
RING ID 0
id = 10.0.40.41
status = ring 0 active with no faults
RING ID 1
id = 10.0.50.41
status = Marking ringid 1 interface 127.0.0.1 FAULTY
Case C: ip link set down eth3 && ip link set down pve_01_coro1:
Code:
RING ID 0
id = 127.0.0.1
status = Marking ringid 0 interface 127.0.0.1 FAULTY
RING ID 1
id = 10.0.50.41
status = ring 1 active with no faults
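For context, the ring states above are from corosync-cfgtool. Note that with rrp_mode: passive, a ring marked FAULTY is not re-enabled automatically once the link comes back; it has to be reset by hand:

```shell
# show ring status for the local node
corosync-cfgtool -s
# after the link is restored, reset redundant ring state cluster-wide
corosync-cfgtool -r
```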
Case C produces various statuses, depending on which interface goes down first, etc. In any case, something seems wrong with corosync. Do I have a config error or flaw in this setup? Could it be a problem that both rings use the same host address (x.x.x.41) on different switches? Do I need to define an mcast address in corosync.conf for a setup like this? There is nothing about this in the Proxmox wiki.
Thanks.