All cluster nodes are fenced at the same time

djahida
Hi


I need some help, please.


I configured a Proxmox 4.3 HA cluster (3 nodes: 2 physical machines, plus a third node running in VirtualBox).



This weekend the 2 physical machines rebooted at the same time. What could be the reason?


IGMP snooping is enabled on all switches.


And I have tested multicast with omping; it works.



1. The only thing I've noticed is that the time wasn't synchronized across the three nodes.


My question: can time desynchronization be the reason the two physical nodes rebooted at the same time?
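
(For reference, a quick way to check clock synchronization on each node, assuming systemd-timesyncd or ntpd is in use, which is standard on Debian Jessie / Proxmox 4.x:

timedatectl status        # shows "NTP synchronized: yes/no"
ntpq -p                   # peer status, only if ntpd is installed

Time skew alone normally doesn't trigger fencing, since corosync uses relative timers rather than wall-clock time, but it does make logs from different nodes hard to correlate.)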



2. My second question is about multicast: is there another way to test it?



Below are the log files.



node 1 (192.168.xxxx.x) - which rebooted


Mar 4 05:59:10 xxxxxxxxxxx corosync[3421]: [TOTEM ] A processor failed, forming new configuration.


Mar 4 05:59:12 xxxxxxxxxxx corosync[3421]: [TOTEM ] A new membership (192.168.xxxx.x:97832) was formed. Members left: 3


Mar 4 05:59:12 xxxxxxxxxxx corosync[3421]: [TOTEM ] Failed to receive the leave message. failed: 3


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: members: 1/3318, 2/3318


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: starting data syncronisation


Mar 4 05:59:12 xxxxxxxxxxx corosync[3421]: [QUORUM] Members[2]: 1 2


Mar 4 05:59:12 xxxxxxxxxxx corosync[3421]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: cpg_send_message retried 1 times


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: members: 1/3318, 2/3318


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: starting data syncronisation


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: received sync request (epoch 1/3318/00000020)


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: received sync request (epoch 1/3318/0000001C)


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: received all states


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: leader is 1/3318


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: synced members: 1/3318, 2/3318


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: start sending inode updates


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: sent all (0) updates


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: all data is up to date


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: dfsm_deliver_queue: queue length 2


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: received all states


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: all data is up to date


Mar 4 05:59:13 xxxxxxxxxxx corosync[3421]: [TOTEM ] A new membership (192.168.xxxx.x:97836) was formed. Members joined: 3


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: members: 1/3318, 2/3318, 3/1070


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: starting data syncronisation


Mar 4 05:59:13 xxxxxxxxxxx corosync[3421]: [QUORUM] Members[3]: 1 2 3


Mar 4 05:59:13 xxxxxxxxxxx corosync[3421]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: cpg_send_message retried 1 times


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [status] notice: members: 1/3318, 2/3318, 3/1070


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [status] notice: starting data syncronisation


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: received sync request (epoch 1/3318/00000021)


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [status] notice: received sync request (epoch 1/3318/0000001D)


Mar 4 05:59:16 xxxxxxxxxxx corosync[3421]: [TOTEM ] A processor failed, forming new configuration.


Mar 4 05:59:17 xxxxxxxxxxx corosync[3421]: [TOTEM ] A new membership (192.168.xxxx.x:97840) was formed. Members


Mar 4 05:59:17 xxxxxxxxxxx corosync[3421]: [QUORUM] Members[3]: 1 2 3


Mar 4 05:59:17 xxxxxxxxxxx corosync[3421]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 06:00:01 xxxxxxxxxxx CRON[180817]: (root) CMD (/usr/local/bin/mount.sh /var/log/mount.log)


Mar 4 06:00:09 xxxxxxxxxxx watchdog-mux[3012]: client watchdog expired - disable watchdog updates


Mar 4 06:00:09 xxxxxxxxxxx watchdog-mux[3012]: client watchdog expired - disable watchdog updates


Mar 4 06:04:50 xxxxxxxxxxx rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="3140" x-info="http://www.rsyslog.com"] start


Mar 4 06:04:50 xxxxxxxxxxx systemd-modules-load[1317]: Module 'fuse' is builtin


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Mounted POSIX Message Queue File System.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Mounted Debug File System.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Mounted Huge Pages File System.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Started udev Coldplug all Devices.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Starting udev Wait for Complete Device Initialization...


Mar 4 06:04:50 xxxxxxxxxxx systemd-modules-load[1317]: Inserted module 'ipmi_devintf'


Mar 4 06:04:50 xxxxxxxxxxx systemd-modules-load[1317]: Inserted module 'ipmi_poweroff'


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Initializing cgroup subsys cpuset


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Initializing cgroup subsys cpu


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Initializing cgroup subsys cpuacct


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Started Create Static Device Nodes in /dev.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Starting udev Kernel Device Manager...


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Linux version 4.4.19-1-pve (root@nora) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Wed Sep 14 14:33:50 CEST 2016 ()


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Started udev Kernel Device Manager.


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.19-1-pve root=/dev/mapper/pve-root ro quiet


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] KERNEL supported cpus:


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Mounted FUSE Control File System.


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Intel GenuineIntel



node 2 (192.168.xxxx.y) - which rebooted



Mar 4 05:58:53 yyyyyyyyyy corosync[3512]: [TOTEM ] A processor failed, forming new configuration.


Mar 4 05:58:55 yyyyyyyyyy corosync[3512]: [TOTEM ] A new membership (192.168.xxxx.xx:97832) was formed. Members left: 3


Mar 4 05:58:55 yyyyyyyyyy corosync[3512]: [TOTEM ] Failed to receive the leave message. failed: 3


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: members: 1/3318, 2/3318


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: starting data syncronisation


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: members: 1/3318, 2/3318


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: starting data syncronisation


Mar 4 05:58:55 yyyyyyyyyy corosync[3512]: [QUORUM] Members[2]: 1 2


Mar 4 05:58:55 yyyyyyyyyy corosync[3512]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: received sync request (epoch 1/3318/00000020)


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: received sync request (epoch 1/3318/0000001C)


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: received all states


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: leader is 1/3318


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: synced members: 1/3318, 2/3318


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: all data is up to date


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: dfsm_deliver_queue: queue length 2


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: received all states


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: all data is up to date


Mar 4 05:58:56 yyyyyyyyyy corosync[3512]: [TOTEM ] A new membership (192.168.xxxx.xx:97836) was formed. Members joined: 3


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: members: 1/3318, 2/3318, 3/1070


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: starting data syncronisation


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [status] notice: members: 1/3318, 2/3318, 3/1070


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [status] notice: starting data syncronisation


Mar 4 05:58:56 yyyyyyyyyy corosync[3512]: [QUORUM] Members[3]: 1 2 3


Mar 4 05:58:56 yyyyyyyyyy corosync[3512]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: received sync request (epoch 1/3318/00000021)


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [status] notice: received sync request (epoch 1/3318/0000001D)


Mar 4 05:59:00 yyyyyyyyyy corosync[3512]: [TOTEM ] A new membership (192.168.xxxx.xxx:97840) was formed. Members


Mar 4 05:59:00 yyyyyyyyyy corosync[3512]: [QUORUM] Members[3]: 1 2 3


Mar 4 05:59:00 yyyyyyyyyy corosync[3512]: [MAIN ] Completed service synchronization, ready to provide service.



Mar 4 06:04:28 yyyyyyyyyy systemd-modules-load[1298]: Module 'fuse' is builtin


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Mounted Huge Pages File System.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Mounted Debug File System.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Mounted POSIX Message Queue File System.


Mar 4 06:04:28 yyyyyyyyyy systemd-modules-load[1298]: Inserted module 'ipmi_devintf'


Mar 4 06:04:28 yyyyyyyyyy systemd-modules-load[1298]: Inserted module 'ipmi_poweroff'


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Started udev Coldplug all Devices.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Starting udev Wait for Complete Device Initialization...


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Started Create Static Device Nodes in /dev.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Starting udev Kernel Device Manager...


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Started udev Kernel Device Manager.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Starting LSB: Tune IDE hard disks...


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Mounted FUSE Control File System.


Mar 4 06:04:28 yyyyyyyyyy kernel: [ 0.000000] Initializing cgroup subsys cpuset


Mar 4 06:04:28 yyyyyyyyyy kernel: [ 0.000000] Initializing cgroup subsys cpu


Mar 4 06:04:28 yyyyyyyyyy kernel: [ 0.000000] Initializing cgroup subsys cpuacct



node 3 (192.168.xxxx.z) - which did not reboot



Mar 4 05:57:39 pve-quorum rrdcached[1018]: flushing old values


Mar 4 05:57:39 pve-quorum rrdcached[1018]: rotating journals


Mar 4 05:57:39 pve-quorum rrdcached[1018]: started new journal /var/lib/rrdcached/journal/rrd.journal.1488603459.171214


Mar 4 05:57:39 pve-quorum rrdcached[1018]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1488596259.172788


Mar 4 05:57:58 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 05:58:10 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 05:59:07 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 05:59:53 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:04:44 pve-quorum pmxcfs[1070]: [dcdb] notice: data verification successful


Mar 4 06:08:14 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:09:10 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:09:57 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:13:02 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:17:01 pve-quorum CRON[7898]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)


Mar 4 06:18:16 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:19:13 pve-quorum pmxcfs[1070]: [status] notice: received log



Thanks
 
Such a general reboot could be caused by the switch being down for a short time, causing all nodes to self-fence.
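
One quick way to confirm the self-fence theory on a rebooted node is to look at the watchdog and HA logs around the incident (a diagnostic sketch; adjust the time window to your case):

journalctl -u watchdog-mux -u pve-ha-lrm --since "05:50" --until "06:10"

A "client watchdog expired - disable watchdog updates" line right before the reboot, like the one in your log, is the typical signature of a watchdog fence.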
 
In my case I have two switches; each physical node is connected to one switch.
If it was a network failure on a switch, why did the third node, a VirtualBox VM connected to the same switch as one of the physical nodes, not reboot? I don't know, but I'm not convinced it was a network failure on a switch.
And what about multicast? Or the time desynchronization?
 
Thanks Manu

In the file log I don’t find any log about corosync[]: [TOTEM ] Retransmit List:

And nothing about Lost quorum


I have juste


(the same node 1 log excerpt as posted above, from 05:59:10 through 06:04:50)
 
By default, how long does the cluster take to declare a node as failed?

In which configuration file can I find the option to increase this time?
 
If a node cannot communicate with the other cluster nodes AND has HA resources, it will reboot itself after 60 seconds.
This parameter is not configurable.
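
(To be precise: the 60 s value is the HA watchdog timeout and is fixed. The corosync failure-detection timeout, the totem token, is separate and configurable in /etc/pve/corosync.conf; a sketch of the relevant section, with an illustrative value:

totem {
  version: 2
  # token: time in ms without replies before a node is declared failed
  token: 3000
}

Edit the copy under /etc/pve so the change replicates to all nodes, and bump config_version in the totem section so the new configuration is applied.)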

Did you run the omping test for 10 minutes, as explained in the PVE wiki?
Looking at the number of "[TOTEM ] A processor failed, forming new configuration" messages, corosync has communication problems, which most of the time points to multicast problems.
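
For reference, the wiki test is roughly this, run simultaneously on all nodes (hostnames are placeholders):

omping -c 10000 -i 0.001 -F -q node1 node2 node3    # quick burst test
omping -c 600 -i 1 -q node1 node2 node3             # ~10 minutes; catches IGMP snooping timeouts

Multicast loss that only shows up in the long test usually points at the IGMP snooping querier configuration.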
 
I ran omping -c 600 -i 1 -q 192.168.xx.x 192.168.xx.y 192.168.xx.z on each node and got this result:


First node 192.168.xx.x

192.168.xx.y : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.091/0.151/0.257/0.024

192.168.xx.y : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.103/0.158/0.260/0.025

192.168.xx.z : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.247/2.012/65.861/4.663

192.168.xx.z : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.464/2.591/90.534/5.699

Second node 192.168.xx.y

192.168.xx.x : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.087/0.150/0.246/0.025

192.168.xx.x : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.093/0.160/0.257/0.027

192.168.xx.z : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.320/2.397/83.187/5.902

192.168.xx.z : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.547/2.932/83.573/6.476

Third node 192.168.xx.z

192.168.xx.x : unicast, xmt/rcv/%loss = 599/599/0%, min/avg/max/std-dev = 0.161/0.809/23.102/1.372

192.168.xx.x : multicast, xmt/rcv/%loss = 599/599/0%, min/avg/max/std-dev = 0.168/0.831/23.119/1.376

192.168.xx.y : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.132/0.532/22.955/1.237

192.168.xx.y : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.134/0.544/22.958/1.242
 
Is the omping result correct for my configuration?
I still have the error corosync[3487]: [TOTEM ] A processor failed, forming new configuration. in the log file.
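
(When the "processor failed" messages recur, it can help to capture the membership state on each node as it happens, e.g.:

pvecm status
corosync-quorumtool -s

Both show the current quorum state and member list, so you can see whether node 3 keeps dropping out of the membership.)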
 
Hi


I really need some help, please



I think it's a multicast problem, but when I check it with omping I get 0% packet loss!



So, I have a question about multicast configuration on network devices


I ran echo 1 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier



Then I got the logs below, and no node could see the others:



Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238 31239 3123a 3123b 3123c 3123d 3123e 3123f


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238 31239 3123a 3123b 3123c 3123d 3123e 3123f


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238 31239 3123a 3123b 3123c 3123d 3123e 3123f



So I reverted it with echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier



Can someone explain why quorum is lost when I enable the multicast querier on the bridge?
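
(One way to see what the querier change actually does on the wire, a diagnostic sketch assuming tcpdump is available:

tcpdump -i vmbr0 -n igmp

This shows which device sends the IGMP membership queries. Note that the Linux bridge querier can send queries with source address 0.0.0.0, which some switches reportedly handle badly; that can disturb the snooping tables and with them the corosync multicast traffic.)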
 
