All cluster nodes are fenced at the same time

djahida
Hi


I need some help, please.


I configured a Proxmox 4.3 HA cluster (3 nodes: 2 physical machines, plus a third node running in VirtualBox).



This weekend the 2 physical machines rebooted at the same time. What could be the reason?


IGMP snooping is enabled on all switches.


And I have tested multicast with omping; it works.



1. The only thing I've noticed is that the time wasn't synchronized across the three nodes.


My question: can time desynchronization be the reason the two physical nodes rebooted at the same time?
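
(For reference, a quick way to check clock synchronization on each node, assuming systemd-timesyncd or ntpd is in use, which is standard on Debian Jessie / Proxmox 4.x:

timedatectl status        # shows "NTP synchronized: yes/no"
ntpq -p                   # peer status, only if ntpd is installed

Time skew alone normally doesn't trigger fencing, since corosync uses relative timers rather than wall-clock time, but it does make logs from different nodes hard to correlate.)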



2. My second question is about multicast: is there another way to test it?



Below are the log files.



node 1 (192.168.xxxx.x) - which rebooted


Mar 4 05:59:10 xxxxxxxxxxx corosync[3421]: [TOTEM ] A processor failed, forming new configuration.


Mar 4 05:59:12 xxxxxxxxxxx corosync[3421]: [TOTEM ] A new membership (192.168.xxxx.x:97832) was formed. Members left: 3


Mar 4 05:59:12 xxxxxxxxxxx corosync[3421]: [TOTEM ] Failed to receive the leave message. failed: 3


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: members: 1/3318, 2/3318


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: starting data syncronisation


Mar 4 05:59:12 xxxxxxxxxxx corosync[3421]: [QUORUM] Members[2]: 1 2


Mar 4 05:59:12 xxxxxxxxxxx corosync[3421]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: cpg_send_message retried 1 times


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: members: 1/3318, 2/3318


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: starting data syncronisation


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: received sync request (epoch 1/3318/00000020)


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: received sync request (epoch 1/3318/0000001C)


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: received all states


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: leader is 1/3318


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: synced members: 1/3318, 2/3318


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: start sending inode updates


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: sent all (0) updates


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: all data is up to date


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: dfsm_deliver_queue: queue length 2


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: received all states


Mar 4 05:59:12 xxxxxxxxxxx pmxcfs[3318]: [status] notice: all data is up to date


Mar 4 05:59:13 xxxxxxxxxxx corosync[3421]: [TOTEM ] A new membership (192.168.xxxx.x:97836) was formed. Members joined: 3


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: members: 1/3318, 2/3318, 3/1070


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: starting data syncronisation


Mar 4 05:59:13 xxxxxxxxxxx corosync[3421]: [QUORUM] Members[3]: 1 2 3


Mar 4 05:59:13 xxxxxxxxxxx corosync[3421]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: cpg_send_message retried 1 times


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [status] notice: members: 1/3318, 2/3318, 3/1070


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [status] notice: starting data syncronisation


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [dcdb] notice: received sync request (epoch 1/3318/00000021)


Mar 4 05:59:13 xxxxxxxxxxx pmxcfs[3318]: [status] notice: received sync request (epoch 1/3318/0000001D)


Mar 4 05:59:16 xxxxxxxxxxx corosync[3421]: [TOTEM ] A processor failed, forming new configuration.


Mar 4 05:59:17 xxxxxxxxxxx corosync[3421]: [TOTEM ] A new membership (192.168.xxxx.x:97840) was formed. Members


Mar 4 05:59:17 xxxxxxxxxxx corosync[3421]: [QUORUM] Members[3]: 1 2 3


Mar 4 05:59:17 xxxxxxxxxxx corosync[3421]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 06:00:01 xxxxxxxxxxx CRON[180817]: (root) CMD (/usr/local/bin/mount.sh /var/log/mount.log)


Mar 4 06:00:09 xxxxxxxxxxx watchdog-mux[3012]: client watchdog expired - disable watchdog updates


Mar 4 06:00:09 xxxxxxxxxxx watchdog-mux[3012]: client watchdog expired - disable watchdog updates


Mar 4 06:04:50 xxxxxxxxxxx rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="3140" x-info="http://www.rsyslog.com"] start


Mar 4 06:04:50 xxxxxxxxxxx systemd-modules-load[1317]: Module 'fuse' is builtin


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Mounted POSIX Message Queue File System.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Mounted Debug File System.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Mounted Huge Pages File System.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Started udev Coldplug all Devices.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Starting udev Wait for Complete Device Initialization...


Mar 4 06:04:50 xxxxxxxxxxx systemd-modules-load[1317]: Inserted module 'ipmi_devintf'


Mar 4 06:04:50 xxxxxxxxxxx systemd-modules-load[1317]: Inserted module 'ipmi_poweroff'


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Initializing cgroup subsys cpuset


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Initializing cgroup subsys cpu


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Initializing cgroup subsys cpuacct


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Started Create Static Device Nodes in /dev.


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Starting udev Kernel Device Manager...


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Linux version 4.4.19-1-pve (root@nora) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Wed Sep 14 14:33:50 CEST 2016 ()


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Started udev Kernel Device Manager.


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.19-1-pve root=/dev/mapper/pve-root ro quiet


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] KERNEL supported cpus:


Mar 4 06:04:50 xxxxxxxxxxx systemd[1]: Mounted FUSE Control File System.


Mar 4 06:04:50 xxxxxxxxxxx kernel: [ 0.000000] Intel GenuineIntel



node 2 (192.168.xxxx.y) - which rebooted



Mar 4 05:58:53 yyyyyyyyyy corosync[3512]: [TOTEM ] A processor failed, forming new configuration.


Mar 4 05:58:55 yyyyyyyyyy corosync[3512]: [TOTEM ] A new membership (192.168.xxxx.xx:97832) was formed. Members left: 3


Mar 4 05:58:55 yyyyyyyyyy corosync[3512]: [TOTEM ] Failed to receive the leave message. failed: 3


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: members: 1/3318, 2/3318


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: starting data syncronisation


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: members: 1/3318, 2/3318


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: starting data syncronisation


Mar 4 05:58:55 yyyyyyyyyy corosync[3512]: [QUORUM] Members[2]: 1 2


Mar 4 05:58:55 yyyyyyyyyy corosync[3512]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: received sync request (epoch 1/3318/00000020)


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: received sync request (epoch 1/3318/0000001C)


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: received all states


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: leader is 1/3318


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: synced members: 1/3318, 2/3318


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: all data is up to date


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: dfsm_deliver_queue: queue length 2


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: received all states


Mar 4 05:58:55 yyyyyyyyyy pmxcfs[3318]: [status] notice: all data is up to date


Mar 4 05:58:56 yyyyyyyyyy corosync[3512]: [TOTEM ] A new membership (192.168.xxxx.xx:97836) was formed. Members joined: 3


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: members: 1/3318, 2/3318, 3/1070


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: starting data syncronisation


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [status] notice: members: 1/3318, 2/3318, 3/1070


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [status] notice: starting data syncronisation


Mar 4 05:58:56 yyyyyyyyyy corosync[3512]: [QUORUM] Members[3]: 1 2 3


Mar 4 05:58:56 yyyyyyyyyy corosync[3512]: [MAIN ] Completed service synchronization, ready to provide service.


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [dcdb] notice: received sync request (epoch 1/3318/00000021)


Mar 4 05:58:56 yyyyyyyyyy pmxcfs[3318]: [status] notice: received sync request (epoch 1/3318/0000001D)


Mar 4 05:59:00 yyyyyyyyyy corosync[3512]: [TOTEM ] A new membership (192.168.xxxx.xxx:97840) was formed. Members


Mar 4 05:59:00 yyyyyyyyyy corosync[3512]: [QUORUM] Members[3]: 1 2 3


Mar 4 05:59:00 yyyyyyyyyy corosync[3512]: [MAIN ] Completed service synchronization, ready to provide service.



Mar 4 06:04:28 yyyyyyyyyy systemd-modules-load[1298]: Module 'fuse' is builtin


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Mounted Huge Pages File System.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Mounted Debug File System.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Mounted POSIX Message Queue File System.


Mar 4 06:04:28 yyyyyyyyyy systemd-modules-load[1298]: Inserted module 'ipmi_devintf'


Mar 4 06:04:28 yyyyyyyyyy systemd-modules-load[1298]: Inserted module 'ipmi_poweroff'


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Started udev Coldplug all Devices.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Starting udev Wait for Complete Device Initialization...


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Started Create Static Device Nodes in /dev.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Starting udev Kernel Device Manager...


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Started udev Kernel Device Manager.


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Starting LSB: Tune IDE hard disks...


Mar 4 06:04:28 yyyyyyyyyy systemd[1]: Mounted FUSE Control File System.


Mar 4 06:04:28 yyyyyyyyyy kernel: [ 0.000000] Initializing cgroup subsys cpuset


Mar 4 06:04:28 yyyyyyyyyy kernel: [ 0.000000] Initializing cgroup subsys cpu


Mar 4 06:04:28 yyyyyyyyyy kernel: [ 0.000000] Initializing cgroup subsys cpuacct



node 3 (192.168.xxxx.z) - which did not reboot



Mar 4 05:57:39 pve-quorum rrdcached[1018]: flushing old values


Mar 4 05:57:39 pve-quorum rrdcached[1018]: rotating journals


Mar 4 05:57:39 pve-quorum rrdcached[1018]: started new journal /var/lib/rrdcached/journal/rrd.journal.1488603459.171214


Mar 4 05:57:39 pve-quorum rrdcached[1018]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1488596259.172788


Mar 4 05:57:58 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 05:58:10 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 05:59:07 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 05:59:53 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:04:44 pve-quorum pmxcfs[1070]: [dcdb] notice: data verification successful


Mar 4 06:08:14 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:09:10 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:09:57 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:13:02 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:17:01 pve-quorum CRON[7898]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)


Mar 4 06:18:16 pve-quorum pmxcfs[1070]: [status] notice: received log


Mar 4 06:19:13 pve-quorum pmxcfs[1070]: [status] notice: received log



Thanks
 
Such a general reboot could be caused by the switch being down for a short time, causing all nodes to self-fence.
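
One quick way to confirm the self-fence theory on a rebooted node is to look at the watchdog and HA logs around the incident (a diagnostic sketch; adjust the time window to your case):

journalctl -u watchdog-mux -u pve-ha-lrm --since "05:50" --until "06:10"

A "client watchdog expired - disable watchdog updates" line right before the reboot, like the one in your log, is the typical signature of a watchdog fence.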
 
In my case I have two switches; each physical node is connected to one switch.
If it was a network failure on a switch, why did the third node, a VirtualBox VM connected to the same switch as one of the physical nodes, not reboot? I don't know, but I'm not convinced it was a network failure on a switch.
And what about multicast? Or the time desynchronization?
 
Thanks Manu

In the file log I don’t find any log about corosync[]: [TOTEM ] Retransmit List:

And nothing about Lost quorum


I have juste


(the same node 1 log excerpt as posted above, from 05:59:10 through 06:04:50)
 
By default, how long does the cluster take to declare a node as failed?

In which configuration file can I find the option to increase this time?
 
If a node cannot communicate with the other cluster nodes AND has HA resources, it will reboot itself after 60 seconds.
This parameter is not configurable.
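
(To be precise: the 60 s value is the HA watchdog timeout and is fixed. The corosync failure-detection timeout, the totem token, is separate and configurable in /etc/pve/corosync.conf; a sketch of the relevant section, with an illustrative value:

totem {
  version: 2
  # token: time in ms without replies before a node is declared failed
  token: 3000
}

Edit the copy under /etc/pve so the change replicates to all nodes, and bump config_version in the totem section so the new configuration is applied.)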

Did you run the omping test for 10 minutes, as explained in the PVE wiki?
Looking at the number of "[TOTEM ] A processor failed, forming new configuration" messages, corosync has communication problems, which most of the time points to multicast problems.
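
For reference, the wiki test is roughly this, run simultaneously on all nodes (hostnames are placeholders):

omping -c 10000 -i 0.001 -F -q node1 node2 node3    # quick burst test
omping -c 600 -i 1 -q node1 node2 node3             # ~10 minutes; catches IGMP snooping timeouts

Multicast loss that only shows up in the long test usually points at the IGMP snooping querier configuration.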
 
I ran omping -c 600 -i 1 -q 192.168.xx.x 192.168.xx.y 192.168.xx.z on each node and got this result:


First node 192.168.xx.x

192.168.xx.y : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.091/0.151/0.257/0.024

192.168.xx.y : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.103/0.158/0.260/0.025

192.168.xx.z : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.247/2.012/65.861/4.663

192.168.xx.z : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.464/2.591/90.534/5.699

Second node 192.168.xx.y

192.168.xx.x : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.087/0.150/0.246/0.025

192.168.xx.x : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.093/0.160/0.257/0.027

192.168.xx.z : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.320/2.397/83.187/5.902

192.168.xx.z : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.547/2.932/83.573/6.476

Third node 192.168.xx.z

192.168.xx.x : unicast, xmt/rcv/%loss = 599/599/0%, min/avg/max/std-dev = 0.161/0.809/23.102/1.372

192.168.xx.x : multicast, xmt/rcv/%loss = 599/599/0%, min/avg/max/std-dev = 0.168/0.831/23.119/1.376

192.168.xx.y : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.132/0.532/22.955/1.237

192.168.xx.y : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.134/0.544/22.958/1.242
 
Is the omping result correct for my configuration?
I still have the error corosync[3487]: [TOTEM ] A processor failed, forming new configuration. in the log file.
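
(When the "processor failed" messages recur, it can help to capture the membership state on each node as it happens, e.g.:

pvecm status
corosync-quorumtool -s

Both show the current quorum state and member list, so you can see whether node 3 keeps dropping out of the membership.)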
 
Hi


I really need some help, please



I think it's a multicast problem, but when I check it with omping I get 0% packet loss!



So, I have a question about multicast configuration on network devices


I ran echo 1 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier



Then I got the logs below, and no node could see the others:



Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238 31239 3123a 3123b 3123c 3123d 3123e 3123f


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238 31239 3123a 3123b 3123c 3123d 3123e 3123f


Mar 31 11:26:15 xxxxxxxxxxxx corosync[3472]: [TOTEM ] Retransmit List: 31233 31234 31235 31236 31237 31238 31239 3123a 3123b 3123c 3123d 3123e 3123f



So I reverted it with echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier



Can someone explain why quorum is lost when I enable the multicast querier on the bridge?
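
(One way to see what the querier change actually does on the wire, a diagnostic sketch assuming tcpdump is available:

tcpdump -i vmbr0 -n igmp

This shows which device sends the IGMP membership queries. Note that the Linux bridge querier can send queries with source address 0.0.0.0, which some switches reportedly handle badly; that can disturb the snooping tables and with them the corosync multicast traffic.)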
 
