Hi,
I'm having a strange issue on my Proxmox 4.3 cluster: from time to time all nodes appear red in the web GUI. Sometimes the GUI is not reachable at all and I have to restart the nodes.
I found that the logs always show these lines, even when all nodes are marked green:
Nov 21 12:29:39 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193828) was formed. Members
Nov 21 12:29:39 24 corosync[1108]: [QUORUM] Members[9]: 1 4 3 2 5 6 7 8 9
Nov 21 12:29:39 24 corosync[1108]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 21 12:29:42 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193832) was formed. Members
Nov 21 12:29:42 24 corosync[1108]: [QUORUM] Members[9]: 1 4 3 2 5 6 7 8 9
Nov 21 12:29:42 24 corosync[1108]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 21 12:29:43 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193836) was formed. Members
Nov 21 12:29:43 24 corosync[1108]: [QUORUM] Members[9]: 1 4 3 2 5 6 7 8 9
Nov 21 12:29:43 24 corosync[1108]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 21 12:29:46 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193840) was formed. Members
Nov 21 12:29:46 24 corosync[1108]: [QUORUM] Members[9]: 1 4 3 2 5 6 7 8 9
Nov 21 12:29:46 24 corosync[1108]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 21 12:29:48 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193844) was formed. Members
Nov 21 12:29:48 24 corosync[1108]: [QUORUM] Members[9]: 1 4 3 2 5 6 7 8 9
Nov 21 12:29:48 24 corosync[1108]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 21 12:29:49 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193848) was formed. Members
Nov 21 12:29:49 24 corosync[1108]: [QUORUM] Members[9]: 1 4 3 2 5 6 7 8 9
Nov 21 12:29:49 24 corosync[1108]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 21 12:29:53 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193852) was formed. Members
Nov 21 12:29:53 24 corosync[1108]: [QUORUM] Members[9]: 1 4 3 2 5 6 7 8 9
Nov 21 12:29:53 24 corosync[1108]: [MAIN ] Completed service synchronization, ready to provide service.
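To put a number on how often the membership re-forms, here is a minimal Python sketch that pulls the "A new membership" events out of log lines like the ones above and computes the gap between consecutive events. The function names, the regular expression and the hard-coded year are my own assumptions (syslog timestamps carry no year), so treat it as a rough helper rather than a finished tool.

```python
import re
from datetime import datetime

# Matches corosync TOTEM lines of the form shown in the log excerpt, e.g.
# "Nov 21 12:29:39 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193828) was formed. Members"
MEMBERSHIP_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) .*\[TOTEM\s*\] A new membership "
    r"\(\s*([\d.]+):(\d+)\) was formed"
)

def membership_events(lines, year=2016):
    """Return (timestamp, ring leader IP, ring sequence) for each membership change.

    `year` is an assumption: syslog lines do not include one.
    """
    events = []
    for line in lines:
        m = MEMBERSHIP_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2), int(m.group(3))))
    return events

def gaps_in_seconds(events):
    """Seconds elapsed between consecutive membership changes."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(events, events[1:])]

# A few lines copied from the log excerpt above.
SAMPLE = [
    "Nov 21 12:29:39 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193828) was formed. Members",
    "Nov 21 12:29:39 24 corosync[1108]: [QUORUM] Members[9]: 1 4 3 2 5 6 7 8 9",
    "Nov 21 12:29:42 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193832) was formed. Members",
    "Nov 21 12:29:43 24 corosync[1108]: [TOTEM ] A new membership ( 10.0.0.11:1193836) was formed. Members",
]

if __name__ == "__main__":
    events = membership_events(SAMPLE)
    print(len(events), "membership changes, gaps (s):", gaps_in_seconds(events))
```

Fed with the full log (for example the lines of /var/log/syslog), this should make it easy to see whether the ring is re-forming every few seconds all the time or only in bursts.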
Reading the forum led me to verify multicast, which seems to work fine (omping works on all nodes using the multicast IP, with no loss).
All hosts are listed in /etc/hosts.
I do see multicast traffic with tcpdump -n "multicast" | grep IP:
12:40:40.281906 IP 10.0.0.19.5404 > 239.192.109.205.5405: UDP, length 1448
12:40:40.281919 IP 10.0.0.19.5404 > 239.192.109.205.5405: UDP, length 824
12:40:40.282617 IP 10.0.0.11.5404 > 239.192.109.205.5405: UDP, length 1448
12:40:40.282621 IP 10.0.0.11.5404 > 239.192.109.205.5405: UDP, length 824
12:40:40.282977 IP 10.0.0.12.5404 > 239.192.109.205.5405: UDP, length 1448
12:40:40.282982 IP 10.0.0.12.5404 > 239.192.109.205.5405: UDP, length 824
12:40:40.283358 IP 10.0.0.13.5404 > 239.192.109.205.5405: UDP, length 1448
12:40:40.283370 IP 10.0.0.13.5404 > 239.192.109.205.5405: UDP, length 824
12:40:40.283816 IP 10.0.0.14.5404 > 239.192.109.205.5405: UDP, length 1448
12:40:40.283829 IP 10.0.0.14.5404 > 239.192.109.205.5405: UDP, length 824
12:40:40.284111 IP 10.0.0.15.5404 > 239.192.109.205.5405: UDP, length 1448
12:40:40.284124 IP 10.0.0.15.5404 > 239.192.109.205.5405: UDP, length 824
12:40:40.284406 IP 10.0.0.16.5404 > 239.192.109.205.5405: UDP, length 1448
12:40:40.284419 IP 10.0.0.16.5404 > 239.192.109.205.5405: UDP, length 824
12:40:40.284799 IP 10.0.0.17.5404 > 239.192.109.205.5405: UDP, length 1448
12:40:40.284812 IP 10.0.0.17.5404 > 239.192.109.205.5405: UDP, length 824
12:40:40.285107 IP 10.0.0.18.5404 > 239.192.109.205.5405: UDP, length 1448
12:40:40.285112 IP 10.0.0.18.5404 > 239.192.109.205.5405: UDP, length 824
12:40:40.287696 IP 10.0.0.18.5404 > 239.192.109.205.5405: UDP, length 296
12:40:40.287784 IP 10.0.0.19.5404 > 239.192.109.205.5405: UDP, length 296
12:40:40.288338 IP 10.0.0.11.5404 > 239.192.109.205.5405: UDP, length 296
12:40:40.288666 IP 10.0.0.12.5404 > 239.192.109.205.5405: UDP, length 296
12:40:40.289039 IP 10.0.0.13.5404 > 239.192.109.205.5405: UDP, length 296
12:40:40.289419 IP 10.0.0.14.5404 > 239.192.109.205.5405: UDP, length 296
12:40:40.289695 IP 10.0.0.15.5404 > 239.192.109.205.5405: UDP, length 296
12:40:40.289990 IP 10.0.0.16.5404 > 239.192.109.205.5405: UDP, length 296
12:40:40.290350 IP 10.0.0.17.5404 > 239.192.109.205.5405: UDP, length 296
12:40:40.381531 IP 10.0.0.17.5404 > 239.192.109.205.5405: UDP, length 88
12:40:40.383675 IP 10.0.0.17.5404 > 239.192.109.205.5405: UDP, length 1176
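To check that every node really shows up as a sender in a capture like this, a small Python sketch can count packets per source address and list any expected node that is missing. The function names and the list of expected IPs in the example are hypothetical, and the regular expression only assumes the tcpdump -n line format shown above.

```python
import re
from collections import Counter

# Matches tcpdump -n lines such as:
# "12:40:40.281906 IP 10.0.0.19.5404 > 239.192.109.205.5405: UDP, length 1448"
PACKET_RE = re.compile(
    r"^\S+ IP ([\d.]+)\.(\d+) > ([\d.]+)\.(\d+): UDP, length (\d+)"
)

def packets_per_sender(lines):
    """Count captured UDP packets per source IP address."""
    counts = Counter()
    for line in lines:
        m = PACKET_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

def missing_senders(counts, expected_ips):
    """Return the expected node IPs that sent no packet in the capture."""
    return sorted(ip for ip in expected_ips if counts[ip] == 0)

# A few lines copied from the capture above.
SAMPLE = [
    "12:40:40.281906 IP 10.0.0.19.5404 > 239.192.109.205.5405: UDP, length 1448",
    "12:40:40.281919 IP 10.0.0.19.5404 > 239.192.109.205.5405: UDP, length 824",
    "12:40:40.282617 IP 10.0.0.11.5404 > 239.192.109.205.5405: UDP, length 1448",
]

if __name__ == "__main__":
    counts = packets_per_sender(SAMPLE)
    # Hypothetical subset of node addresses, for illustration only.
    print(dict(counts), missing_senders(counts, ["10.0.0.11", "10.0.0.12", "10.0.0.19"]))
```

Note that this only shows what the capturing node receives, so it would need to be run on each node to compare.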
pvecm status does not show any error:
root@proxmox9:~# pvecm status
Quorum information
------------------
Date:             Mon Nov 21 12:43:16 2016
Quorum provider:  corosync_votequorum
Nodes:            9
Node ID:          0x00000009
Ring ID:          1/1195216
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   9
Highest expected: 9
Total votes:      9
Quorum:           5
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.0.11
0x00000004          1 10.0.0.12
0x00000003          1 10.0.0.13
0x00000002          1 10.0.0.14
0x00000005          1 10.0.0.15
0x00000006          1 10.0.0.16
0x00000007          1 10.0.0.17
0x00000008          1 10.0.0.18
0x00000009          1 10.0.0.19 (local)
Any help would be greatly appreciated.