Cluster, Corosync Problem: JOIN or LEAVE message was thrown away during flush operation

pxo

Hello,

I caused the problem myself by running pvecm delnode px2 and pvecm delnode px3.
Nodes px1 and px2 are back online.

Code:
root@px1 ~ > pvecm status
Quorum information
------------------
Date:             Tue Dec  1 12:23:13 2015
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1888
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.0.7 (local)
0x00000002          1 192.168.0.8
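
The vote numbers above are consistent with a simple majority rule: with 3 expected votes, 2 are needed for quorum, so the two remaining nodes stay quorate. A minimal sketch of that arithmetic, assuming plain votequorum defaults without special options (the function name `quorum_needed` is made up for illustration):

```shell
#!/bin/sh
# Simple majority: floor(expected_votes / 2) + 1.
# Assumption: plain votequorum defaults (no two_node, no last_man_standing).
quorum_needed() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

expected_votes=3   # "Expected votes" from the pvecm status output
total_votes=2      # "Total votes" from the pvecm status output
needed=$(quorum_needed "$expected_votes")

if [ "$total_votes" -ge "$needed" ]; then
    echo "quorate: $total_votes of $expected_votes votes, need $needed"
else
    echo "not quorate: $total_votes of $expected_votes votes, need $needed"
fi
```

With the values from the status output this takes the first branch, which matches the "Quorate: Yes" line.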

The corosync config on nodes px1 and px2:
Code:
root@px1 /etc/pve > cat corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: px3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: px3
  }

  node {
    name: px2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: px2
  }

  node {
    name: px1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: px1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Domainname
  config_version: 7
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.0.7
    ringnumber: 0
  }

}


Node px3 does not come back. The log:
Code:
root@px3 ~ > grep corosync /var/log/syslog
Dec  1 10:27:45 px3 corosync[32014]: Starting Corosync Cluster Engine (corosync): [FAILED]
Dec  1 10:27:45 px3 systemd[1]: corosync.service: control process exited, code=exited status=1
Dec  1 10:27:45 px3 systemd[1]: Unit corosync.service entered failed state.
Dec  1 10:36:38 px3 pvedaemon[11768]: <root@pam> starting task UPID:px3:000018DF:0033D07C:565D6A26:srvstart:corosync:root@pam:
Dec  1 10:36:38 px3 pvedaemon[6367]: starting service corosync: UPID:px3:000018DF:0033D07C:565D6A26:srvstart:corosync:root@pam:
Dec  1 10:36:38 px3 corosync[6374]:  [MAIN  ] Corosync Cluster Engine ('2.3.5'): started and ready to provide service.
Dec  1 10:36:38 px3 corosync[6374]:  [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Dec  1 10:36:38 px3 corosync[6375]:  [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec  1 10:36:38 px3 corosync[6375]:  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Dec  1 10:36:38 px3 corosync[6375]:  [TOTEM ] The network interface [192.168.0.9] is now up.
Dec  1 10:36:38 px3 corosync[6375]:  [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec  1 10:36:38 px3 corosync[6375]:  [QB    ] server name: cmap
Dec  1 10:36:38 px3 corosync[6375]:  [SERV  ] Service engine loaded: corosync configuration service [1]
Dec  1 10:36:38 px3 corosync[6375]:  [QB    ] server name: cfg
Dec  1 10:36:38 px3 corosync[6375]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Dec  1 10:36:38 px3 corosync[6375]:  [QB    ] server name: cpg
Dec  1 10:36:38 px3 corosync[6375]:  [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec  1 10:36:38 px3 corosync[6375]:  [QUORUM] Using quorum provider corosync_votequorum
Dec  1 10:36:38 px3 corosync[6375]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec  1 10:36:38 px3 corosync[6375]:  [QB    ] server name: votequorum
Dec  1 10:36:38 px3 corosync[6375]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec  1 10:36:38 px3 corosync[6375]:  [QB    ] server name: quorum
Dec  1 10:36:38 px3 corosync[6375]:  [TOTEM ] JOIN or LEAVE message was thrown away during flush operation.
Dec  1 10:36:38 px3 corosync[6375]:  [TOTEM ] JOIN or LEAVE message was thrown away during flush operation.
Dec  1 10:36:38 px3 corosync[6375]:  [TOTEM ] A new membership (192.168.0.9:1880) was formed. Members joined: 3
Dec  1 10:36:38 px3 corosync[6375]:  [QUORUM] Members[1]: 3
Dec  1 10:36:39 px3 corosync[6375]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec  1 10:36:39 px3 corosync[6375]:  [TOTEM ] A new membership (192.168.0.7:1884) was formed. Members joined: 1 2
Dec  1 10:36:39 px3 corosync[6375]:  [CMAP  ] Received config version (6) is different than my config version (5)! Exiting
Dec  1 10:36:39 px3 corosync[6375]:  [SERV  ] Unloading all Corosync service engines.
Dec  1 10:36:39 px3 corosync[6375]:  [QB    ] withdrawing server sockets
Dec  1 10:36:39 px3 corosync[6375]:  [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Dec  1 10:36:39 px3 corosync[6375]:  [QB    ] withdrawing server sockets
Dec  1 10:36:39 px3 corosync[6375]:  [SERV  ] Service engine unloaded: corosync configuration map access
Dec  1 10:36:39 px3 corosync[6375]:  [QB    ] withdrawing server sockets
Dec  1 10:36:39 px3 corosync[6375]:  [SERV  ] Service engine unloaded: corosync configuration service
Dec  1 10:36:39 px3 corosync[6375]:  [QB    ] withdrawing server sockets
Dec  1 10:36:39 px3 corosync[6375]:  [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Dec  1 10:36:39 px3 corosync[6375]:  [QB    ] withdrawing server sockets
Dec  1 10:36:39 px3 corosync[6375]:  [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Dec  1 10:36:39 px3 corosync[6375]:  [SERV  ] Service engine unloaded: corosync profile loading service
Dec  1 10:36:39 px3 corosync[6375]:  [MAIN  ] Corosync Cluster Engine exiting normally
Dec  1 10:37:39 px3 corosync[6369]: Starting Corosync Cluster Engine (corosync): [FAILED]
Dec  1 10:37:39 px3 systemd[1]: corosync.service: control process exited, code=exited status=1
Dec  1 10:37:39 px3 systemd[1]: Unit corosync.service entered failed state.
Dec  1 10:37:39 px3 pvedaemon[6367]: command 'systemctl start corosync' failed: exit code 1
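
The decisive line in the log above is the CMAP message: px3 still has config version 5 while the rest of the cluster announces version 6, so corosync exits. A small sketch for reading the `config_version` value out of a corosync.conf file, so the copies on each node can be compared by hand. The helper name `get_config_version` is hypothetical, and the paths in the usage comments are assumptions about a typical Proxmox VE 4.x layout:

```shell
# Print the config_version value found in a corosync.conf file.
# Only the first matching line is used.
get_config_version() {
    awk '$1 == "config_version:" { print $2; exit }' "$1"
}

# Usage (paths assumed; run on each node and compare the numbers):
#   get_config_version /etc/pve/corosync.conf
#   get_config_version /etc/corosync/corosync.conf
```

If the numbers differ between nodes, the node with the lower number is the one that will refuse to join.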

The path /etc/pve on node px3 is read-only.
Can I repair this without reinstalling node px3?
 
What is meant by "clear"?
I tested a purge and reinstall on node px3:
Code:
apt-get purge proxmox-ve
apt-get autoremove
apt-get purge `dpkg -l | grep ^rc | awk '{print $2}'`
apt-get install proxmox-ve

and then:
Code:
root@px1 /etc/pve > pvecm add 192.168.0.9
authentication key already exists

What do I have to clean up manually?
 
OK, sorry, in other words:
I would like to reinstall Proxmox on the node without reinstalling the Debian Jessie base.
 
Thanks, dietmar.
I typed the wrong command (delnode) too quickly, and I had not read the wiki, so I did not shut down the node before running delnode :rolleyes:

I found another way that does not require reinstalling node px3:
Code:
root@px3 / > service pve-cluster stop
root@px3 / > rm /var/lib/pve-cluster/config.db*
root@px3 / > scp root@px1:/var/lib/pve-cluster/config.db /var/lib/pve-cluster/config.db
root@px3 / > chmod 600 /var/lib/pve-cluster/config.db
root@px3 / > reboot
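
The steps above delete the local cluster database on px3 outright. A more cautious variant would copy the files aside first. The following is only a sketch, with a hypothetical helper name `backup_config_db` and an arbitrary backup location, meant to be run after pve-cluster has been stopped and before the rm step:

```shell
# Copy every config.db* file from a source directory into a new
# timestamped backup directory and print that directory's path.
backup_config_db() {
    src_dir=$1
    dest_root=$2
    dest="$dest_root/pve-cluster-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    cp -p "$src_dir"/config.db* "$dest"/ || return 1
    echo "$dest"
}

# Usage (paths assumed):
#   service pve-cluster stop
#   backup_config_db /var/lib/pve-cluster /root
```

If the copied database from px1 turns out not to work, the old files can then be put back from the backup directory.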

Code:
Dec  1 18:38:33 px3 corosync[1087]:  [MAIN  ] Corosync Cluster Engine ('2.3.5'): started and ready to provide service.
Dec  1 18:38:33 px3 corosync[1087]:  [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Dec  1 18:38:33 px3 corosync[1088]:  [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec  1 18:38:33 px3 corosync[1088]:  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Dec  1 18:38:33 px3 corosync[1088]:  [TOTEM ] The network interface [192.168.0.9] is now up.
Dec  1 18:38:33 px3 corosync[1088]:  [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec  1 18:38:33 px3 corosync[1088]:  [QB    ] server name: cmap
Dec  1 18:38:33 px3 corosync[1088]:  [SERV  ] Service engine loaded: corosync configuration service [1]
Dec  1 18:38:33 px3 corosync[1088]:  [QB    ] server name: cfg
Dec  1 18:38:33 px3 corosync[1088]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Dec  1 18:38:33 px3 corosync[1088]:  [QB    ] server name: cpg
Dec  1 18:38:33 px3 corosync[1088]:  [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec  1 18:38:33 px3 corosync[1088]:  [QUORUM] Using quorum provider corosync_votequorum
Dec  1 18:38:33 px3 corosync[1088]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec  1 18:38:33 px3 corosync[1088]:  [QB    ] server name: votequorum
Dec  1 18:38:33 px3 corosync[1088]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec  1 18:38:33 px3 corosync[1088]:  [QB    ] server name: quorum
Dec  1 18:38:33 px3 corosync[1088]:  [TOTEM ] JOIN or LEAVE message was thrown away during flush operation.
Dec  1 18:38:33 px3 corosync[1088]:  [TOTEM ] A new membership (192.168.0.9:1928) was formed. Members joined: 3
Dec  1 18:38:33 px3 corosync[1088]:  [QUORUM] Members[1]: 3
Dec  1 18:38:33 px3 corosync[1088]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec  1 18:38:33 px3 corosync[1088]:  [TOTEM ] A new membership (192.168.0.7:1932) was formed. Members joined: 1 2
Dec  1 18:38:33 px3 corosync[1088]:  [QUORUM] This node is within the primary component and will provide service.
Dec  1 18:38:33 px3 corosync[1088]:  [QUORUM] Members[3]: 1 2 3
Dec  1 18:38:33 px3 corosync[1088]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec  1 18:38:34 px3 corosync[1081]: Starting Corosync Cluster Engine (corosync): [  OK  ]
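
The `[QUORUM] Members[3]: 1 2 3` line in the log above is what confirms that px3 rejoined the other two nodes. A sketch for pulling the latest member count out of such log text; the function name `last_member_count` is invented here, and the usage line assumes the same syslog path as in the earlier grep:

```shell
# Read corosync log lines on stdin and print the member count from the
# last "[QUORUM] Members[N]" line seen (prints nothing if there is none).
last_member_count() {
    sed -n 's/.*\[QUORUM] Members\[\([0-9][0-9]*\)].*/\1/p' | tail -n 1
}

# Usage:
#   grep corosync /var/log/syslog | last_member_count
```

For a three-node cluster like this one, a healthy result is 3.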

I'll never do it again :-)