We have a low-budget 3-node cluster (5-year-old servers with 32 GB RAM), with Ceph running on two of the nodes.
The nodes are connected through an HP 1920G switch, with 2 NICs for Ceph and 2 for corosync and LAN.
After a few minutes the cluster stops working (each node sees only itself as online), and some time later the nodes reboot.
I tried a different switch for the corosync network, without success.
I also enabled IGMP snooping on both switches, without success.
Thanks for any help.
journalctl -u corosync -u pve-cluster -b:
Jan 09 11:21:21 piedone systemd[1]: Starting The Proxmox VE cluster filesystem...
Jan 09 11:21:21 piedone pmxcfs[2172]: [quorum] crit: quorum_initialize failed: 2
Jan 09 11:21:21 piedone pmxcfs[2172]: [quorum] crit: can't initialize service
Jan 09 11:21:21 piedone pmxcfs[2172]: [confdb] crit: cmap_initialize failed: 2
Jan 09 11:21:21 piedone pmxcfs[2172]: [confdb] crit: can't initialize service
Jan 09 11:21:21 piedone pmxcfs[2172]: [dcdb] crit: cpg_initialize failed: 2
Jan 09 11:21:21 piedone pmxcfs[2172]: [dcdb] crit: can't initialize service
Jan 09 11:21:21 piedone pmxcfs[2172]: [status] crit: cpg_initialize failed: 2
Jan 09 11:21:21 piedone pmxcfs[2172]: [status] crit: can't initialize service
Jan 09 11:21:23 piedone systemd[1]: Started The Proxmox VE cluster filesystem.
Jan 09 11:21:23 piedone systemd[1]: Starting Corosync Cluster Engine...
Jan 09 11:21:23 piedone corosync[2215]: [MAIN ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.
Jan 09 11:21:23 piedone corosync[2215]: [MAIN ] Corosync built-in features: augeas systemd pie relro bindnow
Jan 09 11:21:23 piedone corosync[2258]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Jan 09 11:21:23 piedone corosync[2258]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jan 09 11:21:23 piedone corosync[2258]: [TOTEM ] The network interface [10.73.73.3] is now up.
Jan 09 11:21:23 piedone corosync[2258]: [SERV ] Service engine loaded: corosync configuration map access [0]
Jan 09 11:21:23 piedone corosync[2258]: [QB ] server name: cmap
Jan 09 11:21:23 piedone corosync[2258]: [SERV ] Service engine loaded: corosync configuration service [1]
Jan 09 11:21:23 piedone corosync[2258]: [QB ] server name: cfg
Jan 09 11:21:23 piedone corosync[2258]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jan 09 11:21:23 piedone corosync[2258]: [QB ] server name: cpg
Jan 09 11:21:23 piedone corosync[2258]: [SERV ] Service engine loaded: corosync profile loading service [4]
Jan 09 11:21:23 piedone corosync[2258]: [QUORUM] Using quorum provider corosync_votequorum
Jan 09 11:21:23 piedone corosync[2258]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jan 09 11:21:23 piedone corosync[2258]: [QB ] server name: votequorum
Jan 09 11:21:23 piedone corosync[2258]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jan 09 11:21:23 piedone corosync[2258]: [QB ] server name: quorum
Jan 09 11:21:23 piedone corosync[2258]: [TOTEM ] A new membership (10.73.73.3:60708) was formed. Members joined: 1
Jan 09 11:21:23 piedone corosync[2258]: [QUORUM] Members[1]: 1
Jan 09 11:21:23 piedone corosync[2258]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 09 11:21:24 piedone corosync[2197]: Starting Corosync Cluster Engine (corosync): [ OK ]
Jan 09 11:21:24 piedone systemd[1]: Started Corosync Cluster Engine.
Jan 09 11:21:27 piedone pmxcfs[2172]: [status] notice: update cluster info (cluster name soasi, version = 5)
Jan 09 11:21:27 piedone pmxcfs[2172]: [dcdb] notice: members: 1/2172
Jan 09 11:21:27 piedone pmxcfs[2172]: [dcdb] notice: all data is up to date
Jan 09 11:21:27 piedone pmxcfs[2172]: [status] notice: members: 1/2172
Jan 09 11:21:27 piedone pmxcfs[2172]: [status] notice: all data is up to date
root@piedone:~# cat /etc/pve/corosync.conf
logging {
debug: off
to_syslog: yes
}
nodelist {
node {
name: bambino
nodeid: 3
quorum_votes: 1
ring0_addr: bambino
}
node {
name: px1
nodeid: 2
quorum_votes: 1
ring0_addr: px1
}
node {
name: piedone
nodeid: 1
quorum_votes: 1
ring0_addr: piedone
}
}
quorum {
provider: corosync_votequorum
}
totem {
cluster_name: soasi
config_version: 5
ip_version: ipv4
secauth: on
version: 2
interface {
bindnetaddr: 10.73.73.3
ringnumber: 0
}
}
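
For context on the symptom above (each node sees only itself), here is a minimal sketch of the majority arithmetic that the votequorum provider applies to the nodelist in this config. It assumes the default simple-majority rule with no extra quorum options set (none appear in the config above); the helper names `quorum_threshold` and `is_quorate` are made up for illustration and are not part of corosync.

```python
# Votes per node, copied from the nodelist in corosync.conf above.
votes = {"bambino": 1, "px1": 1, "piedone": 1}


def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(visible_nodes, votes) -> bool:
    """True if the nodes this member can see hold a majority of all votes."""
    expected = sum(votes.values())
    seen = sum(votes[name] for name in visible_nodes)
    return seen >= quorum_threshold(expected)


if __name__ == "__main__":
    # Three nodes with one vote each: two votes are needed for quorum.
    print("threshold:", quorum_threshold(sum(votes.values())))
    # A node that sees only itself (as in "Members[1]: 1") is not quorate.
    print("piedone alone:", is_quorate(["piedone"], votes))
    # Any two nodes that can see each other are quorate.
    print("piedone + px1:", is_quorate(["piedone", "px1"], votes))
```

Under these assumptions, a membership of one node never reaches the two votes required, which matches every node reporting itself as the only member once the corosync traffic between them stops arriving.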