Hello,
We have a 5-node Proxmox cluster. These are the versions we are currently running:
pve04 pve-manager/3.3-1/a06c9f73 (running kernel: 3.10.0-4-pve)
pve05 pve-manager/3.3-1/a06c9f73 (running kernel: 3.10.0-4-pve)
pve07 pve-manager/3.3-5/bfebec03 (running kernel: 3.10.0-5-pve)
pve08 pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve)
pve10 pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve)
Pve07 is the node we upgraded to the newest version currently available. Unfortunately, after about 5 minutes, quorum is lost. When we run the cluster without pve07, it works fine.
A cluster of only pve04, pve05 and pve07 also functions correctly. The same goes for pve07, pve08 and pve10 (though we haven't tested that combination for very long).
The following error messages are in the syslog:
Code:
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [status] crit: cpg_send_message failed: 9
15:27:32 pmxcfs[8674]: [dcdb] notice: cpg_join retry 7260
15:27:33 pmxcfs[8674]: [dcdb] notice: cpg_join retry 7270
15:27:34 pmxcfs[8674]: [dcdb] notice: cpg_join retry 7280
15:27:35 pmxcfs[8674]: [dcdb] notice: cpg_join retry 7290
Restarting all PVE services sometimes restores quorum, but after about 5 minutes the same error occurs.
Pve07 command output: pvecm status
Code:
Version: 6.2.0
Config Version: 19
Cluster Name: tcn01
Cluster Id: 3233
Cluster Member: Yes
Cluster Generation: 90436
Membership state: Cluster-Member
Nodes: 4
Expected votes: 5
Total votes: 4
Node votes: 1
Quorum: 3
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pve07
Node ID: 2
Multicast addresses: x.x.x.x
Node addresses: x.x.x.x
Pve07 command output: pvecm nodes
Code:
Node  Sts  Inc    Joined               Name
   2  M    89992  2015-01-27 19:03:54  pve07
   3  X    90376                       pve08
   4  M    90436  2015-01-28 14:22:23  pve05
   5  M    90436  2015-01-28 14:22:23  pve04
  10  M    90436  2015-01-28 14:22:23  pve10
Pve04 command output: pvecm status
Code:
Version: 6.2.0
Config Version: 19
Cluster Name: tcn01
Cluster Id: 3233
Cluster Member: Yes
Cluster Generation: 90436
Membership state: Cluster-Member
Nodes: 4
Expected votes: 5
Total votes: 3
Node votes: 1
Quorum: 3
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pve04
Node ID: 5
Multicast addresses: x.x.x.x
Node addresses: x.x.x.x
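For reference, this is how I read the vote numbers in the two status outputs. It is only a minimal Python sketch, assuming the quorum threshold is a simple majority of the expected votes (which would match the "Quorum: 3" reported for 5 expected votes). `parse_status` and `majority_quorum` are helper names made up for this post, not part of pvecm.

```python
def parse_status(text):
    """Parse 'Key: value' lines of pvecm status output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def majority_quorum(expected_votes):
    """Assumed rule: quorum is a strict majority of the expected votes."""
    return expected_votes // 2 + 1


# Relevant lines copied from the pve04 status output above.
PVE04_STATUS = """\
Nodes: 4
Expected votes: 5
Total votes: 3
Node votes: 1
Quorum: 3
"""

status = parse_status(PVE04_STATUS)
expected = int(status["Expected votes"])
total = int(status["Total votes"])
needed = majority_quorum(expected)
quorate = total >= needed

print(f"expected={expected} total={total} needed={needed} quorate={quorate}")
```

If that assumption holds, both captured views still meet the threshold by vote count (4 of 5 on pve07, 3 of 5 on pve04), although pve04 reports 4 nodes but only 3 total votes.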
Pve04 command output: pvecm nodes
Code:
Node  Sts  Inc    Joined               Name
   2  M    90436  2015-01-28 14:22:23  pve07
   3  X    0                           pve08
   4  X    90388                       pve05
   5  M    90052  2015-01-28 09:27:36  pve04
  10  M    90388  2015-01-28 09:27:36  pve10
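To make the difference between the two membership views easier to see, here is a small Python sketch that compares the `pvecm nodes` output from pve07 and pve04. It assumes the `Sts` column uses `M` for a member and `X` for a node that is not currently a member. `parse_nodes` and `disagreements` are hypothetical helper names, and the rows are copied from the outputs above.

```python
def parse_nodes(text):
    """Map node name -> Sts flag from pvecm nodes output."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that does not start with a node ID.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        # The name is always the last column; Joined may be empty.
        states[parts[-1]] = parts[1]
    return states


def disagreements(view_a, view_b):
    """Return nodes whose status differs between the two views."""
    common = sorted(view_a.keys() & view_b.keys())
    return {n: (view_a[n], view_b[n]) for n in common if view_a[n] != view_b[n]}


PVE07_NODES = """\
Node Sts Inc Joined Name
2 M 89992 2015-01-27 19:03:54 pve07
3 X 90376 pve08
4 M 90436 2015-01-28 14:22:23 pve05
5 M 90436 2015-01-28 14:22:23 pve04
10 M 90436 2015-01-28 14:22:23 pve10
"""

PVE04_NODES = """\
Node Sts Inc Joined Name
2 M 90436 2015-01-28 14:22:23 pve07
3 X 0 pve08
4 X 90388 pve05
5 M 90052 2015-01-28 09:27:36 pve04
10 M 90388 2015-01-28 09:27:36 pve10
"""

view_pve07 = parse_nodes(PVE07_NODES)
view_pve04 = parse_nodes(PVE04_NODES)
diff = disagreements(view_pve07, view_pve04)

for name, (on_pve07, on_pve04) in diff.items():
    print(f"{name}: pve07 sees {on_pve07}, pve04 sees {on_pve04}")
```

On the outputs above, the only node the two views disagree on is pve05: pve07 lists it as a member, while pve04 does not. Both agree that pve08 is not a member.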
Any suggestions on how we can debug this issue?
Thanks in advance!