Hi to all,
Every weekend we run a full backup of all VMs on our 5-host Proxmox cluster.
Our Proxmox version is PVE 5.0-30. All nodes run exactly the same version.
We don't know why, but since the last backup (Sunday, December 3rd), 2 of the 5 nodes have been displayed as unavailable in the web console.
Corosync is running on all hosts. We tried stopping the pve-cluster and corosync services on all the other hosts and starting them again one by one, without success.
/etc/pve is shared and mounted on all hosts. Ping is OK, SSH between hosts is OK, and all hosts are on the same IPv4 subnet.
"pvecm nodes" returns:
Code:
Membership information
----------------------
Nodeid Votes Name
5 1 srvvirt01
3 1 srvvirt02
2 1 srvvirt03
4 1 srvvirt04
1 1 srvvirt05 (local)
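To compare what each host reports, the membership table above can be parsed with a small script. This is only a rough sketch: the `parse_members` helper and the `SAMPLE` text are illustrative names of ours, not part of Proxmox, and it assumes the output keeps the three-column layout shown above (node ID, votes, name, optional "(local)" marker).

```python
import re

# Sample text copied from the "pvecm nodes" output above.
SAMPLE = """\
Membership information
----------------------
Nodeid Votes Name
5 1 srvvirt01
3 1 srvvirt02
2 1 srvvirt03
4 1 srvvirt04
1 1 srvvirt05 (local)
"""

# One membership row: node ID (decimal or 0x-prefixed hex), votes, name,
# and an optional "(local)" marker.
ROW = re.compile(r"^\s*(0x[0-9a-fA-F]+|\d+)\s+(\d+)\s+(\S+)(\s+\(local\))?\s*$")


def parse_members(text):
    """Return one dict per membership row; header and rule lines are skipped."""
    members = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        raw_id = match.group(1)
        base = 16 if raw_id.lower().startswith("0x") else 10
        members.append({
            "nodeid": int(raw_id, base),
            "votes": int(match.group(2)),
            "name": match.group(3),
            "local": match.group(4) is not None,
        })
    return members


members = parse_members(SAMPLE)
for member in members:
    print(member["nodeid"], member["votes"], member["name"], member["local"])
```

Running the same parse on the output collected from every host and comparing the resulting name sets would show whether all nodes agree on the membership, even though the web console marks two of them as unavailable.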
"pvecm status" returns:
Code:
pvecm status
Quorum information
------------------
Date: Mon Dec 4 15:45:22 2017
Quorum provider: corosync_votequorum
Nodes: 5
Node ID: 0x00000005
Ring ID: 5/572
Quorate: Yes
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 5
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000005 1 10.100.1.1 (local)
0x00000003 1 10.100.1.2
0x00000002 1 10.100.1.3
0x00000004 1 10.100.1.4
0x00000001 1 10.100.1.5
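As a sanity check on the numbers above: the "Quorum: 3" value is consistent with a simple majority of the 5 expected votes. The snippet below is a minimal sketch of that arithmetic, assuming plain majority voting with no special votequorum options; the function names are ours and not part of corosync.

```python
def majority_quorum(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently present reach the majority threshold."""
    return total_votes >= majority_quorum(expected_votes)


# Values taken from the "pvecm status" output above.
expected_votes = 5
total_votes = 5

print(majority_quorum(expected_votes))          # 3
print(is_quorate(total_votes, expected_votes))  # True
```

So from corosync's point of view the cluster is quorate with all 5 votes present, which is why the "unavailable" status in the web console is confusing to us.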
Because we work on a local network with all-gigabit switches, we did not find it necessary to enable multicast on our switches for 5 hosts.
We tried to gather information from the logs, but we cannot see anything suspicious except (maybe) latency caused by simultaneous vzdump jobs writing to an external NAS.
Do you think we can recover the cluster sync without rebooting the Proxmox hosts? Each host is running 15 VMs and migration is unavailable.
Thanks to all for your valuable help.