Hello,
We have a problem with our Proxmox cluster: "/etc/pve" is read-only, although the cluster looks fine in the "pvecm status" output:
Code:
Version: 6.2.0
Config Version: 26
Cluster Name: BLN
Cluster Id: 494
Cluster Member: Yes
Cluster Generation: 40632
Membership state: Cluster-Member
Nodes: 8
Expected votes: 8
Total votes: 8
Node votes: 1
Quorum: 5
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: proxmox03
Node ID: 6
Multicast addresses: 255.255.255.255
Node addresses: 10.1.1.203
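For reference, the vote arithmetic in that output can be cross-checked with a small script. This is only a rough sketch: the helper names `parse_pvecm_status` and `quorum_summary` are made up for illustration, and it assumes the usual majority rule (more than half of the expected votes) and the "Key: value" layout shown above.

```python
def parse_pvecm_status(text):
    """Turn 'Key: value' lines from pvecm status output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def quorum_summary(fields):
    """Compare the reported quorum with a simple majority of expected votes."""
    expected = int(fields["Expected votes"])
    total = int(fields["Total votes"])
    # Take only the first token in case extra text follows the number.
    reported = int(fields["Quorum"].split()[0])
    majority = expected // 2 + 1  # assumption: plain majority rule
    return {
        "majority": majority,
        "reported": reported,
        "votes_sufficient": total >= reported,
    }


# Values copied from the status output above.
SAMPLE = """Nodes: 8
Expected votes: 8
Total votes: 8
Node votes: 1
Quorum: 5
"""

if __name__ == "__main__":
    print(quorum_summary(parse_pvecm_status(SAMPLE)))
```

With the values above, the vote count itself is sufficient, which is why the "no quorum" message from Proxmox is so confusing.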
We have already stopped and restarted the cman and pve-cluster services, with mixed results (at times one node did not see the others and formed its own cluster). Even now, with every node in the cluster, Proxmox reports "no quorum".
The systems were at different software versions when the failure occurred, but have since been upgraded to the same latest version of the 3.4 branch (without a reboot, so they are running different kernel versions):
Code:
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-15 (running version: 3.4-15/e1daa307)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-37-pve: 2.6.32-150
pve-kernel-2.6.32-34-pve: 2.6.32-140
pve-kernel-2.6.32-46-pve: 2.6.32-177
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-20
qemu-server: 3.4-9
pve-firmware: 1.1-5
libpve-common-perl: 3.0-27
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-35
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-25
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
These are the only error messages that we see on all nodes:
Code:
Sep 20 06:26:43 proxmox03 pmxcfs[653226]: [dcdb] notice: cpg_join retry 390700
Sep 20 06:26:44 proxmox03 pmxcfs[653226]: [dcdb] notice: cpg_join retry 390710
Sep 20 06:26:44 proxmox03 pmxcfs[653226]: [status] crit: cpg_send_message failed: 9
Sep 20 06:26:44 proxmox03 pmxcfs[653226]: [status] crit: cpg_send_message failed: 9
Does anyone have a clue how to fix this (ideally without a reboot)?
P.S. We use unicast (no multicast) and have added the hostnames to /etc/hosts to rule out DNS issues.