Hello. My cluster ran fine for about a month, then all nodes went red in the web interface. The VMs keep running as usual.
I had a similar problem in the past; it went away after an update, but now it is back.
I tried rebooting the nodes one by one. Two nodes rebooted fine (nothing changed), but the third one could not start one of its VMs (the one with an iSCSI-mounted disk). So now I have a broken cluster and a VM that will not start, and I really don't want to reboot the remaining nodes.
Can anyone help me restore my cluster?
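Since the cluster reports quorate on every node (output below), I am wondering whether simply restarting the status daemons on each red node would bring the web interface back to green without more reboots. A minimal sketch of what I would try, assuming the standard PVE 4.x service names (pvestatd, pveproxy, pve-cluster):

# on each red node: restart the daemons that feed the web interface status,
# which should not touch the running VMs
systemctl restart pvestatd              # statistics collector for nodes/VMs/storage
systemctl restart pveproxy              # web interface / REST API proxy
systemctl status pve-cluster corosync   # verify pmxcfs and corosync are healthy

Here is the cluster status from the local node: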
root@nsk-prox4:~# pvecm status
Quorum information
------------------
Date:             Wed Aug 10 14:25:39 2016
Quorum provider:  corosync_votequorum
Nodes:            6
Node ID:          0x00000004
Ring ID:          8392
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   6
Highest expected: 6
Total votes:      6
Quorum:           4
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000005          1 10.5.1.101
0x00000004          1 10.5.1.102 (local)
0x00000001          1 10.5.1.103
0x00000006          1 10.5.1.104
0x00000002          1 10.5.1.105
0x00000003          1 10.5.1.106
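So quorum itself looks healthy even though the GUI shows everything red. To narrow it down further I could check the recent cluster daemon logs on one of the red nodes; a sketch, assuming the standard systemd units on PVE 4.x:

journalctl -u corosync -u pve-cluster --since "1 hour ago"   # recent corosync / pmxcfs messages
journalctl -u pvestatd --since "1 hour ago"                  # the daemon that reports node/VM status to the GUI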
root@nsk-prox4:~# pvesm status
ADC05Backup nfs 1 2930134016 1381828128 1548305888 47.66%
NasLvmShared lvm 1 2097147904 602931200 1494216704 29.25%
SharedNas iscsi 1 0 0 0 100.00%
local dir 1 934866048 1199360 933666688 0.63%
zfslocal zfspool 1 933666880 96 933666784 0.50%
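Because the VM that refuses to start has its disk on the SharedNas iSCSI storage, I would also verify on that node that the iSCSI session is still logged in and that PVE can see the LUNs; a sketch, assuming open-iscsi underneath (<VMID> is just a placeholder for the ID of the failing VM):

iscsiadm -m session      # is there still an active session to the SharedNas target?
pvesm list SharedNas     # can PVE enumerate the LUNs on this storage?
qm start <VMID>          # starting from the CLI shows the full error message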
root@nsk-prox4:~# pveversion -v
proxmox-ve: 4.2-60 (running kernel: 4.4.15-1-pve)
pve-manager: 4.2-17 (running version: 4.2-17/e1400248)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.13-2-pve: 4.4.13-58
pve-kernel-4.4.15-1-pve: 4.4.15-60
lvm2: 2.02.116-pve2
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-43
qemu-server: 4.0-85
pve-firmware: 1.1-8
libpve-common-perl: 4.0-71
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-56
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6-1
pve-container: 1.0-72
pve-firewall: 2.0-29
pve-ha-manager: 1.0-33
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.3-4
lxcfs: 2.0.2-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5.7-pve10~bpo80
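Last time the problem went away after an update, so before rebooting any more nodes I could also bring the packages on one node up to date and see if the red status clears; the usual commands on PVE 4.x:

apt-get update
apt-get dist-upgrade   # pulls in newer pve-manager / pve-cluster packages if any are available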