Long story short, we had to reboot our Proxmox cluster (6 machines) after a switch change, and everything went south.
We had HA enabled and at some point each server decided to start every VM, which corrupted their disks on the shared storage.
But this was our fault. Here is what we ran on the nodes to make them start the VMs again:
pvecm expected 1
pmxcfs -l
So now each node is running the VMs, but the nodes don't see each other, and we need help to make the cluster work again. Thanks for your time.
Each node shows only itself in pvecm status:
# pvecm status
Quorum information
------------------
Date: Thu Sep 13 19:56:48 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000003
Ring ID: 3/56416
Quorate: Yes
Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.50.188.146 (local)
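For context, the "Quorum: 1" in the output above is consistent with a simple majority rule over the expected votes. The sketch below is only an illustration of that arithmetic, assuming one vote per node and the usual votequorum formula (expected votes / 2 + 1, integer division); the `quorum` helper is a hypothetical name and not part of pvecm or corosync.

```shell
#!/bin/sh
# Hypothetical helper: majority threshold for a given number of expected votes.
# Assumes one vote per node and integer division, as in a default setup.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Six nodes expected: a partition needs 4 votes to be quorate.
quorum 6

# After "pvecm expected 1": a single node is quorate on its own.
quorum 1
```

Under these assumptions, lowering the expected votes to 1 lets every isolated node consider itself quorate, which would match each node listing only itself as a member.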
# systemctl status pve-cluster
Sep 13 19:41:37 blade6 systemd[1]: Starting The Proxmox VE cluster filesystem...
Sep 13 19:41:37 blade6 pmxcfs[64119]: [status] notice: update cluster info (cluster name qls2, version = 6)
Sep 13 19:41:37 blade6 pmxcfs[64119]: [dcdb] notice: members: 3/64119
Sep 13 19:41:37 blade6 pmxcfs[64119]: [dcdb] notice: all data is up to date
Sep 13 19:41:37 blade6 pmxcfs[64119]: [status] notice: members: 3/64119
Sep 13 19:41:37 blade6 pmxcfs[64119]: [status] notice: all data is up to date
Sep 13 19:41:38 blade6 systemd[1]: Started The Proxmox VE cluster filesystem.
Sep 13 19:50:44 blade6 pmxcfs[64119]: [status] notice: node has quorum
# systemctl status pvestatd.service
Sep 13 19:40:54 blade6 pvestatd[3404]: ipcc_send_rec[1] failed: Connection refused
Sep 13 19:40:54 blade6 pvestatd[3404]: ipcc_send_rec[2] failed: Connection refused
Sep 13 19:40:54 blade6 pvestatd[3404]: ipcc_send_rec[3] failed: Connection refused
Sep 13 19:40:54 blade6 pvestatd[3404]: ipcc_send_rec[4] failed: Connection refused
Sep 13 19:40:54 blade6 pvestatd[3404]: status update error: Connection refused
Sep 13 19:41:04 blade6 pvestatd[3404]: ipcc_send_rec[1] failed: Connection refused
Sep 13 19:41:04 blade6 pvestatd[3404]: ipcc_send_rec[2] failed: Connection refused
Sep 13 19:41:04 blade6 pvestatd[3404]: ipcc_send_rec[3] failed: Connection refused
Sep 13 19:41:04 blade6 pvestatd[3404]: ipcc_send_rec[4] failed: Connection refused
Sep 13 19:41:04 blade6 pvestatd[3404]: status update error: Connection refused
Code:
root@blade6:~# pveversion --verbose
proxmox-ve: 5.2-2 (running kernel: 4.15.18-1-pve)
pve-manager: 5.2-6 (running version: 5.2-6/bcd5f008)
pve-kernel-4.15: 5.2-4
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-1-pve: 4.15.18-17
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.10.17-2-pve: 4.10.17-20
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-37
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-29
pve-container: 2.0-24
pve-docs: 5.2-5
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-30
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9