How to recover from HW failure on a 3-node cluster - VMs are hidden

francoisd

Hi,

How should we recover from a node lost to a HW failure on a 3-node cluster?
I can't see the VMs anymore, and I have no direct way to restart them on the remaining nodes.

1743104573788.png

My best guess is that I should remove the failed node from the cluster, but unfortunately the cluster view does not really help:
1743104721824.png

Since the cluster does not seem to handle this situation by itself, you should add a "Recovery" section to the default manual: https://pve/pve-docs/index.html

Surprisingly, pvecm nodes does not list node 1, while the web UI still does:
Code:
root@pve2:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         2          1 pve2 (local)
         3          1 pve3
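One possible explanation for the mismatch, which is an assumption and not something confirmed in this thread: pvecm nodes reports live corosync membership, while the web UI also lists every node that still has a directory under /etc/pve/nodes. A minimal sketch to spot such leftover directories follows. The function name stale_node_dirs and the PVE_NODES_DIR override are hypothetical, and the member names are passed in by hand from the pvecm nodes output.

```shell
#!/bin/sh
# Sketch: print node directories that have no matching cluster member.
# Usage: stale_node_dirs pve2 pve3   (names copied from `pvecm nodes`)
# PVE_NODES_DIR is a hypothetical override for testing; the real
# location is assumed to be /etc/pve/nodes.
stale_node_dirs() {
    nodes_dir="${PVE_NODES_DIR:-/etc/pve/nodes}"
    for dir in "$nodes_dir"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        name=$(basename "$dir")
        found=no
        for member in "$@"; do
            [ "$member" = "$name" ] && found=yes
        done
        # Only report directories that are not current members
        [ "$found" = yes ] || echo "$name"
    done
}
```

This only reads the directory tree and changes nothing, so it should be safe to run on a surviving node before deciding what to clean up.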
We can, however, delete the hidden pve1:
Code:
root@pve2:~# pvecm delnode pve1
Could not kill node (error = CS_ERR_NOT_EXIST)
Killing node 1

Despite the removal of the faulty pve1 node:
  • The VMs are still hidden, and we don't know how to start them on the remaining nodes
  • pve1 is still displayed under Datacenter in the web UI, but not in the Cluster panel
1743108217335.png
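For what it is worth, here is a hedged sketch of one commonly described way to make the hidden VMs reappear: move their config files from the dead node's directory to a surviving node's directory inside /etc/pve/nodes. It assumes that the guest disks are reachable from the target node (shared or replicated storage), that the two remaining nodes still hold quorum so /etc/pve is writable, and that pve1 will not come back online with its old state. The function name move_guest_configs and the PVE_NODES_DIR override are hypothetical.

```shell
#!/bin/sh
# Sketch only: move guest configs from a dead node to a surviving one.
# Assumes the disks are on storage the target node can reach.
# PVE_NODES_DIR is a hypothetical override for dry runs; the real
# location is assumed to be /etc/pve/nodes.
move_guest_configs() {
    dead_node="$1"
    target_node="$2"
    nodes_dir="${PVE_NODES_DIR:-/etc/pve/nodes}"
    for kind in qemu-server lxc; do
        src="$nodes_dir/$dead_node/$kind"
        dst="$nodes_dir/$target_node/$kind"
        # Skip guest types that lack a directory on either side
        [ -d "$src" ] && [ -d "$dst" ] || continue
        for conf in "$src"/*.conf; do
            [ -e "$conf" ] || continue     # glob matched nothing
            echo "moving $conf -> $dst/"
            mv "$conf" "$dst/"
        done
    done
}
```

On this cluster the call would be move_guest_configs pve1 pve2. The moved guests should then show up under pve2, where they can be started from the web UI or with qm start followed by the VM ID. The configs have to be moved before the /etc/pve/nodes/pve1 directory is cleaned up, otherwise they are lost with it.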


A useful link to remove a node: https://forum.proxmox.com/threads/remove-node-from-cluster.98752/
 