1 server dual node failed

dthompson

I have a setup as follows:

2 physical servers with 2 nodes per server.
1 of the physical servers looks like it has failed.
I've got the guests back up and running on the other physical server (2 nodes), and everything appears to be fine.

pvecm status

Cluster information
-------------------
Name: vdc-cluster
Config Version: 8
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Aug 8 14:26:52 2022
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000004
Ring ID: 4.894
Quorate: Yes

Vote quorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000004 1 172.16.1.5 (local)
0x00000005 1 172.16.1.6


My question now is: what are the best practices for removing the 2 dead nodes from the cluster, and what is the best method to bring them back online, provided I can get the existing server working or purchase a replacement server?

HA should now be dead with only a 2-node cluster for the time being, unless I bring up a virtual server on another system to act as a dummy server to help manage that, but I'm not sure that's in my best interests either.
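To make the HA concern concrete, here is a rough sketch of the majority arithmetic behind the "Expected votes" and "Quorum" lines in the status output. It assumes plain majority voting with one vote per node and ignores special corosync options such as two_node; the function names are made up for illustration and are not part of any Proxmox or corosync API.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a majority: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(expected_votes: int, live_votes: int) -> bool:
    """True if the live votes reach the majority threshold."""
    return live_votes >= quorum_threshold(expected_votes)


def failures_tolerated(expected_votes: int) -> int:
    """How many votes can be lost before quorum is lost."""
    return expected_votes - quorum_threshold(expected_votes)


# Values from the status output: 2 expected votes, 2 present.
print(quorum_threshold(2), is_quorate(2, 2), failures_tolerated(2))
# Hypothetical case: the same 2 nodes plus one extra vote from a third system.
print(quorum_threshold(3), is_quorate(3, 2), failures_tolerated(3))
```

Under these assumptions, a 2-vote cluster needs both votes and tolerates no failures, while a third vote would let it survive the loss of any single voter. That is the reasoning behind adding a dummy voter, though whether it is worth it here is a separate question.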


Thank you for your help.
 
2 physical servers with 2 nodes per server.
What physical server is this with 2 nodes?

My question now is: what are the best practices for removing the 2 dead nodes from the cluster, and what is the best method to bring them back online, provided I can get the existing server working or purchase a replacement server?
Are the disks also defective? Normally you just replace the failed hardware, boot up, and the node automatically rejoins the cluster.

I've had a lot of failed hardware in over 20 years, but the local disks / RAID only failed once, in an old cold-plug SCSI RAID5, and even that could be recovered by knocking the disk hard against the wall so that we could copy all the data off the RAID5. Ergo, disks (if they're used in RAID/mirroring, of course) are seldom completely broken, so just try to find out what the actual problem with your machine is and buy (used) spare parts.
 
