multiple NFS storage setup for HA cluster & simulate a failed node

kenyoukenme

New Member
Aug 27, 2019
hi all!

I have 3 nodes in an HA cluster running Proxmox 5.4.
The nodes use the public IP for cluster communication and the private network for the storage network.
Each node has an NFS storage shared with the other nodes.
The disks in all 3 nodes are in RAID 1 (mirroring), which is why I can't use Ceph :( and why I need a clever NFS storage setup for HA.

--------------------------------------

I've simulated a failed node (by stopping the pve-cluster service) and, as expected, the VMs were fenced, automatically migrated to other nodes, and the failed node was automatically restarted. This is a bit odd though, because the migrated VMs are still using the NFS storage of the "failed" node. So my question here is: how can I properly simulate a failed node?
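For reference, the simulation described above can be done like this (a sketch for a systemd-based Proxmox 5.x node; note it is imperfect, since the kernel, network, and NFS server on the node keep running):

```shell
# Stop only the cluster service on the node under test. The node drops
# out of the cluster, so HA fences it and recovers its VMs elsewhere --
# but its NFS export stays reachable, hence the "weird" behaviour seen.
systemctl stop pve-cluster

# On a surviving node, watch the HA manager fence and recover services:
journalctl -u pve-ha-crm -f
```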

-------------------------------------

Say vmA and vmB are stored on the NFS storage of node1, vmA runs on node2, and vmB runs on node3. If either node2 or node3 goes down, the VMs it runs will be migrated to the other nodes. No surprise there. But what if node1 goes down and the other nodes can't access node1's NFS storage (so vmA and vmB can't be started anywhere)? How can I keep the VMs backed by node1's storage available?

I would really appreciate any hints and tips on how to make a VM redundant, or how to replicate and sync a VM to another NFS storage in the same cluster. (pvesr/replication only works with local ZFS storage; it's meant for Proxmox clusters without shared storage.)
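As noted above, pvesr only replicates local ZFS datasets, so it would only apply after moving the VM disks off NFS onto local ZFS pools on each node. For completeness, a hypothetical example of what that would look like (VM ID 100 and target node name are made up):

```shell
# Replicate the disks of VM 100 to node2 every 15 minutes.
# Requires the VM's disks to live on local ZFS storage on both nodes.
pvesr create-local-job 100-0 node2 --schedule "*/15"

# Check replication state and last sync times:
pvesr status
```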

Thank you in advance!

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
South Tyrol/Italy
So my question here is how can i simulate a failed node?
Hard stop one; the best simulations are the (almost) real ones. You can do this by either:
* stopping it from IPMI (if available)
* poweroff --force (quite safe: syncs all filesystems and then immediately powers off without waiting for all services)
* pulling the power plug (a bit unsafe, but it's what can happen during a real failure, so not a bad test either)
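The first two options above might look like this (the BMC address and credentials are placeholders, assuming a standard IPMI-over-LAN setup with ipmitool installed):

```shell
# Option 1: power the node off out-of-band via its BMC (run from
# another machine; fill in your own BMC address and credentials).
ipmitool -I lanplus -H <bmc-address> -U <user> -P <password> chassis power off

# Option 2: on the node itself -- syncs filesystems, then powers off
# immediately without cleanly stopping services, approximating a crash.
poweroff --force
```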

How should i make the vm's inside node1 still available?
This cannot work with that setup: once the single backing NFS is gone, recovery of an HA service cannot work.

The disk in all 3 nodes are in RAID 1 (mirroring) (the reason i cant use ceph :( , thats why i need a clever nfs storage setup for HA)
Do they need to stay in RAID 1, or do you mean a HW RAID is used and thus you cannot use Ceph? If that's the case, check whether you can disable the HW RAID so that the disks are exposed directly (in HBA mode); then Ceph could be used safely.
Ceph, a similar option (like GlusterFS, ...), or an external NFS (which has its own redundancy) is your only way to make this setup cope with failures; otherwise you'll always have a single point of failure. And Ceph would be your best bet, as it has good support and integration on the Proxmox VE side.
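If the disks can be put into HBA mode, a rough sketch of the Ceph setup on Proxmox VE 5.x looks like this (the network subnet and device name are examples; run the per-node steps on each of the 3 nodes):

```shell
# Install the Ceph packages on each node:
pveceph install

# Initialize Ceph once, using the private storage network (example subnet):
pveceph init --network 10.0.0.0/24

# Create a monitor on each node for quorum across the 3 nodes:
pveceph createmon

# Create one OSD per raw disk (repeat per disk; /dev/sdX is a placeholder):
pveceph createosd /dev/sdX
```

With a replicated pool on top, any single node can fail and the VMs' disks stay available, which removes the single point of failure discussed above.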
 
