Multiple NFS storage setup for HA cluster & simulating a failed node

kenyoukenme

New Member
Aug 27, 2019
hi all!

I have 3 nodes in an HA cluster running Proxmox 5.4.
The nodes use their public IPs for cluster communication and a private network for the storage network.
Each node has an NFS storage shared with the other nodes.
The disks in all 3 nodes are in RAID 1 (mirroring); that's the reason I can't use Ceph :( and why I need a clever NFS storage setup for HA.

--------------------------------------

I've simulated a failed node (by stopping the pve-cluster service) and, as expected, the VMs were fenced, automatically migrated to other nodes, and the failed node was automatically restarted. Although this is a bit odd, because a migrated VM was still using the NFS storage of the "failed" node. So my question here is: how can I properly simulate a failed node?
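For reference, this is roughly how I triggered the "failure"; a sketch only, assuming the standard service names on Proxmox VE 5.x:

Code:
# On the node to "fail": stop the cluster communication service,
# so the node drops out of quorum from the cluster's point of view.
systemctl stop pve-cluster

# From a healthy node, verify the cluster now sees that node as offline:
pvecm status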

-------------------------------------

Say vmA and vmB are stored on the NFS storage of node1, while vmA runs on node2 and vmB runs on node3. If either node2 or node3 goes down, the VMs it runs will be migrated to the other nodes. No surprise there. But what if node1 goes down and the other nodes can't access node1's NFS storage (and therefore vmA and vmB won't be migrated)? How should I make the VMs stored on node1 still available?

I would really appreciate it if you could give me some hints and tips on making a VM redundant, or on how to replicate and sync a VM to another NFS storage in the same cluster. (pvesr/replication only works with local ZFS storage; it's meant for Proxmox clusters without shared storage.)
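For context, this is roughly what such a replication job would look like; a sketch only, since (as noted) it requires local ZFS storage, and VM ID 100 and target node2 are placeholders:

Code:
# Replicate VM 100 to node2 every 15 minutes (job ID format: <vmid>-<num>).
# Only works when the VM's disks live on local ZFS storage.
pvesr create-local-job 100-0 node2 --schedule '*/15'

# List the configured replication jobs:
pvesr list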

Thank you in advance!
 
So my question here is: how can I properly simulate a failed node?

Hard-stop one; the best simulations are the (almost) real ones. You can do this by either (commands sketched below):
* stopping it via IPMI (if available)
* poweroff --force (quite safe: syncs all filesystems and then immediately powers off without waiting for all services)
* pulling the power plug (a bit unsafe, but it's what can happen during a real failure, so not a bad test either)
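In command form (run on the node you want to kill; the SysRq lines are an extra-harsh alternative, assuming SysRq is enabled in the kernel):

Code:
# Quite safe: sync filesystems, then power off immediately:
poweroff --force

# Harsher: crash-reboot via SysRq, without syncing anything
# (the closest software equivalent to pulling the plug):
echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger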

How should I make the VMs stored on node1 still available?

This cannot work with that setup; once the single backing NFS is gone, recovery of an HA service cannot work.

The disks in all 3 nodes are in RAID 1 (mirroring); that's the reason I can't use Ceph :( and why I need a clever NFS storage setup for HA.

Do they need to stay in RAID 1, or do you mean a HW RAID is used and thus you cannot use Ceph? If that's the case, check whether you can disable the HW RAID so that the disks are exposed directly (HBA mode); then Ceph could be used safely.
Ceph, a similar option (like GlusterFS, ...), or an external NFS (which has its own redundancy) is your only way to make this setup cope with failures; otherwise you'll always have a single point of failure. And Ceph would be your best bet: it has good support and integration on the Proxmox VE side.
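If the controller can be switched to HBA mode, the Ceph bootstrap on PVE 5.x is roughly the following; a sketch only, the storage network and device name are placeholders:

Code:
# On every node, once the disks are exposed directly (HBA mode):
pveceph install                     # install the Ceph packages

# Once, on the first node (use your private storage network):
pveceph init --network 10.0.0.0/24

# On each of the 3 nodes:
pveceph createmon                   # one monitor per node
pveceph createosd /dev/sdb          # one OSD per raw disk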
 
