HA with different ZFS pools

thoralf

New Member
Nov 29, 2023
I have a 4 node Cluster.
They all share a common pool name, so replication works, and failover as well.

But I have one node with a lot more storage than the others, which holds my file server.
I cannot replicate this instance, because the other pools are too small to hold all the data.

What I would like to do: Add another storage pool to a different node and replicate the file server instance to this pool.
I know that this is currently only possible if I do it manually. While not ideal, I can live with that.
But so far I lack an idea of how to create failover functionality. As in:
1. How do I manage to migrate my file server to the backup node in case the main node dies?
2. How can I ensure that the main node does not start the file server in case it comes back online?

Is there a better solution that I didn't think of so far?
 
For automatic replication between different ZFS pools, you can use pve-zsync [0]. The replicated VM configuration can be found under:

Code:
/var/lib/pve-zsync/<VMID>.conf.rep_<JOB_NAME><VMID>_<TIMESTAMP>

How can I ensure that the main node does not start the file server in case it comes back online?
Disable "Start at boot" (onboot) for that VM.


[0] https://pve.proxmox.com/wiki/PVE-zsync
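
A minimal sketch of such a job, assuming the fileserver is VMID 100 and the target is a pool called zpool2 on the other node (the IP, VMID and names are placeholders):

Code:
# create a recurring sync job; pve-zsync puts the schedule into /etc/cron.d/pve-zsync
pve-zsync create --source 100 --dest 192.168.1.22:zpool2 --name fileserver --maxsnap 7 --verbose
# list the configured jobs and their state
pve-zsync list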
 
IIUC, you have one host with a big pool where you host your fileserver VM. Let's call this pool "filepool"; it shows up as storage "filepool" in PVE, restricted to the one host that runs the fileserver VM. You would just need to add disk(s) to another host and create a ZFS pool named "filepool" there, without adding it as a new storage in PVE. Then go to Datacenter->Storage, edit storage "filepool" and restrict it to the current host and the new one where you added the drives. Now you can use storage replication between those two nodes.

With all that in place, enable HA for that fileserver VM using a group that restricts the nodes it can run on to those with "filepool".
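
A rough CLI sketch of those steps, assuming the two hosts are called pve1 and pve2 and the new disks are /dev/sdX and /dev/sdY (all of these names are placeholders, and the pool layout is up to you):

Code:
# on the second host: create a pool with the same name, but do NOT add it as a new PVE storage
zpool create filepool mirror /dev/sdX /dev/sdY
# restrict the existing "filepool" storage entry to both hosts (same as Datacenter->Storage->Edit)
pvesm set filepool --nodes pve1,pve2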
 
Problem is: The other hosts cannot sync their guests to that pool then either, because it's named differently.
So, I have the same problem, just with different guests.
 
Disable "Start at boot" (onboot) for that VM.
But that would cause the problem that after simply restarting the host, the guest would not come up at all. I would have to manually start the file server.

From the answer it looks like my scenario is just too much of an edge case.
 
Problem is: The other hosts cannot sync their guests to that pool then either, because it's named differently.
So, I have the same problem, just with different guests.
It's impossible to give accurate instructions/recommendations without accurate questions/configurations. Please post the exact disk/storage configuration of the hosts so we can help you out, instead of having to guess your settings with a crystal ball :)
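
For reference, the relevant bits can be gathered with read-only commands like these:

Code:
cat /etc/pve/storage.cfg   # storage definitions incl. node restrictions (cluster-wide)
zpool list -v              # ZFS pools and their member disks (run on each host)
pvesm status               # storages visible to the local node, with usage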
 
point taken

setup:
pve1: 12 TB storage
pve2: 4 TB
pve3: 4 TB
pve4: 4 TB

All share one pool name. Let's call it zpool1.
pve2-4 replicate all of their guests to all the other hosts.
pve1 replicates all of its guests except the "fileserver" to all the other hosts.

If one host fails, the remaining hosts will run whatever got lost.

That leaves the "fileserver" as the only single point of failure. It has far too much data to replicate to the smaller zpool1 on the other hosts.
The idea was to add a zpool2 (about 10 TB) to pve2 and use that pool solely as a failover target for the "fileserver".
This way I do not get the same redundancy as for the other guests, but it is at least something.

So, one guest ("fileserver") on pve1 on zpool1 should be able to replicate to zpool2 on pve2 and should automatically start on pve2 if pve1 fails.
It should not automatically start on pve1 if the host recovers and the guest has already been migrated to pve2/zpool2.
 
Now we are talking!

It's doable, but will require some steps and has some restrictions.

The idea is to restrict pool zpool1 to hosts pve2, pve3 and pve4, and to rename zpool1 on pve1 to filepool. Then add disk(s) to pve2 and create a pool called filepool there. Now you have filepool on two machines (pve1, pve2) and zpool1 on three (pve2, pve3, pve4), allowing you to use both storage replication and HA.

Restrictions:
  • pve1 only has one pool, filepool. If you want to run more VMs on this host, they will have to be stored on filepool. Luckily, you can use pve2 to do live storage migration.
  • VMs on filepool will only run on hosts pve1 and pve2. That may make cluster resource reservation a little trickier when planning for outages or maintenance.
  • At the HA level, groups must be used to restrict where a VM can run, to avoid bad migrations and having to manually recover such resources.
Steps:
  • Disclaimer: I hope not, but I might have forgotten some step: please try this in a lab first!!! Untested!!
  • Make backup of all VMs. Make sure restore works.
  • Check that backup again ;)
  • Double check there are no storage replication jobs using pve1. If there are any, remove them and wait until the removal has actually been carried out.
  • Empty pve1, leaving only the fileserver VM, and shut that VM down.
  • Go to Datacenter, Storage. Edit zpool1 and restrict it to nodes pve2, pve3 and pve4 (a CLI sketch of this and the other storage/HA steps follows after this list). This stops pvestatd on pve1 from accessing that storage.
  • On pve2, add disk(s) and create the pool filepool. Add it as storage; for now it will be restricted to just that host.
  • Export zpool1 on pve1:
    Code:
    zpool export zpool1
  • Import the pool under its new name:
    Code:
    zpool import zpool1 filepool
  • If you check with zpool list -v, it will show up under its new name, filepool.
  • Now go to Datacenter, Storage. Edit filepool and restrict it to both pve1 and pve2. The storage should now show up on the left side of the webUI and be online. If you go to VM Disks, the disk(s) of the fileserver VM should be there. This works because the pool is called filepool on both hosts.
  • Edit the config of the VM (/etc/pve/qemu-server/<VMID>.conf) and make it point to the new PVE storage name: scsi0: zpool1:vm-100-disk-0,discard=on,iothread=1,size=20G,ssd=1 would become scsi0: filepool:vm-100-disk-0,discard=on,iothread=1,size=20G,ssd=1.
  • Boot the VM.
  • Now you can setup storage replication for VMs in both pools, restricted to the hosts that have said pools.
  • Go to HA, create two groups: one for pve1 and pve2, the other for pve2, pve3 and pve4. Tick "restricted" on both and check only those hosts.
  • Add your VMs to HA and smile, job is done.
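
For reference, a rough CLI equivalent of the storage, replication and HA steps above (VMID 100 is taken from the example config line; device and group names are placeholders; the zpool export/import commands are already in the list):

Code:
# restrict the existing zpool1 storage to the small nodes (Datacenter, Storage, Edit)
pvesm set zpool1 --nodes pve2,pve3,pve4
# on pve2: create the new pool and add it as storage, limited to pve2 for now
zpool create filepool mirror /dev/sdX /dev/sdY
pvesm add zfspool filepool --pool filepool --nodes pve2
# after the zpool export/import on pve1, open the storage up to both nodes
pvesm set filepool --nodes pve1,pve2
# point the fileserver VM config at the renamed storage
sed -i 's/zpool1:vm-100/filepool:vm-100/g' /etc/pve/qemu-server/100.conf
# storage replication job for the fileserver towards pve2, every 15 minutes
pvesr create-local-job 100-0 pve2 --schedule "*/15"
# restricted HA groups plus the fileserver as an HA resource
ha-manager groupadd filepool-nodes --nodes pve1,pve2 --restricted 1
ha-manager groupadd zpool1-nodes --nodes pve2,pve3,pve4 --restricted 1
ha-manager add vm:100 --group filepool-nodes --state started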
 
That actually sounds like a proper solution that does the job and is fairly solid.
Thanks a lot, I will give this a try as soon as I have some spare time.

Update: I implemented this yesterday evening and it works fine, just as envisioned. I actually modified it slightly and split up the disks that were meant for the 2nd node, using them as single disks on hosts 2 and 4.
This adds another host that I can replicate to, at the cost of each pool being just a single disk.
On the other hand it would now require 3 hosts or 4 disks to fail before I lose the service - which is plenty of redundancy.
(Additionally, I run a local and 2 offsite backups, so true data loss should be nearly impossible.)

Thank you, again!
 