Proper ZFS Replication & HA

trancekat · Thursday at 19:51

Hello, all.

Happy New Year. I hope 2025 is wonderful for you.

I have been wrestling with how to properly deploy Proxmox and hope I can get some thoughts here, please.

I have included a simple diagram to help illustrate my environment and what I'd like to accomplish. I am asking for advice on the best way to accomplish my goals.

I have a main server (sol) running PVE which serves ~30 LXC services. It also has a TrueNAS VM that I pass an HBA to. The TrueNAS VM manages a ZFS RAIDZ2 pool with datasets such as abyss/family (all family photos/videos), abyss/dox (all scanned documents), etc. It is set up this way because I migrated from ESXi 6.7 to PVE about 3 months ago, but having seen the power of PVE, I'd like to drop the TrueNAS VM if possible. This server has two zpools: rpool for PVE + LXCs/VMs, and abyss for all my file storage needs.

I have a secondary server (proxima) that I just built and populated with old drives to act as a backup in case the primary has an issue. It has no LXCs or VMs, just a ZFS RAIDZ1 pool that the TrueNAS VM on the primary server (sol) replicates to. This server also has two zpools: rpool for PVE + LXCs/VMs, and abyss as the ZFS replication target for backing up the primary server's abyss zpool.

I have clustered the primary and secondary servers, but they are not in HA (yet). The servers are connected over 10GbE, with 2 more 10GbE ports available on my Brocade ICX6450-48P switch. Once I get this figured out, I will move these 2 servers into HA with a QDevice, or repurpose my daily driver as a 3rd node in the HA cluster.

What I'd like to have happen:
1 - Retire the TrueNAS VM
2 - Import the ZFS Pool into the primary server's PVE for control
3 - Create snapshots of the data stored in that ZFS Pool (abyss)
4 - Perform ZFS replication to the secondary server's ZFS Pool (also named abyss) for a subset of datasets (I don't have enough storage on the secondary to back up all the datasets on the primary)
5 - When the primary server goes down, fail over several services to the secondary (i.e. Jellyfin, Immich, DNS, NPM, etc)
6 - Failed over services use the local ZFS Pool of the server they are active on for their data (so Immich doesn't try to access the primary server's dataset while it is down for maintenance for example)
7 - When the primary is back up, fail back over to the primary and perform ZFS replication back to the primary from the secondary
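Step 4 boils down to sending only the changes between two snapshots of each selected dataset from sol to proxima. A minimal Python sketch of that for a subset of datasets, assuming (hypothetically) that proxima is reachable as `root@proxima` over SSH, both pools are named `abyss`, and each dataset already has a common snapshot on both sides from an initial full `zfs send`. The dataset list and host login are illustrative, not a tested setup:

```python
import subprocess

# Hypothetical subset of datasets that fits on proxima's smaller RAIDZ1 pool.
REPLICATED = ["abyss/family", "abyss/dox"]
TARGET_HOST = "root@proxima"  # hypothetical SSH login for the secondary


def replication_commands(dataset, prev_snap, new_snap, host=TARGET_HOST):
    """Build argv lists for an incremental send of prev_snap..new_snap.

    `zfs send -i A B` streams only the blocks that changed between A and B.
    `zfs receive -F` first rolls the target back to its most recent snapshot,
    discarding any stray changes made on the backup copy.
    """
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # ssh joins the remaining arguments into one remote command line;
    # the dataset names used here contain no spaces, so no extra quoting is needed.
    recv = ["ssh", host, "zfs", "receive", "-F", dataset]
    return send, recv


def replicate(dataset, prev_snap, new_snap, host=TARGET_HOST):
    """Run `zfs send ... | ssh host zfs receive ...` without going through a shell."""
    send_cmd, recv_cmd = replication_commands(dataset, prev_snap, new_snap, host)
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_cmd, stdin=sender.stdout)
    sender.stdout.close()  # lets `zfs send` get SIGPIPE if the receiver exits early
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc != 0 or recv_rc != 0:
        raise RuntimeError(
            f"replication of {dataset} failed (send={send_rc}, receive={recv_rc})"
        )
```

For example, `replicate("abyss/family", "daily-2025-01-05", "daily-2025-01-06")` would be called once per dataset in `REPLICATED`. Failing back (step 7) would be the same call with the roles of the two hosts swapped.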

The end state would be a primary server that hosts several LXCs plus a large ZFS pool as a file server (with ~2 weeks of snapshots), and a secondary server that receives ZFS replication from the primary into its own ZFS pool and runs services while the primary is down.

Thank you for your help!

trancekat · Saturday at 14:15

Bump.. Any help, please?

UdoB · Saturday at 14:40

trancekat said:
Bump.. Any help, please?

Well...

You have given a lot of information, that's really great! You have a seven-step roadmap. At first glance I do not see any showstoppers, but I did not analyze all the details; there are too many, as you have a complex setup.

And then there is not a single question, so what should I answer? This forum works best if you ask a specific question for a given situation. Analyzing complex setups and verifying a migration plan is much more than I can do, sorry.

And generic replies ("make an offline backup first!") are probably not what you are looking for, right?

Good luck!

alexskysilk · Saturday at 18:25

trancekat said:
What I'd like to have happen:

So.... what are you asking? How to?

Step 1. Back up everything.
Step 2. Install PVE on both your servers. Restore the workload on one.
Step 3. Set up ZFS replication as described here: https://pve.proxmox.com/wiki/PVE-zsync

trancekat · Sunday at 20:54

UdoB said:
And then there is not a single question, so what should I answer? This forum works best if you ask a specific question for a given situation. Analyzing complex setups and verifying a migration plan is much more than I can do, sorry.

My question is: how do I get Proxmox to take a snapshot of each dataset in my zpool every day and keep them on a rolling 1-week timeline, please?
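Without extra packages, one way to do this is a small script run once a day from cron or a systemd timer. A minimal Python sketch, assuming a hypothetical `autodaily-YYYY-MM-DD` snapshot naming scheme so that only the script's own snapshots are ever pruned and manual snapshots are left alone. It is not a PVE feature, just plain `zfs` commands:

```python
import datetime as dt
import subprocess

PREFIX = "autodaily-"  # hypothetical naming scheme for this script's snapshots
KEEP_DAYS = 7          # rolling one-week window


def snapshot_name(dataset, day):
    """e.g. abyss/family@autodaily-2025-01-06"""
    return f"{dataset}@{PREFIX}{day.isoformat()}"


def snapshots_to_prune(snapshots, dataset, today, keep_days=KEEP_DAYS):
    """Return this script's snapshots of `dataset` that fall outside the window."""
    cutoff = today - dt.timedelta(days=keep_days - 1)  # keep today + previous 6 days
    doomed = []
    for snap in snapshots:
        ds, _, tag = snap.partition("@")
        if ds != dataset or not tag.startswith(PREFIX):
            continue  # another dataset, or a snapshot this script did not create
        try:
            day = dt.date.fromisoformat(tag[len(PREFIX):])
        except ValueError:
            continue  # prefix matches but the date does not parse
        if day < cutoff:
            doomed.append(snap)
    return sorted(doomed)


def list_snapshots(dataset):
    """Names of the snapshots directly on `dataset` (depth 1, no header line)."""
    out = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-d", "1", dataset],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def run_daily(datasets, today=None):
    """Take today's snapshot of each dataset, then destroy the expired ones."""
    today = today or dt.date.today()
    for ds in datasets:
        subprocess.run(["zfs", "snapshot", snapshot_name(ds, today)], check=True)
        for snap in snapshots_to_prune(list_snapshots(ds), ds, today):
            subprocess.run(["zfs", "destroy", snap], check=True)
```

For example, a daily cron job could call `run_daily(["abyss/family", "abyss/dox"])`. Alternatively, `zfs snapshot -r` snapshots a dataset and all its descendants atomically in one call, at the cost of snapshotting every child dataset.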

trancekat · Sunday at 20:58

alexskysilk said:
Step 3. Set up ZFS replication as described here: https://pve.proxmox.com/wiki/PVE-zsync

Thank you, but I did not see in this link how to set up daily snapshots of my datasets (not VMs or LXCs) with 1-week retention.

UdoB · Sunday at 20:59

trancekat said:
My question is: how do I get Proxmox to take a snapshot of each dataset in my zpool every day and keep them on a rolling 1-week timeline, please?

As far as I know this is not possible (with plain PVE). (I would love to be proven wrong!)

What I actually use is an external tool: https://github.com/Corsinvest/cv4pve-autosnap

Edit: there are several script-based tools to create and manage ZFS snapshots, with or without replication to other nodes. Most of them (except perhaps zsync; I've never used that one) are not integrated into PVE and handle datasets and zvols without the corresponding VM configuration. That's just not sufficient.

alexskysilk · Sunday at 22:10

trancekat said:
but I did not see how to set up daily snapshots of my datasets

Where did you look? There are a ton of tools to do this.

UdoB · 2025-01-06T09:07:37+0100

alexskysilk said:
Where did you look? There are a ton of tools to do this.

With vanilla PVE?

I am really, really happy that PVE is open and allows me to install tools produced by third parties. But it is not good to be forced to do so. And in a commercial setting this would generate a whole bunch of problems, ranging from strange technical dependencies to violated policy guidelines to (possibly) lost support entitlement from Proxmox, the company.

alexskysilk · 2025-01-06T17:43:08+0100

UdoB said:
But it is not good to be forced to do so.

Forced?!

UdoB said:
And in a commercial setting this would generate a whole bunch of problems, ranging from strange technical dependencies to violated policy guidelines to (possibly) lost support entitlement from Proxmox, the company.

I don't even know how to reply to this. No product does EVERYTHING conceivable. Managing datasets outside of VM use is not in scope for a virtualization product; why would using tooling meant for a different purpose impact support?

If this were a closed-source solution, you could either buy a different product (e.g., Veeam), make a feature request to the vendor, or use the product's API. As an open-source product, if you feel strongly enough that this feature should be part of the product, submit the code for inclusion. Either way, there is absolutely NOTHING improper about using outside software. At ALL.

UdoB · 2025-01-06T19:52:10+0100

alexskysilk said:
Forced?!

Maybe it's the wrong term. I mean "this is required to reach a given goal because the native software package (PVE) does not offer this function".

alexskysilk said:
I don't even know how to reply to this. No product does EVERYTHING conceivable.

Did I say that?

Perhaps we are talking at cross-purposes. I am really, really satisfied with PVE! And of course it does not include everything and the kitchen sink.

The web GUI offers a manual way to create snapshots, but there is no way to schedule them as a task. That's all. I am able to find and use alternative tools, and that's great.

Whether a system with additional third-party software (be it a small tool or a full-blown graphical desktop) still qualifies for support (paid support via a support ticket, that is!), I don't know. In a mission-critical business system I would try to modify the base PVE installation as little as possible, for more than one reason.

In my homelab there is no such artificial limitation.

v95klima · 2025-01-07T17:57:23+0100

I use sanoid (which comes with syncoid) to automate rotating snapshots, plus crontab-scheduled syncoid commands for organized transfers to a second server.

Just do an
apt install sanoid

To read about it:
https://github.com/jimsalterjrs/sanoid
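For reference, sanoid reads its policy from `/etc/sanoid/sanoid.conf`, where each dataset section points at a `[template_<name>]` section via `use_template`. A minimal Python sketch that renders such a file for a one-week daily policy; the template name `dailyweek` and the dataset list are hypothetical, and the option names follow sanoid's example config, so check them against the version you install:

```python
import configparser
import io

TEMPLATE = "dailyweek"  # hypothetical template name


def render_sanoid_conf(datasets, daily=7):
    """Render an INI-style sanoid.conf keeping `daily` daily snapshots per dataset."""
    conf = configparser.ConfigParser()
    for ds in datasets:
        # Each dataset section just references the shared policy template.
        conf[ds] = {"use_template": TEMPLATE}
    conf[f"template_{TEMPLATE}"] = {
        "hourly": "0",
        "daily": str(daily),
        "monthly": "0",
        "yearly": "0",
        "autosnap": "yes",   # let sanoid take the snapshots
        "autoprune": "yes",  # let sanoid delete the expired ones
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()
```

For example, `print(render_sanoid_conf(["abyss/family", "abyss/dox"]))` produces the file contents. Replication to the second server would then be one crontab line per dataset along the lines of `syncoid abyss/family root@proxima:abyss/family`.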
