Proper ZFS Replication & HA

trancekat

New Member
Nov 13, 2024
Hello, all.

Happy New Year. I hope 2025 is wonderful for you.

I have been wrestling with how to properly deploy Proxmox and hope I can get some thoughts here, please.

I have included a simple diagram to help illustrate my environment and what I'd like to accomplish. I am asking for advice on the best way to accomplish my goals.

I have a main server (sol) running PVE which serves ~30 LXC services. It also has a TrueNAS VM that I pass an HBA to. The TrueNAS VM manages a ZFS RAIDZ2 pool with datasets such as abyss/family (all family photos/videos), abyss/dox (all scanned documents), etc. It is set up this way because I migrated from ESXi 6.7 to PVE about 3 months ago, but having seen the power of PVE, I'd like to drop the TrueNAS VM if possible. This server has 2 zpools: rpool for PVE + LXCs/VMs, and abyss for all my file storage needs.

I have a secondary server (proxima) that I just built and populated with old drives to act as a backup in case the primary has an issue. It has no LXCs or VMs, just a ZFS RAIDZ1 pool that the primary server (sol)'s TrueNAS VM replicates to. This server has 2 zpools: rpool for PVE + LXCs/VMs, and abyss as a ZFS replication target for backing up the primary server's abyss zpool.

I have clustered the primary and secondary servers, but they are not in HA (yet). I have a 10GbE link between these servers, with 2 more 10GbE ports available on my Brocade ICX6450-48P switch. Once I get this figured out, I would move these 2 servers into HA with a QDevice, or I can repurpose my daily driver as a 3rd node in the HA cluster.

What I'd like to have happen:
1 - Retire the TrueNAS VM
2 - Import the ZFS Pool into the primary server's PVE for control
3 - Create snapshots of the data stored in that ZFS Pool (abyss)
4 - Perform ZFS replication to the secondary server's ZFS Pool (also named abyss) for a subset of datasets (I don't have enough storage on the secondary to back up all the datasets on the primary)
5 - When the primary server goes down, fail over several services to the secondary (e.g. Jellyfin, Immich, DNS, NPM, etc.)
6 - Failed over services use the local ZFS Pool of the server they are active on for their data (so Immich doesn't try to access the primary server's dataset while it is down for maintenance for example)
7 - When the primary is back up, fail back over to the primary and perform ZFS replication back to the primary from the secondary

End state would be a primary server that hosts several LXCs as well as a large ZFS pool as a file server (with ~2 weeks of snapshots), with a secondary server that has a ZFS pool that the primary sends ZFS replication to, and runs services in case the primary is down.
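
Steps 2 and 3 of the roadmap above could be sketched roughly as follows. This is only a dry-run plan builder in Python: it prints the `zpool import` and `zfs snapshot -r` commands and executes nothing. It assumes the TrueNAS VM has already been shut down (or the pool exported there) so that only one system has the pool imported. The function name and the `manual-2025-01-01` snapshot label are hypothetical; the pool and dataset names are the ones from this post.

```python
import shlex


def import_and_snapshot_plan(pool, datasets, label):
    """Return shell commands (as strings) for importing a pool and
    snapshotting selected datasets. Nothing is executed here; the plan
    is meant to be reviewed and run by hand on the PVE host."""
    # Step 2: import the pool on the PVE host.
    plan = [shlex.join(["zpool", "import", pool])]
    for dataset in datasets:
        # Step 3: -r also snapshots every child dataset below this one.
        target = f"{pool}/{dataset}@{label}"
        plan.append(shlex.join(["zfs", "snapshot", "-r", target]))
    return plan


if __name__ == "__main__":
    # Hypothetical label; dataset names taken from the post.
    for cmd in import_and_snapshot_plan("abyss", ["family", "dox"], "manual-2025-01-01"):
        print(cmd)
```

Only the datasets passed in are snapshotted, which matches the idea of handling a subset rather than the whole pool.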

HomeEnvironment.drawio (1).png
Thank you for your help!
 
Bump.. Any help, please?
Well...

You have given a lot of information, that's really great! You have a seven-step roadmap. At first glance I do not see any showstoppers, but I did not analyze all the details - there are too many of them, as you have a complex setup.

And then there is not a single question, so what should I answer? This forum works best if you ask a specific question for a given situation. Analyzing complex setups and verifying a migration plan is much more than I can do, sorry.

And generic replies ("make an offline backup first!") are probably not what you are looking for, right?

Good luck!
 
So.... what are you asking? How to?

Step 1. Back up everything.
Step 2. Install PVE on both of your servers and restore your workloads on one of them.
Step 3. Set up ZFS replication as described here: https://pve.proxmox.com/wiki/PVE-zsync
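
To illustrate what ZFS replication in step 3 boils down to, here is a minimal Python sketch of the underlying `zfs send` / `zfs receive` pattern. It is not pve-zsync and not how that tool is implemented; it only shows the idea of sending a full stream the first time and an incremental stream (`-i`) from the newest snapshot both sides share afterwards. The function names, the `daily-NN` snapshot names, and the `root@proxima` SSH target are assumptions for the example, and the target dataset is assumed to be unmodified since the common snapshot. The command is only built as a string, never run.

```python
import shlex


def latest_common_snapshot(source_snaps, target_snaps):
    """Newest snapshot name present on both sides, or None.
    source_snaps must be ordered oldest to newest."""
    on_target = set(target_snaps)
    for name in reversed(source_snaps):
        if name in on_target:
            return name
    return None


def replication_command(dataset, source_snaps, target_snaps, target_host):
    """Build a 'zfs send | ssh ... zfs receive' pipeline as a string.
    Returns None when there is nothing to send."""
    if not source_snaps:
        return None  # no snapshot exists yet on the source
    newest = source_snaps[-1]
    base = latest_common_snapshot(source_snaps, target_snaps)
    if base == newest:
        return None  # target already has the newest snapshot
    send = ["zfs", "send"]
    if base is not None:
        # Incremental stream: only the changes since the common snapshot.
        send += ["-i", f"@{base}"]
    send.append(f"{dataset}@{newest}")
    recv = ["ssh", target_host, "zfs", "receive", dataset]
    return f"{shlex.join(send)} | {shlex.join(recv)}"


if __name__ == "__main__":
    print(replication_command(
        "abyss/family",
        ["daily-01", "daily-02", "daily-03"],  # hypothetical snapshot names
        ["daily-01", "daily-02"],
        "root@proxima",                        # hypothetical SSH target
    ))
```

Calling it once per dataset makes it straightforward to replicate only a subset of the pool.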
 
My question is: how do I get Proxmox to take a snapshot every day of each of the datasets in my zpool and keep them on a rolling 1-week timeline, please?
 
As far as I know this is not possible (with plain PVE). (I would love to be proven wrong!)

What I actually use is an external tool: https://github.com/Corsinvest/cv4pve-autosnap

Edit: there are several script-based tools to create and manage ZFS snapshots, with or without replication to other nodes. Most of them (except perhaps zsync, I've never used that one) are not integrated in PVE and handle datasets and ZVOLs without the corresponding VM configuration. That's just not sufficient.
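
For illustration, the core of what such a script-based tool does for a "daily snapshot, keep one week" policy can be sketched in a few lines of Python. This is not how cv4pve-autosnap or any other named tool works internally; the `autodaily-` prefix and the function names are hypothetical. It only works out snapshot names and which ones have expired, and like the tools described above it deals with plain datasets only, not VM configuration.

```python
from datetime import date, timedelta

PREFIX = "autodaily-"  # hypothetical naming scheme for this sketch


def snapshot_name(dataset, day):
    """Name of the daily snapshot for a dataset, e.g. pool/ds@autodaily-2025-01-10."""
    return f"{dataset}@{PREFIX}{day.isoformat()}"


def expired(snapshots, today, keep_days=7):
    """Return the snapshots carrying our prefix that fall outside the
    retention window. Snapshots created by other tools are never returned."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for full_name in snapshots:
        _, _, label = full_name.partition("@")
        if not label.startswith(PREFIX):
            continue  # not ours, leave it alone
        try:
            taken = date.fromisoformat(label[len(PREFIX):])
        except ValueError:
            continue  # unexpected suffix, leave it alone
        if taken <= cutoff:
            old.append(full_name)
    return old


if __name__ == "__main__":
    today = date(2025, 1, 10)
    existing = [snapshot_name("abyss/family", today - timedelta(days=n)) for n in range(10)]
    print("create: ", snapshot_name("abyss/family", today))
    print("destroy:", expired(existing, today))
```

A wrapper run once a day (from cron or a systemd timer, for example) would pass the new name to `zfs snapshot` and each expired name to `zfs destroy`; with `keep_days=7` the seven most recent daily snapshots are kept.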
 
Where did you look? There are a ton of tools to do this.
With vanilla PVE?

I am really, really happy that PVE is open and allows me to install tools produced by third parties. But it is not good to be forced to do so. And in a commercial setting this would generate a whole bunch of problems - from strange technical dependencies to violated policy guidelines to (possibly) lost support entitlement from Proxmox, the company.
 
But it is not good to be forced to do so.
Forced?!

And in a commercial setting this would generate a whole bunch of problems - from strange technical dependencies to violated policy guidelines to (possibly) lost support entitlement from Proxmox, the company.
I don't even know how to reply to this. No product does EVERYTHING conceivable. Managing datasets outside of VM use is not in scope for a virtualization product; why would using tooling meant for a different purpose impact support?

If this were a closed-source solution, you could either buy a different product (e.g. Veeam), make a feature request to the vendor, or use the product API. As this is an open-source product, if you feel strongly enough that this feature should be part of it, submit the code for inclusion. Either way, there is absolutely NOTHING improper about using outside software. At ALL.
 
Maybe it's the wrong term. I mean "this is required to reach a given goal, as the native software package (PVE) does not offer this function".

I don't even know how to reply to this. No product does EVERYTHING conceivable.
Did I say that?

Perhaps we are talking at cross-purposes. I am really, really satisfied with PVE! And of course it does not include everything and the kitchen sink.

The web GUI offers a manual way to create snapshots. There is no way to do that as a scheduled task. That's all. I am able to find and use alternative tools, and that's great.

Whether an installation with additional third-party software - be it a small tool or a full-blown graphical desktop - still qualifies for support (paid, via a support ticket, that is!), I don't know. In a mission-critical business system I would try to modify the base PVE installation as little as possible - for more than one reason.

In my Homelab there is no such artificial limitation :)
 