ZFS snapshots and replication: Sufficient as backup?

Dec 15, 2016
Berlin
Hi all,

We are currently running a PVE cluster with multiple machines and serve about 15 LXC containers and 10 VMs.

We are using PVE replication (every 15min) to onsite PVE machines and znapzend to create hourly snapshots that are also replicated off site.

This should protect us against a variety of worst-case scenarios apart from one: a severe ZFS bug that would render both the production systems and the snapshots (i.e. the backups) useless.

I now wonder if we should run PVE backups of VMs and containers on top of the replication. I don't like the idea because, with nearly 3 TB of data, this would need a lot of additional storage and create a lot of IO when all backups run, which often causes PVE sync time-outs.

What would be the general opinion and risk assessment regarding reliability of ZFS, zfs send and snapshots?
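
For a setup like the one described above (15-minute replication plus hourly snapshots), a small freshness check can at least catch replication that has silently stopped. This is only a sketch: the function name `stale_datasets`, the dataset `rpool/data` and the one-hour threshold are illustrative assumptions, not part of the setup in this thread. It reads the tab-separated output of `zfs list -H -p -t snapshot -o name,creation` on stdin.

```shell
#!/bin/sh
# stale_datasets MAX_AGE_SECONDS NOW_EPOCH   (hypothetical helper)
# Reads "dataset@snapshot<TAB>creation-epoch" lines on stdin and prints
# every dataset whose newest snapshot is older than MAX_AGE_SECONDS.
stale_datasets() {
    awk -F '\t' -v max="$1" -v now="$2" '
        {
            split($1, parts, "@")            # dataset@snapshot -> dataset
            ds = parts[1]
            ts = $2 + 0                      # force numeric comparison
            if (!(ds in newest) || ts > newest[ds]) newest[ds] = ts
        }
        END {
            for (ds in newest)
                if (now - newest[ds] > max) print ds
        }' | sort
}

# Example on a host with ZFS ("rpool/data" is a placeholder dataset):
# zfs list -H -p -r -t snapshot -o name,creation rpool/data \
#     | stale_datasets 3600 "$(date +%s)"
```

Run on the replication target, an empty result means every dataset has a snapshot newer than the threshold. It says nothing about whether the data in those snapshots is intact.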
 

wolfgang

Proxmox Staff Member
Oct 1, 2014
Hi,
What would be the general opinion and risk assessment regarding reliability of ZFS, zfs send and snapshots?
I guess this comes down to personal impression.
The likelihood of losing data decreases if you use multiple different technologies.
What I mean is: if you have an additional backup on a different storage technology, you are better protected against storage bugs.
I personally follow the old 3-2-1 backup rule [1].
[1] https://www.backblaze.com/blog/the-3-2-1-backup-strategy/
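
To make the rule concrete, here is a rough sketch that checks a list of copies against 3-2-1: at least three copies, on at least two different storage technologies, with at least one off site. The function name `check_321` and the tab-separated input format (name, technology, site) are made up for illustration.

```shell
#!/bin/sh
# check_321 (hypothetical helper)
# Reads "name<TAB>technology<TAB>site" lines on stdin, where site is
# "onsite" or "offsite". Prints a summary and returns 0 only if the
# copies satisfy 3-2-1.
check_321() {
    awk -F '\t' '
        NF >= 3 {
            copies++
            tech[$2] = 1
            if ($3 == "offsite") offsite++
        }
        END {
            ntech = 0
            for (t in tech) ntech++
            ok = (copies >= 3 && ntech >= 2 && offsite >= 1)
            printf "copies=%d technologies=%d offsite=%d -> %s\n",
                   copies, ntech, offsite, (ok ? "OK" : "NOT 3-2-1")
            exit (ok ? 0 : 1)
        }'
}

# Example: production pool, onsite replica and offsite znapzend target,
# all on ZFS. Three copies and one off site, but only one technology:
# printf 'prod\tzfs\tonsite\nreplica\tzfs\tonsite\nznapzend\tzfs\toffsite\n' | check_321
```

With those three ZFS-only copies the check fails on the "two technologies" part, which is exactly the gap a ZFS bug would hit.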
 

LnxBil

Famous Member
Feb 21, 2015
Germany
We are using PVE replication (every 15min) to onsite PVE machines and znapzend to create hourly snapshots that are also replicated off site.
It is also important that you have your VM config files in your off-site backup; ZFS replication alone is not enough.
The data is often enough, but in a total failure where time-to-recovery is crucial, having these config files is very important and saves you a lot of time.

I have a couple of single-node ZFS-based machines that are backed up just like yours. Everything is on ZFS there, so in a recovery case I just need to boot a live medium, create the pool, send/receive the data into it and write the boot sector, and the machine is good to go. Additionally, I have a separate dataset to which the PVE config files are synced:

Code:
rsync --inplace --no-whole-file -ax --delete-before   /etc/pve/  /rpool/pveconfigs/
for easier VM extraction. The same data is still in the SQLite database that PVE uses internally, but the plain files are easier to access.
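
A quick way to confirm that the synced copy really contains every guest config is to compare the two trees after the rsync run. A minimal sketch, assuming the guest configs are the `*.conf` files under `/etc/pve`; the function name `missing_configs` is made up for this example.

```shell
#!/bin/sh
# missing_configs SRC DST   (hypothetical helper)
# Prints every *.conf path (relative to SRC) that exists under SRC but
# has no counterpart under DST. Prints nothing when the copy is complete.
missing_configs() {
    src=${1%/}
    dst=${2%/}
    ( cd "$src" && find . -type f -name '*.conf' ) | sort |
    while IFS= read -r rel; do
        [ -f "$dst/$rel" ] || printf '%s\n' "${rel#./}"
    done
}

# Example, using the paths from the rsync command above:
# missing_configs /etc/pve /rpool/pveconfigs
```

Empty output means every config file found under the source also exists in the copy. It only checks presence, not content.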
 
