[SOLVED] Backup strategy relying on FS

Adnan

I'm not an expert in filesystems, but I know the concepts (RAID, ZFS, mounts, logical/physical volumes, etc.).

I'm currently setting up a new environment for our company after a ransomware infection, made worse by the poor backup strategy left in place by the former IT (user files on SMB mounts, backups in the same mounted directory, every server having the other servers' drives mounted as SMB shares, which resulted in EVERYTHING being encrypted except what was locked by running programs). So I have a server at SoYouStart (Xeon D-1520 [4c/8t], 32GB RAM, 4x2TB SoftRaid) with their build of PVE 6 (64-bit) installed (by default on LVM over a 2-disk RAID 1), no ZFS. I can reinstall the OS without LVM and without RAID so I can manually add the 3 other disks in soft RAID 5, for instance.

For reference, I currently have a Windows Server VM (VirtIO SCSI, VirtIO network, SCSI disk) with 4 cores (kvm64), 8GB RAM and a 100GB qcow2 disk, and unzipping a 6GB 7z archive (Ultra compression) took 4 hours; Windows was showing less than 10 MB/s write speed. The host had a load around 10 and iotop showed the load came from this VM only. After that, importing a .sql file (30k lines) into MySQL in an LXC container took 3 hours to complete. That seems far too slow, while downloading a 2GB ISO from the host is pretty quick (10 seconds on a 250Mbps connection).

First of all, is ZFS an absolute must-have, and does it go "on top" of LVM or is it better without? I have 5TB of storage (mounted on PVE via NFS, can also be mounted via SMB) that comes with the server; I will store regular backups of the VMs on it (once a day at least).

Also, is it safe to mount a qcow2 using nbd and rsync a directory from it while the VM is running?

I'm planning on having a "NAS" VM, whose SMB shares will be mounted in the Windows VM. The NAS VM can itself mount the external storage via NFS/FTP/SMB and regularly export its files to it (rsnapshot or an rsync archive).

After my proof of concept, I'll have a budget for 2 or 3 machines, one of which could be a Proxmox Backup Server. But for a month or two, I'll be stuck with this single machine. Do you have ideas, suggestions, must-haves or must-not-dos for this setup?
 
First of all, is ZFS an absolute must-have, and does it go "on top" of LVM or is it better without? I have 5TB of storage (mounted on PVE via NFS, can also be mounted via SMB) that comes with the server; I will store regular backups of the VMs on it (once a day at least).
ZFS isn't a must-have, but it has really nice features you would miss otherwise:
- bit rot protection
- compression on block level
- deduplication
- ...

If you use ZFS you can't use LVM or qcow2.
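For context, here is a minimal sketch of how those features are toggled and checked on a pool (the pool name rpool is an assumption; dedup in particular is very RAM-hungry, so test before enabling it):

Code:
# enable LZ4 compression on the whole pool (inherited by child datasets/zvols)
zfs set compression=lz4 rpool
# deduplication is possible too, but needs a lot of RAM -- enable only after testing
# zfs set dedup=on rpool
# bit rot protection: scrub periodically so checksum errors are found and repaired
zpool scrub rpool
zpool status rpool        # shows scrub progress and any checksum errors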
After my proof of concept, I'll have a budget for 2 or 3 machines, one of which could be a Proxmox Backup Server. But for a month or two, I'll be stuck with this single machine. Do you have ideas, suggestions, must-haves or must-not-dos for this setup?
I would use automated snapshots and backups with a long retention. For example: an hourly snapshot kept for 2 days, a daily snapshot kept for 2 weeks, weekly backups kept for 2 months, monthly backups kept for a year.
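As a rough sketch of the hourly part (the dataset name is an assumption; in practice tools like sanoid or zfs-auto-snapshot handle the rotation for you), a cron job could look like:

Code:
#!/bin/sh
# /etc/cron.hourly/zfs-hourly-snap -- hypothetical example, dataset name assumed
DATASET=rpool/data
zfs snapshot -r "${DATASET}@hourly-$(date +%Y%m%d-%H%M)"
# keep the last 48 hourly snapshots (~2 days), destroy older ones
zfs list -H -t snapshot -o name -s creation \
  | grep "^${DATASET}@hourly-" \
  | head -n -48 \
  | xargs -r -n1 zfs destroy -r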
If ransomware hits again and the server management is on a dedicated management subnet, then the ransomware shouldn't be able to encrypt the server itself. Even if VMs get encrypted, that wouldn't be a big problem because you could roll back to a snapshot or restore the VMs from a backup.

If you have budget for several servers, you could create a cluster and use CEPH as distributed storage.
 
ZFS isn't a must-have, but it has really nice features you would miss otherwise:
- bit rot protection
- compression on block level
- deduplication
- ...
Also, filesystem-level snapshots, it seems.

If you use ZFS you can't use LVM or qcow2.
Why isn't qcow2 usable on ZFS? Isn't qcow2 a portable file format?

EDIT: It seems that at the PVE level, I simply won't be able to select qcow2 on ZFS.

I would use automated snapshots and backups with a long retention. For example: an hourly snapshot kept for 2 days, a daily snapshot kept for 2 weeks, weekly backups kept for 2 months, monthly backups kept for a year.
OK, thanks for the suggestion. Are you talking about FS-level snapshots or qcow2 ones? Again, if I understood correctly, successive qcow2 snapshots depend on the first one; I can remove a backup (which is a full, separate backup), but removing a snapshot would imply keeping the root one at least, wouldn't it?

If ransomware hits again and the server management is on a dedicated management subnet, then the ransomware shouldn't be able to encrypt the server itself.
Are you referring to the Proxmox server as the "management"? In our case, the issue was that every disk was mounted on the other servers; as soon as I shut down the "infected" server, the spread stopped. The only mount I'll keep enabled will be the NAS (for the Terminal Server users' files), which will in turn either be snapshotted/backed up like the other VMs if it's a VM, or make its own snapshots/backups (FS snapshots and external backups) if it's a dedicated server.

If you have budget for several servers, you could create a cluster and use CEPH as distributed storage.
I'm keeping Ceph in mind, but with OVH it quickly adds up. They have a Cloud Disk Array service at 150€/mo for 2TB. If I do it myself with 3 servers (1 storage, 1 active, 1 passive; I think HA starts at 3 hosts with PVE), isn't it a problem to have the VMs run off a storage server accessed over IP?

Thanks for your guidance
 
Also, filesystem-level snapshots, it seems.


Why isn't qcow2 usable on ZFS? Isn't qcow2 a portable file format?
If you use ZFS you need to stick to RAW because all virtual HDDs will use ZFS zvols.
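In practice each VM disk on a ZFS storage shows up as a zvol rather than a file; you can see them like this (VM ID, pool name and the output shown are just illustrative):

Code:
zfs list -t volume
# NAME                       USED  AVAIL  REFER  MOUNTPOINT
# rpool/data/vm-100-disk-0   33.1G  1.2T   33.1G  -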
OK, thanks for the suggestion. Are you talking about FS-level snapshots or qcow2 ones? Again, if I understood correctly, successive qcow2 snapshots depend on the first one; I can remove a backup (which is a full, separate backup), but removing a snapshot would imply keeping the root one at least, wouldn't it?
Qcow2 supports snapshots, LVM supports snapshots and ZFS supports snapshots. I was talking about the ZFS snapshots on FS level that I'm using here. Snapshots are based on each other, but because of that they don't need much space and you can create them within a second even if they cover dozens of TBs of data (it's a copy-on-write filesystem, so you don't need to write any new data, you just stop the old blocks from being freed). Real backups take forever to create, so snapshots are nice if you want small intervals like every 5 minutes or once an hour.
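For reference, the ZFS-level operations being described boil down to a few commands (the zvol and snapshot names here are just examples; in Proxmox the GUI or qm snapshot normally does this for you):

Code:
zfs snapshot rpool/data/vm-100-disk-0@before-update   # instant, no data copied
zfs list -r -t snapshot rpool/data/vm-100-disk-0      # see how much each snapshot holds
zfs rollback rpool/data/vm-100-disk-0@before-update   # revert the zvol to that point
zfs destroy  rpool/data/vm-100-disk-0@before-update   # drop it and free the space
# PVE equivalent for a whole VM: qm snapshot 100 before-update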
Are you referring to the Proxmox server as the "management"? In our case, the issue was that every disk was mounted on the other servers; as soon as I shut down the "infected" server, the spread stopped. The only mount I'll keep enabled will be the NAS (for the Terminal Server users' files), which will in turn either be snapshotted/backed up like the other VMs if it's a VM, or make its own snapshots/backups (FS snapshots and external backups) if it's a dedicated server.
Yes, by management I mean the BMC and the Proxmox host itself. If everything you share is inside an isolated VM and you back up/snapshot these VMs regularly, ransomware can't do much because it has no access to the host itself. And as long as the host is unaffected, you can simply roll back a snapshot or restore a backup of a VM.
I'm keeping Ceph in mind, but with OVH it quickly adds up. They have a Cloud Disk Array service at 150€/mo for 2TB. If I do it myself with 3 servers (1 storage, 1 active, 1 passive; I think HA starts at 3 hosts with PVE), isn't it a problem to have the VMs run off a storage server accessed over IP?
The great thing about CEPH is that you can distribute your storage over all servers. For example 3 servers and 3 copies of everything, so each host stores one copy. If any one of the servers dies, you will have no downtime and won't lose any data. It will just shift the VMs and data around.
If you only had a single storage server and that one has a problem, none of the other servers would be able to operate either, because they rely on the shared storage of the failed server. So if HA and data integrity matter, CEPH is a good option.
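Just to illustrate the "3 copies" idea: a replicated pool in Ceph is configured roughly like this (the pool name and PG count are assumptions, and on PVE most of this is done through the GUI):

Code:
# create a replicated pool and require 3 copies, at least 2 of which must be online for writes
ceph osd pool create vm-storage 128
ceph osd pool set vm-storage size 3
ceph osd pool set vm-storage min_size 2
ceph osd pool application enable vm-storage rbd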
 
@Dunuin Thanks for your valuable input. I chose to move to ZFS for the single server I have; then I'll consider adding 1 or 2 other servers, either to regularly send ZFS data to them (zfs send) or to move to CEPH. OVH has a CEPH cloud array storage at 150€/2TB/mo, but for 180€/mo I can have 3 dedicated servers with 4x2TB each. With RAIDZ-1 I can get more than 2TB distributed. I don't know if OVH's CEPH allows snapshots (it's not mentioned, but maybe through the APIs).

- Allow me a few questions about ZFS. OVH only allows primary, logical and LVM partitions to be created, and / and /boot must be on RAID1. I can't boot the Proxmox ISO myself. Will I easily be able to create the ZFS pool for /var/lib/vz post-install on the 4 HDDs? Do I need to let OVH's installer create the /var/lib/vz mount (1GB for instance) and then remove that local storage through Proxmox, or should I not create the /var/lib/vz partition at all and create it post-install via Proxmox?

- Also, after some research about ZFS (still reading up on it), it seems that every change is kept. Which means that a deletion after a snapshot consumes storage (by consuming, I mean literally: instead of shrinking, the used HDD space grows with the data about the deletion), is that so? If yes, does it mean that over time the space needed for the entire infrastructure will only grow, and ultimately, even if we don't consume MORE space on the VM disks, we might need to expand our storage?

- If I rely on snapshots and external .raw disk backups, in theory no other backup strategy should be needed, right? No need to back up specific internal VM files, Windows images, mount a storage to back it up externally, etc...

- Let's say I must manage everything myself and I have 2 servers. Is it preferable to rely on CEPH on both, or on ZFS with regular send/recv of states? And in both cases, is it still necessary to externalise VM backups?

Thanks again
 
- Allow me a few questions about ZFS. OVH only allows primary, logical and LVM partitions to be created, and / and /boot must be on RAID1. I can't boot the Proxmox ISO myself. Will I easily be able to create the ZFS pool for /var/lib/vz post-install on the 4 HDDs? Do I need to let OVH's installer create the /var/lib/vz mount (1GB for instance) and then remove that local storage through Proxmox, or should I not create the /var/lib/vz partition at all and create it post-install via Proxmox?
Not sure how that works with OVH. But if you want to be more flexible, you can install a Debian, partition everything as you like and install the Proxmox packages on top of it.
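That is the documented "install Proxmox VE on Debian" route; for PVE 6 on Debian Buster it boils down to roughly the following (double-check the current wiki page for the exact repository and key before copy-pasting):

Code:
# add the Proxmox VE repository and its release key (PVE 6.x / Debian Buster)
echo "deb http://download.proxmox.com/debian/pve buster pve-no-subscription" \
  > /etc/apt/sources.list.d/pve-install-repo.list
wget http://download.proxmox.com/debian/proxmox-ve-release-6.x.gpg \
  -O /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg
apt update && apt full-upgrade
apt install proxmox-ve postfix open-iscsi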

- Also, after some research about ZFS (still reading up on it), it seems that every change is kept. Which means that a deletion after a snapshot consumes storage (by consuming, I mean literally: instead of shrinking, the used HDD space grows with the data about the deletion), is that so? If yes, does it mean that over time the space needed for the entire infrastructure will only grow, and ultimately, even if we don't consume MORE space on the VM disks, we might need to expand our storage?
ZFS is copy-on-write. If you have a 10GB file, create a snapshot and edit 1GB of that file, then the file will occupy 11GB, because ZFS won't replace the old data of that file with new data, it will append the changes. So nothing is deleted as long as the snapshot exists. After you delete the snapshot, ZFS will calculate what isn't needed anymore, free up the space and the file will be 10GB again.
That's why you can snapshot TBs of data within a second: nothing needs to be written at all. You just tell ZFS not to free the old blocks.
So yes, the longer you keep the snapshots, the bigger they grow. If you have a lot of file changes and keep the snapshots for quite a while, they might consume a multiple of the size of your VMs/data.
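You can watch that growth per snapshot: the USED column shows the space that would be freed by deleting that snapshot (dataset names here are examples):

Code:
zfs list -r -t snapshot -o name,used,referenced rpool/data/vm-100-disk-0
# or, for a whole dataset tree, how much all snapshots together consume:
zfs list -o name,used,usedbysnapshots,usedbydataset rpool/data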
- If I rely on snapshots and external .raw disk backups, in theory no other backup strategy should be needed, right? No need to back up specific internal VM files, Windows images, mount a storage to back it up externally, etc...
Don't forget to back up the "/etc/pve" directory. That's where all the Proxmox configs are saved (including the config files of every VM).
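A minimal way to do that is a nightly tar of the directory onto the external/NFS storage (the destination path is just an example):

Code:
# nightly copy of the Proxmox configuration (VM/CT configs, storage and cluster config)
tar czf /mnt/pve/backup-nfs/pve-config-$(date +%F).tar.gz -C / etc/pve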
- Let's say I must manage everything myself and I have 2 servers. Is it preferable to rely on CEPH on both, or on ZFS with regular send/recv of states? And in both cases, is it still necessary to externalise VM backups?
I'm no CEPH expert, but all your servers need to vote for a quorum and you need to make sure you don't run into a split-brain situation if a server dies. That doesn't work with 2 servers (unless you use something like a Raspberry Pi as a QDevice).
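For reference, adding such an external vote (e.g. a Pi running corosync-qnetd) is a short procedure on the cluster side (the IP is an example):

Code:
# on the external device (e.g. Raspberry Pi):
apt install corosync-qnetd
# on every Proxmox node:
apt install corosync-qdevice
# then, from one node, register the external vote:
pvecm qdevice setup 192.168.1.10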
 
If I rely on snapshots and external .raw disk backups, in theory no other backup strategy should be needed, right? No need to back up specific internal VM files, Windows images, mount a storage to back it up externally, etc...
Don't forget to back up the "/etc/pve" directory. That's where all the Proxmox configs are saved (including the config files of every VM).
So I can settle that: no need to back up anything internal to the VMs, no need to back up the VMs' important files, DBs, etc... No need to mount an NFS share except for shared data between VMs, as everything will already be in the snapshots and backups. Is that right?

I'm no CEPH expert, but all your servers need to vote for a quorum and you need to make sure you don't run into a split-brain situation if a server dies. That doesn't work with 2 servers (unless you use something like a Raspberry Pi as a QDevice).
So let's forget CEPH for now. Is it OK to have an exact replica of the first machine (the active one) on a second machine (the passive one), by regularly running zfs send/recv from the active to the passive? If so, do I still need to externalise VM backups, or is zfs send/recv reliable enough that external backups aren't needed anymore? Or is doing both even better?

I could see doing both; having a replica + backups would be better because, of course, the more the better. But maybe zfs send/recv is reliable enough to save costs, or on the contrary not reliable enough/too slow to rely on, or there is some other reason... I'm waiting for my machines to be delivered, so I'm taking advantage of this time to ask around about everything.
 
So I can settle that: no need to back up anything internal to the VMs, no need to back up the VMs' important files, DBs, etc... No need to mount an NFS share except for shared data between VMs, as everything will already be in the snapshots and backups. Is that right?


So let's forget CEPH for now. Is it OK to have an exact replica of the first machine (the active one) on a second machine (the passive one), by regularly running zfs send/recv from the active to the passive? If so, do I still need to externalise VM backups, or is zfs send/recv reliable enough that external backups aren't needed anymore? Or is doing both even better?

I could see doing both; having a replica + backups would be better because, of course, the more the better. But maybe zfs send/recv is reliable enough to save costs, or on the contrary not reliable enough/too slow to rely on, or there is some other reason... I'm waiting for my machines to be delivered, so I'm taking advantage of this time to ask around about everything.
ZFS replication should be quite reliable. And because it replicates snapshots, it syncs data at the block level from one host to another. So if one pool dies, you still have a working, identical pool on another machine.
But it is always better to store an additional backup on another filesystem and storage medium.
If there is a problem on one pool, that problem might get replicated to the second pool. Or if you update Proxmox on both hosts and an OpenZFS update introduces a critical bug, that bug might kill both pools at the same time.
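The replication described above is basically an incremental send of the latest snapshot over SSH; a sketch (host and dataset names are assumptions, and tools like pve-zsync or the built-in Proxmox replication automate exactly this):

Code:
# initial full copy of a snapshotted dataset to the passive host
zfs snapshot -r rpool/data@repl-1
zfs send -R rpool/data@repl-1 | ssh passive-host zfs recv -F backup/data
# later runs only transfer what changed between the two snapshots
zfs snapshot -r rpool/data@repl-2
zfs send -R -i @repl-1 rpool/data@repl-2 | ssh passive-host zfs recv -F backup/data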
 
If you use ZFS you need to stick to RAW because all virtual HDDs will use ZFS zvols.
Hi, so this is what I had been doing since March: installed on ext4. Then I saw I was able to access the IPMI KVM and installed a server with the Proxmox ISO on a ZFS filesystem. Well, now I can use any image format (even qcow2), and it seems it's using the image format's snapshot capability instead of ZFS's, but I may be wrong.

Is there a huge difference in performance/efficiency between adding ZFS vols on top of an XFS/ext4 Proxmox install and installing Proxmox on ZFS directly, please?

Thanks for your help so far
 
If you only have XFS/ext4 partitions, you shouldn't be able to use ZFS. That would only be possible if you run a VM and install ZFS inside the guest, or if you add other partitions/disks and use them to create a ZFS pool.
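That second route (spare whole disks turned into a pool and then registered as a PVE storage) looks roughly like this; the device names, pool name and storage ID are assumptions, and /dev/disk/by-id paths are preferable in practice:

Code:
# build a raidz pool from the three unused disks (double-check the device names first!)
zpool create -o ashift=12 tank raidz /dev/sdb /dev/sdc /dev/sdd
zfs set compression=lz4 tank
# register it in Proxmox as a storage for VM disks and containers
pvesm add zfspool tank-storage --pool tank --content images,rootdir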
 
I should have explained better. My provider's UI allows me to choose the number of disks I want to install Proxmox on, XFS or ext4, and the RAID level.

Previously, I installed Proxmox on ext4, no RAID, on 1 disk. Then, using the Proxmox UI, I created a ZFS pool with the 3 other disks. This was fine and only allows raw images on the ZFS storage. Snapshots use the ZFS capabilities.

Now, I reinstalled a server and was able to access the boot with KVM, so I chose ZFS/RAIDZ and installed on the 4 disks I have. I can use any type of image for the VMs, and it seems the snapshots now use the image format's capabilities (as I can use and snapshot qcow2 images).
 
