ZFS advice?

toxic

Hi all.
I'm looking for help as I'm new to zfs.
I have a 20TB HDD that I will reformat with ZFS, aiming to store some VM disks and backups as well as my data, which is nicely split into directories like video, photo, music, documents, docker-volumes... Right now everything is under /mnt/hdd20T/data/(video...)
I have a privileged LXC with a bind-mount of /mnt/hdd20T that shares everything over NFS and SMB, and other LXCs also bind-mount parts of it, for my Plex/Jellyfin for example...
I do this because I want to have USB backups. Of course my USB HDDs are smaller than 20TB but overall I have enough space.
I dislike RAID because I do not need HA, and all of my hardware is commodity gear with less-than-perfect reliability, even if it is good enough for my use as a homelab and NAS.
I want my backups to be self-contained, meaning if I take one drive and plug it into a linux machine or even windows with openzfsforwindows I can quickly access the single file I urgently need from my backups.
Today I do this with rsync, with several targets (USB or NFS) and daily or weekly backups.

I'm considering creating a RAID0 zpool on the 20TB drive, then one dataset per folder (video, photo, ...), and using the snapshot and clone features to replace my rsync scripts when the target of the backup is also ZFS.

Being new to ZFS, I wanted to check here whether this is a good idea and whether there's any advice on how best to do this.

I will not be unplugging most of the USB drives regularly, but some of them I would have liked to keep offline. I have a few ZigBee plugs and automations that could power the drives up or down, but if I put these USB drives on ZFS I believe I would need to trigger some ZFS commands whenever I plug a drive in: not only mount the dataset, but first import the pool. The only auto-mount solution I have found for PVE 8 is a systemd script on GitHub that claims not to work with ZFS (https://github.com/theyo-tester/automount-pve) and that messes with the dependencies of zfs-import-cache and zfs-import-scan... I think I have quite a bit of work and learning to do, so any advice is welcome at this stage.

My main concern right now is to make sure that with a single zpool I can create as many datasets as I need without any space constraints, and let them grow as needed, all sharing the 20TB.
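
For illustration, here is a minimal sketch of what I have in mind; the pool name hdd20t and the device path are placeholders:

zpool create -o ashift=12 hdd20t /dev/disk/by-id/ata-EXAMPLE-20TB   # single-disk pool on the 20TB drive
zfs create hdd20t/video            # one dataset per top-level folder
zfs create hdd20t/photo
zfs create hdd20t/music
zfs create hdd20t/documents
zfs create hdd20t/docker-volumes
zfs set quota=2T hdd20t/docker-volumes   # optional: cap a dataset so it cannot eat the whole pool

My understanding is that all datasets draw from the pool's shared free space unless I set quotas or reservations, so nothing needs to be sized up front.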
 
> I dislike RAID because I do not need HA, and all of my hardware is commodity gear with less-than-perfect reliability, even if it is good enough for my use as a homelab and NAS

Seriously consider mirroring that 20TB drive, unless you want to recreate it all and burn a weekend restoring from backups when that single point of failure eventually falls over. A mirror is convenient insurance for uptime and you will get self-healing scrubs and (likely) better I/O.
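
If you buy a second 20TB later, you don't even have to rebuild: attaching it turns the single-disk pool into a mirror (pool and device names below are placeholders):

zpool attach hdd20t /dev/disk/by-id/ata-EXISTING-20TB /dev/disk/by-id/ata-NEW-20TB   # resilvers onto the new disk; the pool becomes a two-way mirror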

> I want my backups to be self-contained, meaning if I take one drive and plug it into a linux machine or even windows with openzfsforwindows I can quickly access the single file I urgently need from my backups

Do not rely on "openzfsforwindows". ZFS was never designed to run on Windows - and it's alpha-level software at best. If you don't want to risk data loss, only mount your ZFS pools on a well-supported OS (Linux, FreeBSD, macOS; possibly Solaris, but I'm not sure about compatibility there anymore. Ironic.)

Create a bootable Proxmox or Debian USB with ZFS and some handy utilities as a recovery environment, or you could use the PVE installer ISO for this to a certain extent. You'll thank me later.
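
From such a rescue environment, a read-only import is the safest first move if you suspect trouble (pool name is a placeholder):

zpool import -o readonly=on hdd20t   # inspect and copy data off without writing anything to the pool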

> Today I do this with rsync

Look into rclone; with 20TB you want parallel transfers. You could also look into syncoid/sanoid -- ZFS send/recv is more efficient because it only sends changed blocks, but it's more complicated and not as granular.
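
To give a feel for it, a rough sketch with placeholder names (hdd20t/photo as the source dataset, usbbackup as a pool on a USB disk):

zfs snapshot hdd20t/photo@backup-1                                            # point-in-time snapshot on the source
zfs send hdd20t/photo@backup-1 | zfs recv usbbackup/photo                     # first full copy to the backup pool
zfs send -i @backup-1 hdd20t/photo@backup-2 | zfs recv -F usbbackup/photo     # later runs only send changed blocks
syncoid hdd20t/photo usbbackup/photo                                          # or let syncoid handle snapshots/incrementals for you

Single-file restores on the target still work by browsing the dataset's hidden .zfs/snapshot directory.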

> if I put these USB drives on ZFS I believe I would need to trigger some ZFS commands whenever I plug a drive in: not only mount the dataset, but first import the pool

You -can- automate this, but you're the sysadmin. As root, you can manually import zpools at the command line when needed.

zpool import -a -f -d /dev/disk/by-id # or by-path if by-id is not sufficient

Just don't forget to export the pool before you disconnect or power off the USB disk.
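
For example (pool name is a placeholder):

zpool export usbbackup   # cleanly unmounts and detaches the pool so the disk is safe to unplug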
 
I am wrapping my mind around similar issues. https://github.com/Romaq/bigrig-scripts/blob/main/ORGANIZATION.md is my plan for how to think about the problem, using a TurnKey Linux appliance to manage it. My current plan is to burn "critical backups" on my Windows client after using SMB to move them to the machine I'll burn from, and otherwise rotate backups on the SATA PVE host. I do not have the means to purchase a second SSD to mirror the PVE root drive with ZFS, although I do wish that were part of the hardware. But that's why I'm making notes on the "ISO / LXC Template Installs" for devices: it's easier to do a "backup" of a "toaster" by simply going out and downloading a new "toaster" and using a copy of my settings than by fussing over multiple backups of the same "toaster".

I'm actually here to pose a question on the best way to simulate a failure in the SATA RAIDZ array, to "prove" it works as currently set up before I start shoving a bunch of data onto it and having it not survive a loss.

The "zpool import/ export" feature is golden. Part of my plan is to "export" the pool, bring it over to a VMWare Ubuntu on my Windows host, and prove I can "zpool import" the drive to read it, should I "really have to." And, of course, plan on NEVER having to since the filesystem SMB appliance will "just work", right? ;-)

But yeah, it sounds like @toxic and I are working off the same non-enterprise home-lab use case.
 
> I'm considering creating a RAID0 zpool on the 20TB drive, then one dataset per folder (video, photo, ...), and using the snapshot and clone features to replace my rsync scripts when the target of the backup is also ZFS.
>
> Being new to ZFS, I wanted to check here whether this is a good idea and whether there's any advice on how best to do this.
A bare stripe is always a dicey proposition, be it with RAID, mdadm, or ZFS: you compound your odds of failure, since each drive has its own chance of faulting and either drive's failure takes the whole pool down (with two drives that each fail with probability p over some period, the pool is lost with probability 1-(1-p)^2, roughly 2p). Do this only if there is absolutely no consequence to losing the entire dataset, and even then, do you value your time?
> but if I put these USB drives on ZFS
No. Just no. Do not put USB drives in a ZFS pool.

Honestly, for your use case, I wouldn't change anything from the way you are doing it. If you want to add another drive, just add another drive.
 
> I'm actually here to pose a question on the best way to simulate a failure in the SATA RAIDZ array, to "prove" it works as currently set up before I start shoving a bunch of data onto it and having it not survive a loss.

The best way that I know of to test a ZFS failed-drive scenario, outside of using a VM, is to zpool offline the drive, issue an hdparm -y command to spin it down, and physically remove it. This will of course leave you at risk if a 2nd disk happens to fail, which is why we typically recommend a minimum of RAIDZ2 protection.

After jacking out the "failed" drive, write some data to the pool - say 20-200MB of random data, but you can do more.

Then shut down, put the drive back in, and reboot.

The pool should re-import with the drive still in an OFFLINE state. Issue a zpool online poolname drivename; the drive should resilver, and zpool status -v may give you a warning about a CKSUM error and advise a zpool clear.

At this point you can run a zpool scrub and verify the pool condition.
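
For reference, the whole exercise in commands, with tank and the disk path as placeholders:

zpool offline tank /dev/disk/by-id/ata-EXAMPLE   # take the member disk out of service
hdparm -y /dev/disk/by-id/ata-EXAMPLE            # spin it down before pulling it
# ...pull the drive, write some test data to the pool, shut down, reinsert, reboot...
zpool online tank /dev/disk/by-id/ata-EXAMPLE    # re-add it; the resilver starts automatically
zpool status -v tank                             # watch resilver progress and any CKSUM errors
zpool clear tank                                 # reset the error counters once it looks healthy
zpool scrub tank                                 # full verification pass over the pool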
 
Thank you. I had to step away to attend to something, then I came back to form the question on a new forum post. But this will do fine, so I'll link your post in my notes. The overall goal is to "prove" ZFS will keep working and recover if a failed drive is replaced. And yes, my intention is to do this while I don't care about anything actually on the SATA array. Once I prove zfs works as advertised, it's golden. :)
 
As a data point, if you're replacing one of the ZFS drives that's set up for booting from, then there's a little bit of extra work needed to restore the boot pieces as well:

https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_change_failed_dev

It's pretty straightforward, though manual. I'm kind of surprised it hasn't been fully automated (so far), though that might be due to corner cases I'm not aware of yet.
 
> As a data point, if you're replacing one of the ZFS drives that's set up for booting from...
Thank you. When the hardware was originally acquired, the purchasing agent did not know that two matching SSDs could be placed inside the https://store.minisforum.com/ machine I am using. When I commented on this, the expectation was: "When the SSD fails, we will purchase two SSDs and use a ZFS mirror on boot. Meanwhile, RAIDZ on SATA for near-term backup, then off-machine if not off-site for the critical data." So while I happen to have ZFS on the root for the non-RAID benefits such as datasets, I can't test it.

But I would love to learn "hardware failure conformance testing" is a part of Proxmox. I'd much rather simulate a drive failure and figure out what I do about it than *have* a drive failure and have to figure it out while the PVE machine is on fire.

And thanks again to @Kingneutron: I'm giving credit and a mention of the response at https://github.com/Romaq/bigrig-scripts/blob/main/HARDWARE-TEST.md, which will be of use to a personal friend possibly duplicating relevant parts of this, and hopefully of use to others considering the "Proxmox SOHO" use case.
 
> I'd much rather simulate a drive failure and figure out what I do about it than *have* a drive failure and have to figure it out while the PVE machine is on fire.

You can get reasonable practice at that in VMs, e.g. create a VM, assign it a few virtual volumes, install Proxmox in the VM, then creatively muck around with the disks (remove some, reattach them, write random data to some, etc).

It's a good way of getting the hang of what bits work well, what bits work less well, and so on. :)
 
Looking at the spec page of the UM790 Pro, yeah, that mentions it can have 2x NVMe drives in it.

Well, the good news there is that if you installed Proxmox on the computer as a single ZFS (!) volume, you can convert it to a mirror (from the CLI) without much hassle.

You'll need to mirror the partition tables from the first drive (the first two sgdisk commands in the wiki page linked above), then use ZFS commands to add the remainder of the 2nd drive (probably partition 3) as a mirror of the first drive's partition 3.

It's all doable online, without any downtime either. In theory. :)
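
Roughly like this (device names are placeholders, and the partition numbers assume the default PVE layout with the ESP on partition 2 and ZFS on partition 3; double-check with lsblk before running anything):

sgdisk /dev/disk/by-id/nvme-EXISTING -R /dev/disk/by-id/nvme-NEW    # replicate the partition table onto the new drive
sgdisk -G /dev/disk/by-id/nvme-NEW                                  # randomize the copied GUIDs
zpool attach rpool /dev/disk/by-id/nvme-EXISTING-part3 /dev/disk/by-id/nvme-NEW-part3   # mirror the root pool; resilvers online
proxmox-boot-tool format /dev/disk/by-id/nvme-NEW-part2             # prepare the new drive's ESP
proxmox-boot-tool init /dev/disk/by-id/nvme-NEW-part2               # install the bootloader so either drive can boot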
 
