monitoring ZFS scrub / trim with healthchecks, using systemd

bpak

Member
Dec 6, 2019
Dear community,

I want to use healthchecks (see link) to monitor the ZFS scrub / trim operations on my Proxmox server, and I would be very glad if you could guide me to the right way to do it.
I find healthchecks very useful, as it allows monitoring not only the success or failure of an operation, but also whether it was launched on time or takes longer than expected.
I believe my questions would also be applicable to any kind of API hook, as it all relies on simple HTTP GET calls.

I already implemented such an integration on my workstation running Arch, using systemd, and it works like a charm; however, Debian seems to take a different approach.

Here is how it works on Arch (maybe the "most generic" way), see link:
  • enable / start the scrub / trim template .timer units for each ZFS pool, for example systemctl enable zfs-scrub@mypool.timer && systemctl start zfs-scrub@mypool.timer (note the mypool instance argument)
  • at the scheduled time, the timer launches the corresponding .service unit, the same way you would run systemctl start zfs-scrub@mypool.service
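In case a distribution does not ship a plain zfs-scrub@.timer (upstream OpenZFS ships zfs-scrub-weekly@.timer / zfs-scrub-monthly@.timer on some distributions), here is a minimal sketch of what such a template timer could look like; the name and schedule are illustrative:

```ini
# /etc/systemd/system/zfs-scrub@.timer (illustrative sketch)
[Unit]
Description=Monthly zpool scrub of %i

[Timer]
OnCalendar=monthly
AccuracySec=1h
Persistent=true

[Install]
WantedBy=timers.target
```

Enabling it with an instance argument (systemctl enable --now zfs-scrub@mypool.timer) then schedules the matching zfs-scrub@mypool.service.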

To integrate calls to my healthchecks server, here is what I did:
  • copy / create the scrub / trim .service units in /etc/systemd/system to add the Wants, OnSuccess and OnFailure hooks, in order to trigger my custom zfs-healthchecks@.service
    • the trick is to use a complex instance argument, typically zfs-healthchecks@<pool_name>:(scrub|trim):(start|success|fail).service
  • my custom zfs-healthchecks@.service just parses the instance argument and issues a curl request to the relevant URL, depending on the status
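For illustration, here is a minimal sketch of the script such a zfs-healthchecks@.service could run (e.g. ExecStart=/usr/local/bin/zfs-healthchecks %i); the base URL and the one-check-per-pool/operation naming are assumptions, adapt them to your healthchecks setup:

```shell
#!/bin/sh
# zfs-healthchecks: invoked with the systemd instance string as $1,
# e.g. "mypool:scrub:success" (see the naming scheme above).
# HC_BASE and the "<pool>-<op>" check slugs are hypothetical; adapt them.
HC_BASE="https://hc.example.com/ping"

# Map an instance string "<pool>:<op>:<status>" to a healthchecks ping URL.
ping_url() {
    instance="$1"
    pool=${instance%%:*}; rest=${instance#*:}
    op=${rest%%:*}; status=${rest#*:}
    case "$status" in
        start)   suffix="/start" ;;
        success) suffix="" ;;
        fail)    suffix="/fail" ;;
        *)       echo "unknown status: $status" >&2; return 1 ;;
    esac
    printf '%s/%s-%s%s\n' "$HC_BASE" "$pool" "$op" "$suffix"
}

# When called by systemd, actually ping (short timeout, a few retries).
if [ -n "${1:-}" ]; then
    url=$(ping_url "$1") || exit 1
    curl -fsS -m 10 --retry 3 -o /dev/null "$url"
fi
```

healthchecks supports the /start and /fail suffixes on a check's ping URL, which maps nicely onto the Wants= / OnSuccess= / OnFailure= hooks.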

On Debian, however, scrub and trim are scheduled differently, see link:
  • a cron job exists at /etc/cron.d/zfsutils-linux, calling /usr/lib/zfs-linux/trim and /usr/lib/zfs-linux/scrub
    • both do some basic checks, and call zfs trim and zfs scrub for all pools in a loop (that is the important point)
    • note the use of the specific org.debian:periodic-trim and org.debian:periodic-scrub ZFS pool properties, to allow disabling an operation on a given pool
    • fun fact: Debian still ships the same /lib/systemd/system/zfs-scrub@.service as Arch, probably inherited from the upstream ZFS on Linux packaging, but it is not used here and, I think, contradicts the Debian approach...
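If I read the Debian scripts correctly, opting a pool out of the cron-driven runs is just a matter of setting the pool property; a CLI fragment for illustration (requires a live pool, so take it as a sketch):

```shell
# Disable Debian's periodic trim for one pool (property checked by
# /usr/lib/zfs-linux/trim); setting it to "enable" turns it back on.
zpool set org.debian:periodic-trim=disable mypool
zpool get org.debian:periodic-trim mypool
```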

Here are the 2 possible approaches I see, both of which imply replacing the cron job with systemd:
Approach #1: the Debian way
  • create some basic .timer and .service units for both trim and scrub, for example zfs-scrub-all.timer and zfs-scrub-all.service, the latter just calling /usr/lib/zfs-linux/scrub with the Wants, OnSuccess and OnFailure hooks
  • pros: the simplest option, and compatible with the Debian mechanism of custom ZFS pool properties
  • cons:
    • will simply not work with multiple pools, as you can define only one healthchecks ping URL (remember, it launches all trims / scrubs in a single loop)
      • because of that, you cannot tell which pool has started or failed, which basically makes healthchecks useless
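A sketch of what Approach #1 could look like (the zfs-scrub-all names and the zfs-healthchecks@ hooks are my own naming, nothing shipped by Debian); note that OnSuccess= requires systemd >= 249:

```ini
# /etc/systemd/system/zfs-scrub-all.timer (illustrative)
[Unit]
Description=Periodic scrub of all ZFS pools

[Timer]
OnCalendar=Sun *-*-8..14 00:24
Persistent=true

[Install]
WantedBy=timers.target

# /etc/systemd/system/zfs-scrub-all.service (illustrative)
[Unit]
Description=Scrub all ZFS pools
Wants=zfs-healthchecks@all:scrub:start.service
OnSuccess=zfs-healthchecks@all:scrub:success.service
OnFailure=zfs-healthchecks@all:scrub:fail.service

[Service]
Type=oneshot
ExecStart=/usr/lib/zfs-linux/scrub
```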
Approach #2: the one I did on Arch
  • pros:
    • very flexible, you can monitor each pool and operation separately, defining expected duration for each
    • you can even create specific timers for each pool, if you wish to run scrubs on different days for example
  • cons:
    • you have to explicitly enable the timers once for each pool; not very constraining, unless you have many pools or the pool list changes often
    • doesn't use the custom Debian ZFS pool properties, but I don't think that's a problem, as you explicitly define which operations run on which pool anyway, which amounts to the same thing
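For Approach #2, the hooks can also live in a drop-in instead of a full copy of the unit, so the packaged service stays untouched; a sketch (again assuming my custom zfs-healthchecks@.service naming, and systemd >= 249 for OnSuccess=):

```ini
# /etc/systemd/system/zfs-scrub@.service.d/healthchecks.conf (illustrative)
[Unit]
Wants=zfs-healthchecks@%i:scrub:start.service
OnSuccess=zfs-healthchecks@%i:scrub:success.service
OnFailure=zfs-healthchecks@%i:scrub:fail.service
```

Here %i expands to the pool name passed as the instance argument, so the same drop-in covers every pool.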

What would be your recommendation? As far as I can see, only approach #2 is relevant; approach #1 would only be okay if you have a single pool.
Any other way to do it, maybe scripting?

Thanks for taking the time to read this long post (sorry for that!). If anyone is interested in my implementation details, do not hesitate to ask; I will be happy to share!
 
First, what's wrong with the current system that is already implemented? Are you missing something? It's totally monitorable already and does its job admirably.

Some experience from years of managing/monitoring ZFS pools:
  • Trim is overrated and does not matter in an enterprise environment with enterprise hardware. We have SSDs that have been running for over 10 years without a single trim
  • Scrubbing can take days to weeks on really big pools; why would you want to be informed that it takes so long? You cannot do anything about it. It's nice to have the statistics, though.
  • You should monitor and counteract excessive snapshot counts. With 6- and 7-figure snapshot counts, things can get really slow
  • Have mirrored special devices on spinning-rust pools
  • Have a mirrored Optane SLOG/ZIL
  • Always use NL-SAS instead of SATA on enterprise hardware, especially for the guaranteed response times, and monitor them
 
