Dear community,
I want to use healthchecks (see link) to monitor the ZFS scrub / trim operations on my Proxmox server, and I would be very glad if you could guide me on the right way to do it.
I find healthchecks very useful, as it lets you monitor not only the success/failure of an operation, but also whether it was launched on time or took longer than expected.
I believe my questions would also apply to any kind of API hook, since it all relies on simple HTTP GET calls.
I already implemented such an integration on my workstation running Arch, using systemd, and it works like a charm; Debian, however, seems to take a different approach.
Here is how it works on Arch (perhaps the "most generic" way), see link:
- enable / start the scrub / trim template `.timer` units for each ZFS pool, for example
`systemctl enable zfs-scrub@mypool.timer && systemctl start zfs-scrub@mypool.timer`
(note the `mypool` instance argument)
- at the scheduled time, the timer launches the corresponding `.service` unit, the same way you would run
`systemctl start zfs-scrub@mypool.service`
To integrate calls to my healthchecks server, here is what I did:
- copy / create the scrub / trim `.service` units in `/etc/systemd/system`, adding the `Wants`, `OnSuccess` and `OnFailure` hooks in order to trigger my custom `zfs-healthchecks@.service`
- the trick is to use a complex instance argument, typically `zfs-healthchecks@<pool_name>:(scrub|trim):(start|success|fail).service`
- my custom `zfs-healthchecks@.service` just parses the instance argument and issues a `curl` command to the relevant URL, depending on the status
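For context, here is a minimal sketch of the parsing logic such a `zfs-healthchecks@.service` helper could use. This is my reconstruction, not the actual script: the `HC_BASE` URL and the `<pool>-<op>` check-naming scheme are assumptions; only the `/start` and `/fail` suffixes follow the healthchecks pinging convention.

```shell
#!/bin/sh
# Hypothetical helper for zfs-healthchecks@%i: parses the instance
# argument "<pool>:<op>:<event>" and builds the healthchecks ping URL.
# HC_BASE and the per-check URL scheme are assumptions; adapt to your setup.

HC_BASE="https://hc.example.com/ping"

hc_url() {
    instance="$1"          # e.g. "mypool:scrub:success"
    pool=${instance%%:*}   # "mypool"
    rest=${instance#*:}    # "scrub:success"
    op=${rest%%:*}         # "scrub"
    event=${rest#*:}       # "success"
    # healthchecks convention: /start for start pings, /fail for failures,
    # and the bare check URL on success
    case "$event" in
        start)   suffix="/start" ;;
        fail)    suffix="/fail"  ;;
        success) suffix=""       ;;
        *) echo "unknown event: $event" >&2; return 1 ;;
    esac
    printf '%s/%s-%s%s\n' "$HC_BASE" "$pool" "$op" "$suffix"
}

# The real unit would then run something like:
#   curl -fsS -m 10 --retry 3 "$(hc_url "$1")"
hc_url "mypool:scrub:success"
hc_url "mypool:trim:fail"
```

The service itself would be a `Type=oneshot` unit with `ExecStart=/path/to/this-script %i`, so systemd passes the instance string straight to the script.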
On Debian, however, scrub and trim are scheduled differently, see link:
- a cron job exists at `/etc/cron.d/zfsutils-linux`, calling `/usr/lib/zfs-linux/trim` and `/usr/lib/zfs-linux/scrub`
- both do some basic checks, then call `zpool trim` and `zpool scrub` for all pools in a loop (that is the important point)
- note the use of the Debian-specific `org.debian:periodic-trim` and `org.debian:periodic-scrub` ZFS pool properties, which allow disabling an operation on a given pool
- fun fact: Debian still ships the same `/lib/systemd/system/zfs-scrub@.service` as Arch, probably inherited from the standard ZFS on Linux packaging, but it is not used and, I think, contradicts the Debian approach...
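To make the "all pools in a loop" point concrete, here is a simplified sketch of what the Debian helper does. This is illustrative only: the real script at `/usr/lib/zfs-linux/scrub` also checks the `org.debian:periodic-scrub` property and pool health, and this sketch just prints the commands instead of running them.

```shell
# Simplified sketch of the Debian helper's core loop (illustrative only).
# On a real system the pool list would come from `zpool list -H -o name`;
# here it is passed as arguments so the sketch runs without ZFS installed.
scrub_all() {
    for pool in "$@"; do
        # the real script runs `zpool scrub "$pool"` after its checks
        echo "zpool scrub $pool"
    done
}

scrub_all tank backup
```

The consequence for monitoring is visible right away: one invocation fans out over every pool, so a single success/failure signal cannot be attributed to a specific pool.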
Here are the 2 possible approaches, both of which imply replacing the cron job with systemd:
Approach #1: the Debian way
- create some basic `.timer` and `.service` units for both trim and scrub, for example `zfs-scrub-all.timer` and `zfs-scrub-all.service`, the latter just calling `/usr/lib/zfs-linux/scrub`, with the `Wants`, `OnSuccess` and `OnFailure` hooks
- pros: the simplest way, and compatible with the Debian solution of custom ZFS pool properties
- cons:
- will simply not work with multiple pools, as you can define only one healthchecks ping URL (remember, it launches all trims/scrubs in a loop)
- because of that, you will not be able to tell which pool started or failed, which basically makes healthchecks useless
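For approach #1, the unit pair could look roughly like this. A sketch under assumptions: the monthly schedule, the file contents and the `all` placeholder instance name are mine; only the unit names, the script path and the hook idea come from the description above.

```ini
# /etc/systemd/system/zfs-scrub-all.timer (sketch; schedule is an assumption)
[Unit]
Description=Periodically scrub all ZFS pools

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target

# /etc/systemd/system/zfs-scrub-all.service (sketch)
# "all" is a placeholder instance: with this approach there is no way to
# pass a real pool name, which is exactly the limitation described above.
[Unit]
Description=Scrub all ZFS pools
Wants=zfs-healthchecks@all:scrub:start.service
OnSuccess=zfs-healthchecks@all:scrub:success.service
OnFailure=zfs-healthchecks@all:scrub:fail.service

[Service]
Type=oneshot
ExecStart=/usr/lib/zfs-linux/scrub
```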
Approach #2: the Arch way (per-pool template units)
- pros:
- very flexible: you can monitor each pool and operation separately, defining an expected duration for each
- you can even create specific timers for each pool, if you wish to run scrubs on different days for example
- cons:
- you have to enable the timers once for each pool explicitly, but that is not very constraining, unless you have many pools or the pool list changes often
- doesn't use the custom Debian ZFS pool properties, but I don't think that's a problem, as you explicitly define which operations run on which pool anyway, which amounts to the same thing
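For approach #2, instead of copying the full unit into `/etc/systemd/system`, a drop-in on the stock template could add the hooks; a sketch, assuming the `zfs-healthchecks@.service` naming described earlier:

```ini
# /etc/systemd/system/zfs-scrub@.service.d/healthchecks.conf (sketch)
# %i expands to the pool name, so zfs-scrub@tank.service would trigger
# zfs-healthchecks@tank:scrub:{start,success,fail}.service
[Unit]
Wants=zfs-healthchecks@%i:scrub:start.service
OnSuccess=zfs-healthchecks@%i:scrub:success.service
OnFailure=zfs-healthchecks@%i:scrub:fail.service
```

After a `systemctl daemon-reload`, all that remains is enabling the timer per pool, e.g. `systemctl enable --now zfs-scrub@tank.timer`; an analogous drop-in would cover `zfs-trim@.service`.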
What would be your recommendation? As far as I can see, only approach #2 is relevant; approach #1 would only be okay if you have a single pool.
Any other way to do it, maybe scripting?
Thanks for taking the time to read this long post (sorry about that!). If anyone is interested in my implementation details, do not hesitate to ask, I will be happy to share!