What do you think about ZFS spare disks?

tcabernoch

I have to get on an airplane in order to visit my servers. Or use remote hands. You know how that can go.
So, I'd rather not _need_ somebody visiting my servers. Or at least as little as possible.
And some of it's old junk. So it dies.

Enter the spare disk conundrum.
It's easy to find ZFS 'experts' who will blithely argue exact opposite perspectives on any given aspect.
Many of them declare that a spare disk is only suitable for large arrays.
I've seen advice that you are better off getting more parity than having a spare.
I've even seen warnings about resilvering, as if that's a bad thing somehow.

This is what I'm talking about.
Code:
zpool add DATA spare /dev/gptid/whateveritisafteryougptit

So you can walk away and leave the server running for a year or two.
And it cleans up after itself.
Why is that bad?
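For context, here's roughly how I sanity-check it afterwards; the pool name is from my example above, and the automatic kick-in assumes the ZFS event daemon is actually running (the service name below is the Debian/Proxmox one).
Code:
# the spare should show up under a 'spares' section as AVAIL
zpool status DATA
# the automatic swap-in on failure is handled by zed, so make sure it's running
systemctl status zfs-zed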
 
Why is that bad?
You didn't tell us your vdev layout, so I'll invent one myself. For demonstration only, let's say:
  • you have a single mirror
  • you add a spare
  • one drive fails
  • the spare steps in to replace the dead drive
  • resilvering starts - and it has to successfully read all of the data on the single remaining drive
  • if the resilver succeeds you are back to normal; if not, you've lost some data
On the other hand you could
  • start with a triple mirror from the beginning
  • one drive fails
  • you still have redundant intact data without the urgent need to resilver
  • and when you replace the failed third drive, the resilver process can read two disks to reconstruct the data (commands sketched below)
In the above example a "spare" is just stupid.
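To make the difference concrete, here is a minimal sketch of both layouts; device names da0-da2 are made up and the pool name DATA is taken from your post:
Code:
# Option A: two-way mirror plus a hot spare that sits idle
zpool create DATA mirror da0 da1
zpool add DATA spare da2
# Option B: triple mirror - the same three disks, all of them holding data
zpool create DATA mirror da0 da1 da2
# or, if the two-way mirror already exists, grow it in place
zpool attach DATA da0 da2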

Another example:
  • you have a pool with two RaidZ1 vdevs with 4 drives each --> 8 drives active, single redundancy per vdev
  • if you can only attach one single additional drive - and only in that case - you could add a shared spare
  • one drive fails
  • resilvering that vdev starts - and all three surviving drives of that vdev have to be read successfully to avoid data loss
In this case I would really fight hard for a 10th drive to create two RaidZ2 instead of this fragile approach - both variants are sketched below. (Or opt for a different topology.)
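Again just a sketch with invented device names, to show the two variants side by side:
Code:
# fragile: two 4-disk RaidZ1 vdevs plus one shared hot spare (9 drives)
zpool create DATA raidz1 da0 da1 da2 da3 raidz1 da4 da5 da6 da7
zpool add DATA spare da8
# sturdier: two 5-disk RaidZ2 vdevs (10 drives, double redundancy per vdev)
zpool create DATA raidz2 da0 da1 da2 da3 da4 raidz2 da5 da6 da7 da8 da9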

Both are just examples based on my personal understanding.
 
For robustness, use a three-way mirror with at least one HDD and one NVMe or SSD on another controller; a rough sketch follows below. Add as many (hot) spares as you want (for each type). Remember to assign fixed names to the network device(s), because they might otherwise change when an NVMe (or other PCI(e) device) fails and disappears from the bus.
Or don't do any redundancy per Proxmox host and just create a cluster of many hosts (4 or more). Then other hardware pieces can also fail without interrupting your remote setup (or putting it in danger as soon as any one thing fails). I feel that your focus on ZFS spares is much too narrow for your redundancy requirements.
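A rough sketch of that three-way mirror, with made-up device names and a generic pool name - adjust for your own controllers, and keep in mind that writes to a mixed HDD/SSD/NVMe mirror are limited by the slowest member:
Code:
# three-way mirror across different device types/controllers
zpool create tank mirror /dev/sda /dev/sdb /dev/nvme0n1
# optional hot spare (one per device type if you like)
zpool add tank spare /dev/sdc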
 
A spare disk starts to make some sense if you have at least 6 disks in a raidz2. Just be aware that the hot spare does not automatically become a permanent replacement (at least, not without intervention). The zed daemon rotates the hot spare in and out, and IIRC the pool will still show as DEGRADED while the hot spare is in use.

Once you replace the bad drive, the hotspare goes back into "standby" mode for the next failure.
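For reference, the two ways this usually gets resolved; disk names are placeholders:
Code:
# a) swap in a new disk for the failed one; once it has resilvered,
#    the hot spare detaches and returns to AVAIL on its own
zpool replace DATA old-disk new-disk
# b) or promote the spare to a permanent member by detaching the dead disk,
#    then add a fresh spare later
zpool detach DATA old-disk
zpool add DATA spare another-disk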

https://www.reddit.com/r/zfs/comments/rkym6s/hot_spare_to_become_a_permanent_replacement/

https://www.reddit.com/r/zfs/comments/p114ty/the_fine_manual_is_confusing_zpool_replace_and/
 
These are really fantastic responses, thanks all.

Some sample layouts:
- 4 disk machine - Yeah, there's no room for a spare in this array.
- 6 disks - I agree, this is the very start of the range where a spare makes any sense at all. And maybe not a lot of sense.
- 12 disk array of 4 x 3 disk raidz1 vdevs - Buncha rando disks, each vdev is different capacity. Ya, I put a spare on that. (Of the largest capacity, of course.)(I inherited this junk stack. Scold me for it, but don't blame me. :] )

I still don't see a problem with resilvering. Ya, things suck while that's happening. Known issue.

Where I'm actively considering how to go forward is the oddball machine where I've got extra bays or an unused disk. (Perhaps a bad disk was replaced after the machine was first rebuilt. Sometimes that can take a while with remote hands.)
In these cases, I face a choice of expanding capacity or ... in my eyes ... giving myself the ability to ignore this particular machine for longer.
There's also the additional point that if one disk went out, another may be pending, and a spare might get used quickly.
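To spell out that choice with placeholder names - roughly what I'm weighing on those oddball machines:
Code:
# a) park the unused disk as a hot spare so the box can be ignored longer
zpool add DATA spare da9
# b) or attach it to an existing mirror vdev for an extra copy of the data
#    (mirror vdevs only; a raidz vdev can't absorb a single extra disk
#     unless the installed OpenZFS version supports raidz expansion)
zpool attach DATA da0 da9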

Again, that was solid feedback. Much appreciated.
I guess my takeaway is that I need to carefully consider vdev structure/RAID level as I make any decisions about spares.
 
