[SOLVED] ZFS: Unable to remove failed drive.

alex591

Member
Apr 20, 2021
Hi,

I've recently encountered a failing drive in a ZFS pool. The pool is striped, with no redundancy. Currently, the only affected files appear to be ones that can easily be replaced, and I'm hoping to keep it that way to avoid having to take the whole server offline to restore from backups. I tried to remove the failing drive from the pool using zpool remove (there's plenty of space on the other drives for its contents), but after a few hours, I noticed this in zpool status:

Code:
remove: Removal of /dev/disk/by-partlabel/AAAE-03 canceled on Mon Dec 11 20:05:50 2023
        364K memory used for removed device mappings
...
        NAME          STATE     READ WRITE CKSUM
          AAAE-03     DEGRADED   724     3   641  too many errors
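
For reference, the removal was started with a plain zpool remove along these lines (the pool name here is just a placeholder for mine):

Code:
zpool remove tank AAAE-03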

I didn't cancel it myself, but I have noticed that the number of read errors has increased since starting the removal. Are the read errors likely what caused the removal to be cancelled?

How should I proceed to remove this drive with the least possible disruption? I should mention that I don't plan to replace it any time soon, as there's still plenty of space on the remaining drives for the foreseeable future.

Thanks in advance!

Alex
 
It's probably because the drive cannot be safely removed from the stripe because of all the errors. I think you need to (find a way to) delete it from the stripe in a forced way. I don't know if this is possible. Maybe you could physically remove the drive first, so ZFS knows that reading data from it is impossible?
 
Thanks for the suggestion!

The man page for zpool-remove does indeed say that I/O errors will cause the removal to stop, and there doesn't seem to be a way to force it.

Given that the pool is not redundant, I have no idea what effect physically removing the drive would have, but I'm sure it wouldn't be good! At the very least I would lose all the contents of that drive.

In the end, I got a list of affected files (zpool status -v) and manually deleted them. Ran the remove again, got some more I/O errors. Deleted those files, ran the remove again, and repeated the process until eventually it succeeded.
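
Each pass looked roughly like this (the pool name is a placeholder for mine, and the rm path stands in for whichever files zpool status listed):

Code:
zpool status -v tank          # lists the files with permanent errors
rm /path/to/affected/file     # delete each affected file it reports
zpool remove tank AAAE-03     # restart the evacuation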

So, problem solved for now, but it was a very long and tedious process. I can't help but feel there is (or should be) a better way.
 
So, problem solved for now, but it was a very long and tedious process. I can't help but feel there is (or should be) a better way.

The better way would be to have redundant storage.

If you don't want redundant storage, then you need to do what you did and risk losing data. There is probably a way to script it, but I am not sure anybody has spent much effort on automating this, as there is a very high potential for a script to run wild and helpfully delete all your data for you.
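
For what it's worth, such a script would roughly have to do something like the loop below. This is an untested sketch only; the pool and vdev names are placeholders, and it really will delete every file that zpool status -v reports as permanently damaged.

Code:
#!/bin/sh
# UNTESTED sketch: evacuate a failing vdev from a non-redundant pool,
# deleting any files whose I/O errors keep cancelling the removal.
POOL=tank          # placeholder pool name
VDEV=AAAE-03       # placeholder vdev name

while true; do
    zpool remove "$POOL" "$VDEV"     # (re)start the evacuation

    # wait until the evacuation is no longer in progress
    while zpool status "$POOL" | grep -q 'in progress'; do
        sleep 60
    done

    # stop if the removal completed rather than being cancelled
    zpool status "$POOL" | grep -q 'canceled' || break

    # delete the files listed under "Permanent errors" and try again
    # (paths containing spaces would need more careful handling)
    zpool status -v "$POOL" \
        | awk '/Permanent errors/ {f=1; next} f && $1 ~ /^\// {print $1}' \
        | xargs -r rm -f
done

Even then, I'd do a dry run with the rm replaced by echo first, and keep it well away from anything that isn't backed up.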
 
All good points. It just seems odd to me that there's no 'proper' way to remove a known failing drive under these circumstances - a 'force' option that would just zero-fill any unreadable blocks when copying, for example.

Unfortunately as much as I want redundant storage, it's just not financially viable at the moment.

I'm still counting this as a win for ZFS - at least it alerted me to the scope of the corruption so that I could choose how best to handle it. No irreplaceable data lost, no downtime, and a learning experience. I'll mark this thread as solved.
 
I see your point. Absolutely.

Among typical users of ZFS, you won't find a lot of people who can't afford $100-$300 for an extra drive. And that's likely the reason why nobody is willing to write this type of tool. The engineering effort for file system code is non-trivial, and only makes sense if there are enough users and if the code gets exercised regularly.

But if you rephrase the question, the resulting feature would in fact have lots of potential users. Unfortunately, I don't believe it is currently in scope for ZFS.

Let's daydream a little. I assume you can't afford another drive with the same performance as your existing drives, and that's why you can't afford to set up redundancy. But hypothetically, you could probably afford to add an old and slow laptop drive, or even just a slow USB memory stick. Those can probably be had for $20. But if you added one of these devices to your storage pool, you would completely tank the performance of your disk array. And you'd probably also wear out the memory stick in no time, as ZFS is infamous for its write amplification.

But if ZFS instead had good support for tiered storage, this would all be different. Imagine a scenario where ZFS first writes mirrored data to all your existing fast disks, and only once data has proven to be long-lived and static does it migrate that data to a more efficient format that uses the USB stick for parity. Think of it as an automated way of moving older data into a more cold-storage-friendly format. There would certainly be lots of users who'd appreciate this feature and would use it with slow drives (e.g. HAMR or shingled disks) in combination with a smaller pool of fast SSDs.

As far as I can tell, tiered storage isn't on the roadmap for ZFS, though. I am cautiously optimistic that bcachefs will get there before ZFS does. But it's way too early to use it in production environments.