I've been doing some testing on disaster scenarios with ZFS/Proxmox.
I've noticed that across multiple hosts, if a VM has its qcow2 storage on a ZFS pool containing a bad disk (I have a disk for testing that passes all tests but generates occasional read errors), the whole system will hang with no recovery in many cases, and I have to reboot.
For example, in that situation I can enter my credentials at the console, but a prompt never appears after the MOTD.
Please understand: this is a testing environment. It's a long-established working server with recently tested, known-good base hardware into which I am deliberately injecting a known problem.
This PVE node in normal operation (a baseline established over long stretches, between 1 and 24 hours, of solid availability between tests):
Average load: 2.09 on a Xeon D-1518 2.2 GHz processor
Average RAM usage: 11.39 GB of 128 GB.
Two VMs (1 Windows, 1 Ubuntu):
- Each has a single 60 GB qcow2.
- One on the internal SSD zpool, one on the internal HDD zpool.
When I add an external SATA enclosure (the motherboard supports multiport SATA) with 4 drives (one bad), create a zpool on it, and let it sit, it runs indefinitely, and RAM usage does grow roughly in line with the pool size (I've tried different sizes).
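For reference, the test pool is built roughly like this (pool name and device paths are placeholders, and the exact vdev layout varies between tests):

zpool create -f testpool /dev/sdb /dev/sdc /dev/sdd /dev/sde
zpool status testpool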
If I back up to it, or attach it to either VM and copy files to it:
The system load jumps to 70 after hitting a delayed read error, and that's the last entry in the log. The whole system hangs. If I try to log in at the console, I don't even get a prompt. All VMs freeze and all I/O seems to halt until a reboot.
I can manage this, but I've not seen this issue before. It's most likely related to how this particular drive is failing, but still: is this expected behavior?
Oh, also: I tested this morning on another PVE node and the problem followed the drive.
Is there any recommended configuration to stave off this situation? I understand why it may be happening, but I'm not sure whether it can be mitigated.
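One knob I've been looking at, though I'm not sure it's the right lever, is the pool's failmode property. The default, wait, is documented to block I/O on a catastrophic pool failure until the device recovers and the errors are cleared, which sounds a lot like the behavior above; continue returns errors to new I/O instead (pool name is a placeholder):

zpool get failmode testpool
zpool set failmode=continue testpool

I haven't confirmed this actually prevents the host-wide hang, so treat it as a guess rather than a recommendation.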