Stop the hang of a server when ZFS has a fit

sabrtooth

Feb 6, 2024
I've been doing some testing on disaster scenarios with ZFS/Proxmox.

I've noticed across multiple hosts that if a VM has a qcow stored on a ZFS pool with a bad disk (I keep a test disk that passes all tests but generates occasional read errors), the whole system will in many cases hang without recovering, and I have to reboot.

For example, in this situation I can log in at the console, but a prompt never appears after the MOTD.

Please understand: this is a testing environment. This is a long-established working server with recently tested, known-good base hardware, into which I am injecting a known problem.

This PVE host in normal operation (baseline established over periods of 1 to 24 hours of solid availability between tests):

Average load: 2.09 on a Xeon D-1518 2.2GHz processor

Average RAM: 11.39GB of 128GB.

Two VMs (1 Windows, 1 Ubuntu)
- Each has a single 60GB qcow.
- One is on the internal SSD zpool, the other on the internal HDD zpool.

When I add an external SATA enclosure (the motherboard supports multiport) with 4 drives (one bad), make a zpool on it, and let it sit, it runs indefinitely, and RAM usage increases in line with the pool size (I've tried different sizes).
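For anyone reproducing this, a minimal Python sketch for confirming that the idle pool is not quietly accumulating errors might look like the following. It reads the READ/WRITE/CKSUM columns of the `config:` table printed by `zpool status`. The pool name `testpool`, the device names, and the `SAMPLE` text are hypothetical, hand-written illustrations, not output from the system described here.

```python
import subprocess


def read_zpool_status(pool):
    """Run `zpool status <pool>` and return its text output.

    Needs the zpool CLI and sufficient privileges; not called in this sketch.
    """
    result = subprocess.run(
        ["zpool", "status", pool], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_device_errors(status_text):
    """Return {name: (read, write, cksum)} for rows of the config table.

    Counters stay as strings because zpool may abbreviate large values.
    """
    errors = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if in_table:
            if not fields:
                # A blank line ends the config table.
                in_table = False
            elif len(fields) >= 5:
                errors[fields[0]] = tuple(fields[2:5])
    return errors


def devices_with_errors(errors):
    """Return sorted names of rows with any non-zero counter."""
    return sorted(
        name for name, counts in errors.items() if any(c != "0" for c in counts)
    )


# Hypothetical sample text, written by hand for illustration only.
SAMPLE = """\
  pool: testpool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE      14     0     0

errors: No known data errors
"""

print(devices_with_errors(parse_device_errors(SAMPLE)))
```

On a real host, `parse_device_errors(read_zpool_status("testpool"))` would replace the sample text, with the pool name adjusted to match.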

If I back up to it, or attach it to either VM and copy files to it:

The system load jumps to 70 after hitting a delayed read error, and that's the last entry in the log. The whole system hangs. If I try to log in at the console, I don't even get a prompt. All VMs freeze and all I/O seems to halt until a reboot.
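Linux load average counts tasks in uninterruptible sleep as well as runnable ones, so a jump like this is consistent with many tasks blocked on I/O. A minimal Python sketch for flagging such a jump from `/proc/loadavg`, so the moment it starts can be timestamped, might look like this. The threshold `factor`, the 8-thread count, and the function names are assumptions made for illustration, not recommended values.

```python
import time


def read_load_1min(loadavg_text):
    """Return the 1-minute load average from /proc/loadavg-style text."""
    return float(loadavg_text.split()[0])


def load_is_abnormal(load_1min, cpu_threads, factor=4.0):
    """Flag a load above `factor` times the thread count.

    `factor` is an arbitrary illustrative threshold, not a tuned value.
    """
    return load_1min > cpu_threads * factor


def watch(cpu_threads, interval_s=5.0):
    """Print a timestamped line whenever the load looks abnormal.

    Loops forever; not called in this sketch.
    """
    while True:
        with open("/proc/loadavg") as f:
            load = read_load_1min(f.read())
        if load_is_abnormal(load, cpu_threads):
            print(f"{time.strftime('%H:%M:%S')} load {load:.2f}", flush=True)
        time.sleep(interval_s)


# Loads reported in this thread, checked against an assumed 8 threads.
print(load_is_abnormal(2.09, cpu_threads=8))
print(load_is_abnormal(70.0, cpu_threads=8))
```

Output from a watcher like this is most useful when it goes somewhere that does not depend on the affected pool, since anything written to the hung storage may itself block.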

I can manage this, but I've not seen this issue before. It is likely related to how the drive is failing, but still, is this expected behavior?

Oh, also: I tested this morning on another PVE node and the problem followed the drive.


Is there any recommended configuration to stave off this situation? I understand why it may be happening, but I'm not sure whether it can be mitigated.
 
