I am setting up a VM to do plotting for Chia. It has the following disks:
-3 pass-through 1TB SSDs
-5 SATA 400GB SSDs set up as RAID0 in Proxmox, used as a directory storage and assigned to the VM as a virtual hard disk (the volume is 1800 GiB, per df -BGiB in the Proxmox shell); a rough sketch of this setup follows the list
-1 PCI-E RAID card (LSI 9266-8i)
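For context, this is roughly the command-line equivalent of how the SATA array is set up, a sketch rather than my exact steps: the pool name CSCP2 is the one I use (details in the update below), while the device paths, the storage ID CSCP2-dir, the bus slot and the VM ID 100 are placeholders.

# stripe (RAID0) the five SATA SSDs into one ZFS pool
zpool create CSCP2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
# register the pool's mountpoint as a directory storage in Proxmox
pvesm add dir CSCP2-dir --path /CSCP2 --content images
# allocate a 1800 GiB virtual disk on that storage and attach it to the VM
qm set 100 --scsi1 CSCP2-dir:1800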
After some time, sometimes a day and sometimes as short as a couple of hours, the VM becomes unresponsive to SSH. Logging in to the GUI, the console simply says "Guest disabled display", and there is a yellow "!" marked "io-error". The OS is completely frozen and all I can do is reboot it; it works after the reboot but can freeze again later. I checked, and when it locks up the SATA array is at most around 80% used and not yet full.
Where should I start looking into this issue? Since I have a few IO devices, how can I tell which of the three above is the cause?
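For what it is worth, this is roughly what I look at on the host when it hangs; the VM ID 100 is a placeholder for my actual VM ID.

# detailed VM state as QEMU reports it, including the io-error pause
qm status 100 --verbose
# host-side errors since boot (QEMU, storage, kernel)
journalctl -b -p err
# kernel messages naming the physical device that is complaining
dmesg -T | grep -iE 'ata[0-9]|sd[a-z]|megaraid|i/o error'
# fill level of the directory storage backing the virtual disk
df -BGiB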
Update 1: Adding some context after some tests. It turns out the issue is caused by the SATA drives; I isolated the devices one by one.
They are 5*400 GB SATA SSDs (Intel DC S3700 400 GB) attached directly to the mainboard (X399 Taichi). I set them up as ZFS RAID 0 and created a directory under /CSCP2. Checking with "df -BGiB", the volume looks like this (a sample reading, not taken when the error occurred):
CSCP2 1802GiB 343GiB 1459GiB 20% /CSCP2
So I assigned it to the VM as a hard disk of 1800 GiB. As the drive fills up, and before it is completely full, the io-error appears at some point, seemingly at up to about 80% usage of the drive as reported by df on the host.
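For reference, when checking usage from the host I look at the pool-level, dataset-level and image-level views to see where the space actually goes; a minimal sketch, where the image path is only a guess at Proxmox's default directory-storage layout for an assumed VM ID 100:

# pool-level view: size, allocated and free space, capacity percentage
zpool list CSCP2
# dataset-level view, which is what df -BGiB on /CSCP2 reflects
zfs list CSCP2
# image-level view: virtual size of the 1800 GiB disk vs. bytes actually used on disk
# (path is a guess at the default layout; adjust to the real image name)
qemu-img info /CSCP2/images/100/vm-100-disk-0.raw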