VM shows io-error and locks up

Sandbo

Member
Jul 4, 2019
I am setting up a VM to do plotting for Chia; it has the following disks:
- 3 passed-through 1 TB SSDs
- 5 SATA 400 GB SSDs set up as RAID 0 in Proxmox, used as directory storage and assigned to the VM as a virtual hard disk (the volume set is 1800 GiB, after checking 'df -BGiB' in the Proxmox shell)
- 1 PCI-E RAID card (LSI 9266-8i)

After some time, sometimes a day and sometimes as short as a couple of hours, the VM stops responding to SSH. Logging in to the GUI, the console simply says "Guest disabled display", and there is a yellow "!" marked "io-error". The OS is completely frozen and I can only reboot it; it works after the reboot but can freeze again later. I checked, and when it locks up the SATA array is at most 80% full, not yet out of space.

Where should I start looking into this issue? Since I have several IO devices, can I tell which of the three above is the cause?

Update 1: Adding some context after some tests. It turns out the SATA drives were the cause, which I found by isolating the devices one by one.
They are 5x 400 GB SATA SSDs (Intel DC S3700) attached directly to the mainboard (X399 Taichi). I set them up as ZFS RAID 0 and created a directory under /CSCP2. Checking with 'df -BGiB', the volume shows (for example, not at the time of the error):

CSCP2 1802GiB 343GiB 1459GiB 20% /CSCP2

So I assigned a 1800 GiB hard disk to the VM. As the drive filled, and before it was completely full, the VM threw the io-error at some point, at roughly 80% usage as reported by df on the host.
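A small host-side check can catch this condition before the guest locks up. The sketch below polls a mountpoint's usage with df and warns when it crosses a threshold; /CSCP2 is the directory storage from this post, and the 90% threshold is an arbitrary assumption, not a Proxmox default.

```shell
#!/bin/sh
# Print a mountpoint's use% as a bare number, e.g. "20".
check_usage() {
  # GNU df: --output=pcent prints only the Use% column;
  # tail drops the header, tr strips spaces and the '%' sign.
  df --output=pcent "$1" | tail -n 1 | tr -d ' %'
}

MOUNT=/CSCP2     # mountpoint from the post; adjust to your setup
THRESHOLD=90     # assumed warning level, not an official figure

usage=$(check_usage "$MOUNT" 2>/dev/null)
usage=${usage:-0}   # fall back to 0 if the mountpoint does not exist

if [ "$usage" -ge "$THRESHOLD" ]; then
  echo "WARNING: $MOUNT at ${usage}% - VM may hit io-error soon"
fi
```

Run from cron on the host, this gives an early warning that the backing storage is approaching full even while the guest still sees free space.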
 

Stefan_R

Proxmox Staff Member
Jun 4, 2019
Are there any errors on the host? (i.e. run 'journalctl -e' from the time when the IO error starts happening)
Also, since you're using ZFS, does ZFS itself report any errors on your zpool? (i.e. 'zpool status', 'zfs list')
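The checks suggested above can be gathered into one report for posting back to the thread. This is only a convenience sketch; the report path is an arbitrary choice, and the extra 'zfs list' columns are optional.

```shell
#!/bin/sh
# Collect the host-side diagnostics into a single file.
REPORT=/tmp/io-error-report.txt

{
  echo "== journal (current boot, priority err and above) =="
  # Narrow the journal to errors so the IO failure stands out.
  journalctl -b -p err --no-pager 2>&1 | tail -n 50

  echo "== zpool status =="
  zpool status 2>&1

  echo "== zfs list =="
  zfs list -o name,used,avail,refer,mountpoint 2>&1
} > "$REPORT"

echo "wrote $REPORT"
```

The '2>&1' redirects keep the script going (and record the error) even if a command is unavailable, so the report is produced on any host.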
 

Sandbo

Member
Jul 4, 2019
Stefan_R said: "Are there any errors on the host? (i.e. run 'journalctl -e' from the time when the IO error starts happening) Also, since you're using ZFS, does ZFS itself report any errors on your zpool? (i.e. 'zpool status', 'zfs list')"

Thanks for the follow-up. I believe it happened because the drives were actually filling up due to write amplification, so the pool ran out of space before the virtual disk looked full.
I further cut down the assigned size and it never happened again; since then, though, I have simply passed the drives directly through to the VM.
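The fix described above amounts to leaving headroom between the pool capacity and the virtual disk size. A rough sizing sketch, using the 1802 GiB figure from the earlier df output; the 20% reserve is a rule of thumb for ZFS copy-on-write overhead, not an official number.

```shell
#!/bin/sh
# Size a virtual disk with headroom below the pool capacity.
pool_gib=1802      # pool size from 'df -BGiB' in the post
reserve_pct=20     # assumed headroom for CoW/metadata overhead

# Integer arithmetic is fine here; we only need a conservative bound.
safe_gib=$(( pool_gib * (100 - reserve_pct) / 100 ))

echo "assign at most ${safe_gib} GiB to the VM disk"
```

With these numbers the disk would be capped at 1441 GiB instead of the 1800 GiB originally assigned, matching the "cut down the assigned size" workaround.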
 
