I am setting up a VM to do plotting for Chia. It has the following disks:
-3 pass-through 1TB SSDs
-5 SATA 400GB SSDs set up as RAID0 in Proxmox, used as a directory storage and assigned to the VM as a virtual hard disk (the volume is 1800 GiB, per df -BGiB in the Proxmox shell); a rough sketch of this setup follows the list
-1 PCI-E RAID card (LSI 9266-8i)
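For context, this is roughly the command-line equivalent of how the SATA array is set up, a sketch rather than my exact steps: the pool name CSCP2 is the one I use (details in the update below), while the device paths, the storage ID CSCP2-dir, the bus slot and the VM ID 100 are placeholders.

# stripe (RAID0) the five SATA SSDs into one ZFS pool
zpool create CSCP2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
# register the pool's mountpoint as a directory storage in Proxmox
pvesm add dir CSCP2-dir --path /CSCP2 --content images
# allocate a 1800 GiB virtual disk on that storage and attach it to the VM
qm set 100 --scsi1 CSCP2-dir:1800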
After some time, sometimes a day and sometimes as short as a couple of hours, the VM becomes unresponsive to SSH. Logging in to the GUI, the console simply says "Guest disabled display", and there is a yellow "!" marked "io-error". The OS is completely frozen and all I can do is reboot it; it works after the reboot but can freeze again later. I checked, and when it locks up the SATA array is at most around 80% used and not yet full.
Where should I start looking into this issue? Since I have a few IO devices, how can I tell which of the three above is the cause?
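For what it is worth, this is roughly what I look at on the host when it hangs; the VM ID 100 is a placeholder for my actual VM ID.

# detailed VM state as QEMU reports it, including the io-error pause
qm status 100 --verbose
# host-side errors since boot (QEMU, storage, kernel)
journalctl -b -p err
# kernel messages naming the physical device that is complaining
dmesg -T | grep -iE 'ata[0-9]|sd[a-z]|megaraid|i/o error'
# fill level of the directory storage backing the virtual disk
df -BGiB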
Update 1: Adding some context after some tests. It turns out the issue is caused by the SATA drives; I isolated the devices one by one.
They are 5*400 GB SATA SSDs (Intel DC S3700 400 GB) attached directly to the mainboard (X399 Taichi). I set them up as ZFS RAID 0 and created a directory under /CSCP2. Checking with "df -BGiB", the volume looks like this (a sample reading, not taken when the error occurred):
CSCP2 1802GiB 343GiB 1459GiB 20% /CSCP2
So I assigned it to the VM as a hard disk of 1800 GiB. As the drive fills up, and before it is completely full, the io-error appears at some point, seemingly at up to about 80% usage of the drive as reported by df on the host.
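For reference, when checking usage from the host I look at the pool-level, dataset-level and image-level views to see where the space actually goes; a minimal sketch, where the image path is only a guess at Proxmox's default directory-storage layout for an assumed VM ID 100:

# pool-level view: size, allocated and free space, capacity percentage
zpool list CSCP2
# dataset-level view, which is what df -BGiB on /CSCP2 reflects
zfs list CSCP2
# image-level view: virtual size of the 1800 GiB disk vs. bytes actually used on disk
# (path is a guess at the default layout; adjust to the real image name)
qemu-img info /CSCP2/images/100/vm-100-disk-0.raw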