I've been running Proxmox for about two years now and I've been struggling with high IO delay the whole time. Because of conflicting information online I've tried a ton of different settings, and although it's better than it was, it's still not great. So I thought I'd write this post and ask for some help.
I'm running Proxmox on an old Dell R610 with two Xeon E5649s and 56 GB of registered ECC RAM. Proxmox is installed on two Samsung 1 TB QVO SSDs running in RAID 1 behind a hardware RAID card, and the VMs also live on these SSDs. The system uses ZFS as the filesystem, and a year ago I added a 1 TB PCIe SSD for caching, which helped a bit with the high IO. That caching SSD is split into a 200 GB partition used as the ZFS log device and an 800 GB partition used as the ZFS cache device.
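Roughly, the way such a log/cache split gets attached to the pool looks something like this (just a sketch, the pool name rpool and the device path are placeholders, not my exact setup):

    # attach the 200 GB partition as the ZFS log device (SLOG)
    zpool add rpool log /dev/disk/by-id/nvme-EXAMPLE-part1
    # attach the 800 GB partition as the ZFS cache device (L2ARC)
    zpool add rpool cache /dev/disk/by-id/nvme-EXAMPLE-part2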
All VMs use QCOW2 images with the VirtIO SCSI single controller and caching enabled. Most VMs use either writethrough or writeback caching, with a few using directsync because that's supposed to be safer and I don't want to risk losing data on, for example, my email server. All VMs also have IO thread and SSD emulation enabled.
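As an example, the relevant lines from one of my VM configs (/etc/pve/qemu-server/<vmid>.conf) look roughly like this (the storage name and VM ID here are just placeholders):

    scsihw: virtio-scsi-single
    scsi0: local:105/vm-105-disk-0.qcow2,cache=writeback,iothread=1,ssd=1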
I've attached an image with the stats while building an application in my CI/CD pipeline (Jenkins/SonarQube). The IO delay usually sits around 3-6% but spikes to 8% on a daily basis; when running an actual workload like the CI/CD pipeline it spikes to 20-30%, and when booting a few VMs at the same time it can completely freeze the system and spike up to 70-80%.
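If it helps, I can also capture numbers during a build with something like the following (assuming the pool is called rpool; iostat comes from the sysstat package):

    zpool iostat -v rpool 5    # per-vdev read/write ops and bandwidth every 5 seconds
    iostat -x 5                # host-level device utilization and await times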
Any help or feedback would be appreciated since I'm no expert on this. Thanks in advance.