SSD setup for VM host

pos-sudo

New Member
Jun 13, 2021
27
1
3
Dear all,

Currently we are having trouble with high IO delay on almost any 'heavy' write in the VMs. Our Proxmox host is installed on Crucial MX500 drives in a ZFS RAID 1. From what I found on the web, this can happen because these disks cannot handle many IOPS for VM hosting.

I was wondering: if we create another ZFS RAID 1 with Intel SSD D3-S4510 Series 1.92TB drives and migrate the VMs to that storage pool, will this improve the IO delay/wait? So that we no longer have the high IO delay that is causing downtime for the websites?

Kind regards,
 

Deleted member 116138

Guest
The MX500 is a typical consumer/desktop SSD, and it is not recommended to use this type of drive in production. You should see better performance and a longer drive lifetime with the Intel drives.
 

pos-sudo

> The MX500 is a typical consumer/desktop SSD, and it is not recommended to use this type of drive in production. You should see better performance and a longer drive lifetime with the Intel drives.

Thank you for your reply. If I understand correctly, we can leave Proxmox itself on the Crucial MX500 as long as we host the VMs on the Intel SSDs mentioned above? Then we shouldn't see the high IO delay that causes downtime, if we host the VMs on these Intel drives in a new ZFS storage pool on the server?
 

Deleted member 116138

If it is a playground, you can keep the MX drives for the host. But PVE also writes around 30GB per day to its disks, so it is likely that the drives will fail sooner or later. If you intend to use PVE as a production system with critical VMs, I would consider exchanging the Crucial drives as well.
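As a side note, the accumulated write volume and remaining endurance can be checked with smartctl from smartmontools (a sketch only; SMART attribute names vary by vendor, the ones below are what the MX500 reports):

```shell
# On a Crucial MX500, attribute 202 is Percent_Lifetime_Remain and
# attribute 246 is Total_LBAs_Written (multiply LBAs by 512 for bytes).
smartctl -A /dev/sda | grep -Ei 'lifetime_remain|lbas_written'
```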
 

pos-sudo

> If it is a playground, you can keep the MX drives for the host. But PVE also writes around 30GB per day to its disks, so it is likely that the drives will fail sooner or later. If you intend to use PVE as a production system with critical VMs, I would consider exchanging the Crucial drives as well.

Thank you for this information ;)
 

pos-sudo

> If it is a playground, you can keep the MX drives for the host. But PVE also writes around 30GB per day to its disks, so it is likely that the drives will fail sooner or later. If you intend to use PVE as a production system with critical VMs, I would consider exchanging the Crucial drives as well.

One more question: if we want to reduce the IO delay in order to avoid the downtime we see under heavy writes, and we accept that we will have to replace the MX drives more often than the Intels, can we host the VMs on the Intels and PVE on the MX drives without getting high IO delay?
 

Dunuin

Famous Member
Jun 30, 2020
6,723
1,563
149
Germany
Also keep in mind that the MX500 is advertised as having "power loss immunity". It does not have real "power loss protection" like enterprise/datacenter SSDs, where each SSD has its own internal backup power (technically capacitors, but they work like the BBU of a RAID controller). In case of a power outage or PSU failure, enterprise SSDs with power loss protection run on that backup power and quickly flush all cached data from the volatile RAM cache to the non-volatile NAND. The MX500 just flushes its caches to NAND regularly, so that in case of a power outage less data is lost.
If you don't want to lose your OS disks on a power outage (a ZFS mirror won't help, as both SSDs lose the data at the same time, so the redundancy is useless), you should use enterprise/datacenter SSDs rather than consumer-grade ones.
 

pos-sudo

> Also keep in mind that the MX500 is advertised as having "power loss immunity". It does not have real "power loss protection" like enterprise/datacenter SSDs, where each SSD has its own internal backup power (technically capacitors, but they work like the BBU of a RAID controller). In case of a power outage or PSU failure, enterprise SSDs with power loss protection run on that backup power and quickly flush all cached data from the volatile RAM cache to the non-volatile NAND. The MX500 just flushes its caches to NAND regularly, so that in case of a power outage less data is lost.
> If you don't want to lose your OS disks on a power outage (a ZFS mirror won't help, as both SSDs lose the data at the same time, so the redundancy is useless), you should use enterprise/datacenter SSDs rather than consumer-grade ones.

Thanks for your reply, I understand. Is the Intel D3-S4510 then good enough for hosting VMs without high IO wait/delay, running Proxmox with ZFS RAID 1?

Thanks in advance!
 

Dunuin

> Thanks for your reply, I understand. Is the Intel D3-S4510 then good enough for hosting VMs without high IO wait/delay, running Proxmox with ZFS RAID 1?
>
> Thanks in advance!

Depends on your workload. The S4510 is a cheap series for read-intensive workloads. For mixed workloads there is the S4610 series, with better write performance and durability. As far as I know, there is no longer an S4700/S4710 series for write-intensive workloads like the discontinued S3710, which used MLC instead of TLC NAND (except maybe the very expensive Optane drives).
 
Last edited:

pos-sudo

> Depends on your workload. The S4510 is a cheap series for read-intensive workloads. For mixed workloads there is the S4610 series, with better write performance and durability. As far as I know, there is no longer an S4700/S4710 series for write-intensive workloads like the discontinued S3710 (except maybe the very expensive Optane drives).

Hmm, I understand. The only things we host are web/mail and MySQL server VMs, for instance Plesk servers and standalone Linux machines.
 

Deleted member 116138

Web, mail, MySQL, Plesk, etc. - that does not sound like a homelab environment. I would recommend enterprise-class drives with power-loss protection, like @Dunuin already mentioned: SSDs for the host itself, and maybe NVMe for the VM storage.
 

pos-sudo

> Web, mail, MySQL, Plesk, etc. - that does not sound like a homelab environment. I would recommend enterprise-class drives with power-loss protection, like @Dunuin already mentioned: SSDs for the host itself, and maybe NVMe for the VM storage.

Thank you very much, I understand now. We will plan this and wipe the Crucial disks. One last question, and I hope you can give some advice on it: we are setting up a second server, an HP ProLiant DL360 G9 with 128GB DDR4 ECC RAM, 20 cores at 2.9GHz (40 threads with hyperthreading), 2x 240GB enterprise Intel SSDs for the PVE host, and a second ZFS RAID 1 (mirror) pool of 1.92TB Intel DC D3-S4510 drives for the VMs. Will we be able to keep the IO delay to a minimum? For instance, on writes we currently see IO delay varying from 2% to 15%, sometimes 50%, with only 15 VMs (all of them Linux Plesk web environments). Can we get that down to an almost stable 0%?

Thank you in advance! I'm really glad you guys are helping us and giving advice in this situation :)
 

LnxBil

Famous Member
Feb 21, 2015
6,292
777
163
Saarland, Germany
> 2x 240GB enterprise Intel SSDs for the PVE host, and a second ZFS RAID 1 (mirror) pool of 1.92TB for the VMs

Why not just use four identical drives and install everything on a striped mirror in ZFS? You get higher I/O and more space, and use the same number of SATA/SAS slots.
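Such a striped mirror (RAID10-like) pool could be created roughly like this - the pool name and device paths are placeholders, not from this thread:

```shell
# One pool of two mirrored vdevs; ZFS stripes writes across the mirrors,
# giving roughly twice the IOPS of a single mirror and the capacity of
# two drives. ashift=12 aligns to 4K sectors.
zpool create -o ashift=12 tank \
    mirror /dev/sda /dev/sdb \
    mirror /dev/sdc /dev/sdd
```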
 

pos-sudo

> Why not just use four identical drives and install everything on a striped mirror in ZFS? You get higher I/O and more space, and use the same number of SATA/SAS slots.

Understandable, but budget-wise this is unfortunately not an option for our customer in the current situation.
 

_gabriel

Member
Mar 30, 2021
101
12
18
37
If I rely on /proc/diskstats, I get the following for PVE itself (PVE is installed on a dedicated USB key with MLC NAND, with one VM):
- 1GB written/day - Proxmox 6.2, swap disabled (uptime 78 days)
- 1.5GB written/day - Proxmox 7.2, swap on another disk (uptime 13 days)
- 0.5GB written/day - Proxmox 7.1, swap 2GB, 10% used (uptime 9 days)
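Numbers like these can be reproduced with a one-liner over /proc/diskstats (the device name "sda" is a placeholder; adjust it to your boot disk):

```shell
# Field 10 of /proc/diskstats is sectors written since boot, always in
# 512-byte units; scale by uptime to get an average per day.
awk -v up="$(cut -d' ' -f1 /proc/uptime)" \
    '$3 == "sda" { printf "%.2f GB written/day\n", ($10 * 512) / up * 86400 / 1e9 }' \
    /proc/diskstats
```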
 

pos-sudo

We just tested on the Intel D3-S4510 with a Plesk environment. If we test writes with:

dd if=/dev/zero of=/root/testfile bs=1024M count=1 oflag=direct

the speed seems normal: 1.1GB copied in under 2 seconds. But in Proxmox we see an IO delay spike to 45%, then immediately back to 0%. Is this normal with such a test?

We also need to reduce the ZFS delay; statistics show that the zvol causes high IO. Is there something we can do to reduce this?

Thanks in advance!
 
Last edited:

Dunuin

> We just tested on the Intel D3-S4510 with a Plesk environment. If we test writes with:
>
> dd if=/dev/zero of=/root/testfile bs=1024M count=1 oflag=direct

That test doesn't make much sense. ZFS uses LZ4 compression by default, so a stream of zeros is highly compressible and you are not actually writing much to the pool. Use at least /dev/urandom instead of /dev/zero, or better, a proper benchmark tool like fio.
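For illustration, a random-write fio run closer to real VM and database load might look like this (file path, size, and runtime are placeholder values, assuming fio is installed):

```shell
# 4K random writes with direct I/O and a realistic queue depth; this
# stresses the drive far more than one large sequential dd of zeros.
fio --name=randwrite --filename=/root/fio.test --size=1G \
    --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 \
    --direct=1 --runtime=60 --time_based --group_reporting
```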
 

pos-sudo

> That test doesn't make much sense. ZFS uses LZ4 compression by default, so a stream of zeros is highly compressible and you are not actually writing much to the pool. Use at least /dev/urandom instead of /dev/zero, or better, a proper benchmark tool like fio.

Just tested: same result, unfortunately. I searched the web for a while, and there seem to be quite a few people having issues with Proxmox and ZFS IO delay; so far nobody has a workaround for high IO delay on a ZFS pool, even with Intel DC SSDs?

Edit: We see that the IO delay only occurs when ZFS runs its z_wr_int threads. Does this happen because of the MX500 disks? In other words, if we replace them with the Intels, will they handle it without problems?
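As a generic way to narrow this down (a standard ZFS check, not specific to this setup), per-device latency can be watched while the spike happens:

```shell
# Show per-vdev I/O statistics including latency columns, refreshed
# every second; a slow mirror member stands out in the wait columns.
zpool iostat -v -l 1
```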


Thanks in advance!
 
Last edited:

LnxBil

> Understandable, but budget-wise this is unfortunately not an option for our customer in the current situation.

What? Using two pools is a waste of money. There is no benefit in using two disks for the OS and two disks for the VMs, apart from keeping PVE and the VMs separate in case of a reinstall - and whether that matters depends on the admin's capabilities. Four disks outperform two in every case; even 4x 1TB instead of 2x 2TB plus OS SSDs will be faster and most probably cheaper.
 
