I/O performance for heavy workload

karypid

Hello,

I use Proxmox to host a VM that I use as a general-purpose desktop (with GPU passthrough). I've noticed that my system becomes unresponsive during periods of high disk I/O; I would describe it as high latency in reacting to my input, similar to memory thrashing due to swap usage. This happens, for example, when downloading huge files (e.g. Steam game downloads, or downloading the Monero blockchain).

Below are the settings I use for my VM. Can anyone spot if there's some way to improve I/O performance?

[Screenshots: VM settings]

The controller is a "VirtIO SCSI single".




The host is a 5950X (16c/32t) but unfortunately uses SATA disks. When watching usage in Mission Center (screenshot below), there does not seem to be constant high throughput (bottom graph), but the drive seems to be "100% active" (not sure exactly what that means) according to the top "Active time" graph.

Have I misconfigured something that might cause this low speed, or are my consumer SSDs simply that slow (especially for write)?


[Screenshot: Mission Center disk activity]
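I assume the "Active time" graph is basically disk utilisation. For reference, the closest equivalent I know of from the Proxmox host shell would be something like this (iostat comes from the sysstat package; sda/sdb are just my two SATA SSDs):

Code:
apt install sysstat
# -x shows extended per-device stats; %util is roughly the "active time" figure.
# High %util together with low read/write throughput usually means the device is
# latency-bound rather than bandwidth-bound.
iostat -x 5 sda sdb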
 
> are my consumer SSDs simply that slow (especially for write)?

Well, you don't mention hardware details, so it's kinda hard to advise. But yeah, if you've got something like a WD Blue or a Crucial BX or (God forbid) a QLC drive for backing storage, your life as a sysadmin is not going to be easy.

There's a reason the serious recommendation is: use enterprise-class SSDs with PLP (power-loss protection) for Proxmox.
 
IMHO you have two simple options:
- invest money and time in hardware plus a reconfig/reinstall, be it enterprise SSDs, a SAS HBA, (maybe high-end) consumer NVMe, ...
- invest time to re-install PVE with a different storage layout and see if it feels better (if not, you can always change hardware)

Based on my humble experience with PVE _only_ on consumer SSDs, going back and forth between EXT4 and ZFS, my advice is:
- take the best I/O device you have for (local) PVE storage, be it a consumer NVMe SSD or a SATA SSD
- put the VMs' OS disks on an EXT4 "dir" storage in qcow2 format (allows for VM snapshots); see the sketch after this list
- put the LXCs' OS disks on LVM-thin (allows for LXC snapshots)
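For illustration only, here is roughly how such storages could be defined; the storage names, mount path, and volume-group/thin-pool names below are placeholders for whatever exists on your host:

Code:
# sketch only - storage names, path and VG/pool names are placeholders
# a "dir" storage on an EXT4 mount, holding qcow2 VM disks (qcow2 is what enables VM snapshots here)
pvesm add dir ssd-qcow --path /mnt/ssd-vmstore --content images
# an LVM-thin storage for LXC root disks (thin provisioning enables LXC snapshots)
pvesm add lvmthin ssd-thin --vgname ssdvg --thinpool data --content rootdir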

My results with an EXT4 "dir" in qcow2 format are better than with ZFS, and this has been confirmed by some simple tests: secure-erased SSDs, cloned VMs on a fresh PVE install, CrystalDiskMark. In the various write benchmarks, depending on the SSD, the numbers improved by 25% to 200% (yes, up to 3x in writes). For reads the improvement is more like 10-50% (depending on the benchmark and SSD model).

Another possible option, which can be combined with the above, is to keep the VM's "data" on separate physical storage: for example, an iSCSI LUN from your (virtual) NAS given to the VM, or an SMB share. With fast networking (for a physical NAS), or with virtio NICs between the VMs inside your node (NAS and desktop), this can be quite fast, and it avoids the I/O bottleneck of a single drive serving both OS and data.

Have a nice day,
 
> Well, you don't mention hardware details, so it's kinda hard to advise. [...] Use enterprise-class SSDs with PLP for Proxmox.

So, I checked and I have two identical 1TB Crucial MX500 disks, which appear to use "Micron 256Gb 64-layer 3D TLC" NAND. I am not sure how bad these are compared to other SATA models of their time, but clearly they are not up to par these days...

I think I have the following two problems:
  • I have set up a mirrored ZFS pool, which I guess is responsible for the abysmal performance: every bit of data written has to be sent to both disks, and I guess this delays the I/O.

  • Both disks sit on the same SATA controller. Perhaps I could put them on separate PCIe bridges. Here is the relevant lshw output:

Code:
              *-pci:4
                   description: PCI bridge
                   product: Matisse PCIe GPP Bridge
                   vendor: Advanced Micro Devices, Inc. [AMD]
                   physical id: a
                   bus info: pci@0000:02:0a.0
                   version: 00
                   width: 32 bits
                   clock: 33MHz
                   capabilities: pci pm pciexpress msi ht normal_decode bus_master cap_list
                   configuration: driver=pcieport
                   resources: irq:40 memory:fc300000-fc3fffff
                 *-sata
                      description: SATA controller
                      product: FCH SATA Controller [AHCI mode]
                      vendor: Advanced Micro Devices, Inc. [AMD]
                      physical id: 0
                      bus info: pci@0000:07:00.0
                      logical name: scsi1
                      logical name: scsi2
                      version: 51
                      width: 32 bits
                      clock: 33MHz
                      capabilities: sata pm pciexpress msi ahci_1.0 bus_master cap_list emulated
                      configuration: driver=ahci latency=0
                      resources: irq:46 memory:fc300000-fc3007ff
                    *-disk:0
                         description: ATA Disk
                         product: CT1000MX500SSD1
                         physical id: 0
                         bus info: scsi@1:0.0.0
                         logical name: /dev/sda
                         version: 033
                         serial: 2109E4FEB629
                         size: 931GiB (1TB)
                         capabilities: gpt-1.00 partitioned partitioned:gpt
                         configuration: ansiversion=5 guid=3d06ad44-d1c0-43c1-a475-03e363db6cc0 logicalsectorsize=512 sectorsize=4096
[... (volumes)]
                    *-disk:1
                         description: ATA Disk
                         product: CT1000MX500SSD1
                         physical id: 1
                         bus info: scsi@2:0.0.0
                         logical name: /dev/sdb
                         version: 032
                         serial: 2016E29C4808
                         size: 931GiB (1TB)
                         capabilities: gpt-1.00 partitioned partitioned:gpt
                         configuration: ansiversion=5 guid=1d681fb2-1fd2-47aa-8825-771f581a298f logicalsectorsize=512 sectorsize=4096
[... (volumes)]

Is it worth opening up my PC and moving one of the disks to this SATA controller:

Code:
              *-pci:3
                   description: PCI bridge
                   product: Matisse PCIe GPP Bridge
                   vendor: Advanced Micro Devices, Inc. [AMD]
                   physical id: 9
                   bus info: pci@0000:02:09.0
                   version: 00
                   width: 32 bits
                   clock: 33MHz
                   capabilities: pci pm pciexpress msi ht normal_decode bus_master cap_list
                   configuration: driver=pcieport
                   resources: irq:38 memory:fc400000-fc4fffff
                 *-sata
                      description: SATA controller
                      product: FCH SATA Controller [AHCI mode]
                      vendor: Advanced Micro Devices, Inc. [AMD]
                      physical id: 0
                      bus info: pci@0000:06:00.0
                      version: 51
                      width: 32 bits
                      clock: 33MHz
                      capabilities: sata pm pciexpress msi ahci_1.0 bus_master cap_list
                      configuration: driver=ahci latency=0
                      resources: irq:45 memory:fc400000-fc4007ff

Would this make a difference or is my best option to buy newer NVMe devices?
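In the meantime, I guess I can watch the pool and the individual disks during one of these big downloads, to see whether both drives really saturate or just one of them:

Code:
# show the pool layout and health (which disks are in the mirror, any errors)
zpool status
# per-vdev operations and bandwidth, refreshed every 5 seconds while the download runs
zpool iostat -v 5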
 
The symptoms you're describing:
I've noticed that my system becomes unresponsive during periods of high disk I/O ...
...is commonly what happens with (consumer) SSDs when they run out of their fast pseudo-SLC cache. The system then goes slow as a dog (and kind of stutters) while the disks work through their outstanding I/O at the much lower native speed of the storage.

Would this make a difference or is my best option to buy newer NVMe devices?
"Newer" is the wrong concept. What you're probably needing is storage devices that can keep operating all day at their given speed, rather than having stuff with only a limited capacity to absorb large quantities of IO.

I went with a SAS card and eBay SAS SSDs (because they're super cheap), but you could do just as well by picking up some brand new Samsung PM893s if that's more your kind of thing.
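If you want to confirm it's the cache running out, a long sequential write with fio (well past the cache size) usually makes it obvious: the reported speed drops off a cliff part-way through. Treat this as a sketch only; the file path and size are placeholders, and --direct may be ignored or rejected on ZFS datasets depending on the version, so drop it there.

Code:
# sustained sequential write, big enough to exhaust a consumer SSD's pSLC cache
# (creates a 100G test file at a placeholder path - adjust to the storage you want to test)
fio --name=sustained-write --filename=/mnt/testdir/fio-test.bin \
    --rw=write --bs=1M --size=100G --ioengine=libaio --direct=1 \
    --iodepth=8 --numjobs=1
# clean up the test file afterwards
rm /mnt/testdir/fio-test.bin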
 
> I went with a SAS card and eBay SAS SSDs (because they're super cheap), but you could do just as well by picking up some brand new Samsung PM893s if that's more your kind of thing.

I am thinking of using an external SAS enclosure, something like this: https://www.highpoint-tech.com/product-page/rocketstor-6434ts

Since the controller is a PCIe card, does the connection to the external box reduce performance, or is it equally fast? My plan is to add the PCIe card, then purchase some cheap SSDs from eBay (as suggested above) and put them in the enclosure as JBOD. I would hope that I can run ZFS with stripe+mirror across the 4 drives to get redundancy and speed. Would this connector limit the speed?

I must admit I am quite unfamiliar with SAS, so I am wondering if this setup would be decent...
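To be concrete, the layout I have in mind is a stripe across two mirrors; something like the following, where the pool name and disk paths are just placeholders for the real /dev/disk/by-id/ names:

Code:
# sketch only - pool name and disk IDs are placeholders
zpool create tank \
    mirror /dev/disk/by-id/scsi-DISK1 /dev/disk/by-id/scsi-DISK2 \
    mirror /dev/disk/by-id/scsi-DISK3 /dev/disk/by-id/scsi-DISK4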
 
I must admit I am quite unfamiliar with SAS, so I am wondering if this setup would be decent...
The concept of what you're looking at seems fine to me. I've not used RocketStor gear before, so I don't have any good or bad experiences with it to draw on.

Looking through the data sheet for that specific bundle, it lists the PCIe interface it uses as PCIe 3.0 x8.

PCIe 3.0 x8 has a maximum throughput of roughly 7.9 GB/s (about 985 MB/s per lane × 8 lanes), which should be plenty fast enough for what you're doing.

If there's a speed limit in the system from something, it's unlikely to be from the PCIe interface.



Looking around the Proxmox Forum for "RocketRaid" and doing some online searching, it seems like RocketRaid make their own proprietary drivers for Linux (like Nvidia does).

That's generally not a fantastic sign, as then you're reliant on them making a version of their driver that's compatible with the Linux kernel you're running. If/when they drop support for your card, or if they take 6+ months to update their driver, you could be left with perfectly capable hardware but no good way of running recent software.

That being said, if your only other options are far more expensive then the RocketRaid bundle might be the best choice. :)
 
Looking around the Proxmox Forum for "RocketRaid" and doing some online searching, it seems like RocketRaid make their own proprietary drivers for Linux (like Nvidia does).

I know exactly what you mean. The pain my laptop still gives me to this day is terrible (I have the Nvidia GPU disabled right now due to instability with the 6.9.x kernels).

I will look into this more. If the driver is not in the mainline kernel, I could get some other controller that uses the same interface and buy only the enclosure from them. I will need to do some proper research before I invest in this. Thank you for pointing out the driver issue.
 
forget ZFS for gaming!
So far this has not been an issue at all. The problem only shows up during prolonged writes to disk. I mentioned Steam downloads as an example, but if I am not actually downloading while gaming (and you can disable updates while gaming), then there is no issue.
 
ZFS burns through a lot more TBW than ext4/LVM-thin; there are many topics here about worn-out SSDs.
I don't see the point for a gaming VM.

Here I often use ZFS for its mirroring.
But with consumer disks, only the PVE OS partition is on a mirrored ZFS partition, to have boot redundancy.
The other partitions are LVM-thin.
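If you are worried about wear, smartctl (from the smartmontools package) shows the relevant counters; the attribute names vary per vendor, so the grep below is only a rough filter:

Code:
apt install smartmontools
# wear/lifetime attributes; names differ per model
# (e.g. Percent_Lifetime_Remain, Wear_Leveling_Count, Total_LBAs_Written)
smartctl -A /dev/sda | grep -i -E 'wear|lifetime|percent|lbas'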
 
Any filesystem is going to have problems when you use consumer SSDs and exceed their internal pseudo-SLC cache with heaps of prolonged writing. That's not a ZFS exclusive thing.

As a data point, I'm using ZFS on my personal desktop (that I also game on occasionally after hours), and it has no issues at all. I'm using eBay SAS SSDs though, not consumer drives.
 
I will need to do some proper research before I invest in this.
This is probably what you need, as it explains (and shows) the different types of SAS connectors/cables, both internal and external: https://www.youtube.com/watch?v=OW419HwU7sg

I remember watching that several months ago, and it filled in some gaps in my knowledge. That was prior to buying my SAS card and drives from eBay, which have been rock solid. :)
 
