Over 30% worse disk performance from within VM

paszczakojad

New Member
Mar 18, 2023
Hi,

I wonder if it's normal that disk performance inside a VM is so much worse than from the host system. I have two disks and two VMs, cache=writeback, iothread=1, SCSI controller VirtIO SCSI single. One disk is a 3 Gbps SATA SSD connected to a P410i RAID controller (RAID0 with only one disk). Performance from the host (command "hdparm -tT /dev/sda"):

/dev/sda:
Timing cached reads: 19184 MB in 1.99 seconds = 9621.66 MB/sec
Timing buffered disk reads: 722 MB in 3.01 seconds = 240.13 MB/sec

That's expected from a 3 Gbps disk. Now a test from the VM:

/dev/sda:
Timing cached reads: 14854 MB in 1.99 seconds = 7459.21 MB/sec
Timing buffered disk reads: 724 MB in 3.01 seconds = 241.20 MB/sec

Controller performance is worse (9621 -> 7459 MB/s), but that doesn't affect the disk yet - buffered disk read performance is fine.

On the second VM I have two SATA SSDs in RAID0, so the bare-metal performance is twice as high (as expected):

/dev/sdb:
Timing cached reads: 17972 MB in 1.99 seconds = 9011.92 MB/sec
Timing buffered disk reads: 1698 MB in 3.00 seconds = 565.98 MB/sec

And now the same disk from inside the VM:

/dev/sda:
Timing cached reads: 15140 MB in 1.99 seconds = 7575.27 MB/sec
Timing buffered disk reads: 1122 MB in 3.00 seconds = 373.70 MB/sec

373 MB/s instead of 565 MB/s - that's 34% worse. Why?

Best regards,

Piotr
 
For starters... better to use fio for benchmarking instead of a VERY short hdparm test. I've never used hdparm for benchmarking, but there's a good chance you're reading from the cache :)
 
fio gave a similar result:

command line: fio --filename=/dev/sda --direct=1 --rw=randread --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1 --readonly

First disk from host: READ: bw=245MiB/s (257MB/s)
First disk from VM: READ: bw=249MiB/s (261MB/s)
Second disk from host: READ: bw=493MiB/s (517MB/s)
Second disk from VM: READ: bw=384MiB/s (402MB/s)

That shows a 23% degradation. But I don't fully trust fio - when I run it again, at the beginning I get results like 2250 MiB/s, then 384 MiB/s, apparently from cache. With hdparm the first line is from cache, the second from disk, and the results stay the same across subsequent runs.

Best regards,

Piotr
 
Yeah, sounds like cache. For fio you can use the ramp_time parameter to get around the cache. As for the 30% performance degradation... I don't know if that's normal, but it sounds like a lot to me.
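Something along these lines should work (the same command as quoted earlier, just with a warm-up period that fio discards from the reported numbers; the 30 s value is only an example):

Code:
fio --filename=/dev/sda --direct=1 --rw=randread --bs=64k --ioengine=libaio \
    --iodepth=64 --ramp_time=30 --runtime=120 --numjobs=4 --time_based \
    --group_reporting --name=throughput-test-job --eta-newline=1 --readonly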
 
I changed bs in fio to 128k and got 440 MiB/s. With bs=1024k I got 511 MiB/s. Maybe this has something to do with the number of I/O requests? With 64k there are 6400 IOPS, with 128k about 3500 IOPS, and with 1024k about 500 IOPS. But when fio reads from the cache it gets something like 15k IOPS and handles that fine... With bs=32k I get 300 MiB/s and 9500 IOPS.
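A sweep like that can be scripted with a simple loop (same fio options as before, only bs changes; the shortened 60 s runtime and 10 s warm-up are just to keep the sweep quick):

Code:
for bs in 32k 64k 128k 1024k; do
    fio --filename=/dev/sda --direct=1 --rw=randread --bs=$bs --ioengine=libaio \
        --iodepth=64 --ramp_time=10 --runtime=60 --numjobs=4 --time_based \
        --group_reporting --name=bs-sweep-$bs --eta-newline=1 --readonly
done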

In the case of the first VM the improvement with bs=128k wasn't big - 255 MiB/s instead of 249 MiB/s.

I thought maybe this has something to do with RAID0 - it has a default stripe size of 64k, so if fio randomly reads 64k blocks, there is a 50% chance that the next block will be on the same physical disk. But the fio test on the bare host gave correct results, so that's not the reason...

P.
 
I just noticed that changing the fio option --numjobs on the VM from 4 to 1 improved the read speed to 470 MiB/s (493 MB/s). Why do more jobs lower the performance?

The latencies were smaller with 1 job. On the host the average latency is 32 ms with 4 jobs, on the VM 46 ms. With one job it's 8 ms on the host and 8 ms on the VM. Apparently that latency affects the performance - but why does it grow faster on the VM? It should grow proportionally to the number of jobs. With 4 times as many threads on the host, the latency is 4 times higher - that's expected. On the VM it is about 5.5 times higher.

On the host machine there is no difference in read performance (or it's even the opposite - 1 job gives 489 MiB/s, 4 jobs give 494 MiB/s).

I also tried to increase the number of CPU cores available to the VM (from 4 to 8) - no difference in performance.

P.
 
I'll keep replying to myself - perhaps my findings will be useful to someone :)

I attached the disk as passthrough:

qm set 101 -scsi1 /dev/disk/by-id/scsi-3600508b1001c9b79db4d138b7d8ecdc3

And now I get 400 MiB/s (419 MB/s) with fio and 4 jobs, latency 40 ms. Not much, but it's a slight improvement (5%).
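To double-check that the passthrough is in place, something like this works (VM ID 101 as above; the serial reported inside the guest depends on the controller emulation):

Code:
# on the Proxmox host: confirm the raw disk is attached as scsi1
qm config 101 | grep scsi1
# inside the VM: the passed-through disk should appear with its full size
lsblk -o NAME,SIZE,MODEL,SERIAL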
 
Have you tested without the RAID controller?

RAID controllers are a plague and have no place in a world of SSDs and filesystems like ZFS.
 
It's an old ProLiant DL360 Gen6 with a P410i controller; I cannot bypass it (there's a SATA connector for the CDROM, but just one, while I need to connect up to 8 disks).
 
Ouch, the P410i can't be put into full HBA mode or flashed to anything else either, as far as I know.

Best bet: get an LSI HBA card (a flashable rebranded HP/Dell/... HBA or an original LSI) off of eBay and use that instead.
 
Do you mean that the P410i can't support 8 disks in HBA mode? Anyway - would it be more efficient to build a software RAID on top of HBA disks than to run hardware RAID on the P410i?
 
No no, that controller just can't be configured as an HBA or crossflashed to something like a base LSI HBA (that's what most people do - flash HP/Dell/etc. cards to the underlying plain LSI 9211-flavored cards).

The only thing you can do with this specific controller is set the disks up as a bunch of single-disk, HP-flavored RAID0s (you NEVER have raw disk access; it always uses HP logic to present the space on the disks, not the disks themselves). There's no way to get any sort of raw access to the disks on those controllers.
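For reference, those single-disk logical drives are typically created with HP's Smart Storage CLI; a rough sketch (the tool name varies by generation - ssacli/hpssacli/hpacucli - and the controller slot and drive address below are placeholders):

Code:
# list the physical drives behind the controller (slot 0 is a placeholder)
ssacli ctrl slot=0 pd all show
# create one RAID0 logical drive per physical disk, e.g. for drive 1I:1:1
ssacli ctrl slot=0 create type=ld drives=1I:1:1 raid=0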


Hence why it's easier and cheaper to just get a plain LSI 9211-8i based card off eBay. They go for something like 30-60€ and are always in ample supply.
 
Good to hear some things changed in that regard. Hadn't looked into it for a few years (since I moved on to NVMe and PCIe based storage).

HP especially have always been very insistent on keeping their chain proprietary (that comes from them actually being Compaq, who were even worse in that regard when I worked for them in the 90s), and especially with the advent of flash-based storage, RAID controllers have become all but useless. G6/G7 are the two in-between generations, from when SSDs were coming up and becoming the standard choice, so they still held on to that philosophy (added: G6/G7 are the first generations after the Intel Xeon 5xxx series, which used 500 W at idle).

Add file systems like ZFS on top of all this and there's no reason to have any hardware RAID anymore; on the contrary, any hardware RAID is detrimental to performance and to the lifespan of SSDs.

Unless HP did some fuckery of pretending it's an HBA mode in that firmware, where it all looks like a straight pass-through but really isn't, you'll probably see near-bare-metal performance across the board after you update.
 
Hrm, reading that article in full, it's talking about patching the kernel to get it to work.

I'd still just buy an actual HBA card if that's the case.

Unless these patches have become part of the mainline kernels, you're potentially going to get stuck in a situation where you have to redo the procedure almost every time the kernel gets updated.
 
I haven't used HBA mode yet, but I created a separate logical drive for each disk (i.e. a RAID0 array with just one disk). I tried to use ZFS, but ran into some problems with reliable testing, so I reverted to mdadm. And the results are good:

Code:
2xSATA SSD, hardware RAID0 (my previous results)
    host                      518 MB/s
    VM storage                403 MB/s
    VM passthrough            425 MB/s
2xSATA SSD, each on separate RAID0, md RAID0 made on host
    host                      518 MB/s
    VM passthrough            518 MB/s
2xSATA SSD, each on separate RAID0, passthrough to VM, md RAID0 made on VM
    VM passthrough            518 MB/s

That seems fine :)
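For completeness, a minimal sketch of the md RAID0 built on top of the two single-disk logical drives (device names /dev/sdb and /dev/sdc are placeholders for whatever the controller exposes; the 64k chunk just mirrors the controller's default stripe size):

Code:
# stripe the two logical drives into one md array
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sdb /dev/sdc
# verify the array is assembled
cat /proc/mdstat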
 
FWIW, I had a RAID5 running from a VM on three disks passed through to the VM, and performance was poor. A few days ago I removed the passthrough, ran the RAID directly in Proxmox and mounted the drive over NFS. It is way faster like this. As I wanted to share the RAID5 between 2 VMs, I couldn't safely use iSCSI. Here is my 2-post thread - https://forum.proxmox.com/threads/move-raid-array-from-vm-control-to-proxmox-control.124314/. For testing the speed I used dd - https://www.howtouselinux.com/post/how-to-test-nfs-performance-on-linux.

[edit]
This was done without any data loss on the RAID, which had about 10 TB of data on it.
[/edit]
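Roughly, the setup looks like this, assuming the array is mounted at /mnt/raid5 on the host and the VMs are on a 192.168.1.0/24 network (paths, IPs and export options below are placeholders):

Code:
# on the Proxmox host: export the mounted array
echo '/mnt/raid5 192.168.1.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra
# inside each VM: mount the share (host IP is a placeholder)
mount -t nfs 192.168.1.10:/mnt/raid5 /mnt/raid5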
 
I'll share some test results; perhaps they will be useful to someone. I enabled HBA mode to get direct access to the disks, but I had the impression that in some cases it was slower. Anyway, I needed it for things like the 'secure erase' function (and I'm not sure if trimming would work with logical drives). It also required making a USB recovery stick with a modified HBA module so that the recovery system would see the HBA drives. I tested SATA and SAS SSDs, and I also installed an NVMe disk on a PCIe adapter (only one PCIe slot was free, so no RAIDs there).

I noticed that I'm unable to boot from HBA drives, but I found a solution - I could connect an SSD to the internal SATA controller normally used for the CDROM drive:

https://serverfault.com/questions/7...the-sata-cd-rom-to-sata-ssd-on-hp-dl360p-gen8

So now I have a total of 10 drives - 1 boot drive with Proxmox, 1 NVMe with one system, two RAID1 systems on 2 drives each, and one RAID10 system on 4 drives :)
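Assuming these are mdadm arrays like before, the 4-drive RAID10 would be created roughly like this (device names are placeholders):

Code:
mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sdc /dev/sdd /dev/sde /dev/sdf
cat /proc/mdstat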

Tests were done using fio. In some cases I also tested with bs=1M to see the maximum interface throughput; the rest is with bs=64k.

Code:
1xSATA SSD, hardware RAID0
    host                      261 MB/s
    VM storage                261 MB/s
2xSATA SSD, hardware RAID0
    host                      518 MB/s
    VM storage                403 MB/s
    VM passthrough            425 MB/s
2xSATA SSD, each on separate RAID0, md RAID0 made on host
    host                      518 MB/s
    VM passthrough            518 MB/s
2xSATA SSD, each on separate RAID0, passthrough to VM, md RAID0 made on VM
    VM passthrough            518 MB/s
    HBA mode, md on VM        389 MB/s
        bs=1M                 507 MB/s
3xSATA SSD, md on host        773 MB/s
1xSAS SSD on host, HBA mode   520 MB/s
    passthrough to VM         410 MB/s
2xSAS SSD, HBA, mdadm on host
                              756 MB/s
2xSAS SSD, HBA, passthrough to VM, mdadm on VM
                              757 MB/s
3xSAS passthrough, mdadm on VM
                              1131 MB/s
2xSAS passthrough, md RAID1 on host, test on VM
                              765 MB/s
        bs=1M                 1048 MB/s
4xSAS passthrough, RAID10 on VM
                              1770 MB/s
        bs=1M                 1834 MB/s
NVMe on host                  1811 MB/s
        bs=1M                 1799 MB/s
NVMe passthrough to VM        1827 MB/s
        bs=1M                 1814 MB/s

P.
 
