Random disk IO is 10 times slower in VM compared to host

Gareth Blades

Oct 23, 2024
We are currently evaluating Proxmox with the intention of switching from VMware ESXi. We have been using sysbench within a VM to test performance. All is fine with the exception of random disk reads and writes, which are significantly slower.

Machine specs:-
Lenovo SR630
RAID bus controller: Broadcom / LSI MegaRAID Tri-Mode SAS3508
4x Lenovo 960GB SSDs configured as RAID-5. These are read-intensive drives rated at 5 full drive writes per day for 5 years.

Testing commands :-
sysbench fileio --file-total-size=4G --file-test-mode=rndrw prepare
sysbench fileio --file-total-size=4G --file-test-mode=rndrw run
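For completeness, the matching cleanup step, plus an optional O_DIRECT variant that takes the page cache out of the picture, would look like this (the direct flag was not used for the numbers below, it is only mentioned as an extra data point):
Code:
# remove the test files again between runs
sysbench fileio --file-total-size=4G --file-test-mode=rndrw cleanup

# optional: repeat the run with O_DIRECT to bypass the page cache
sysbench fileio --file-total-size=4G --file-test-mode=rndrw --file-extra-flags=direct run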

VMware ESXi 8.0 :-
File operations:
reads/s: 8109.14
writes/s: 5406.03
fsyncs/s: 17308.89
Throughput:
read, MiB/s: 126.71
written, MiB/s: 84.47

Proxmox 8.2.2 host :-
File operations:
reads/s: 7155.38
writes/s: 4770.25
fsyncs/s: 15269.00
Throughput:
read, MiB/s: 111.80
written, MiB/s: 74.54

Ubuntu 24.04.1 LTS VM :-
File operations:
reads/s: 420.16
writes/s: 280.04
fsyncs/s: 903.12
Throughput:
read, MiB/s: 6.56
written, MiB/s: 4.38

The VM is configured to use VirtIO SCSI single as the SCSI controller.
I have tried different VM disk cache and async IO options with little difference; at best the read MiB/s goes up to about 8.
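For reference, these settings can also be changed from the CLI with qm set; roughly like this (the VMID and disk spec are only an example and need to match the actual VM):
Code:
# example only: no cache, native AIO and a dedicated IO thread on the scsi0 disk of VM 100
qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=none,aio=native,iothread=1,ssd=1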

Does anyone know why the random disk IO would be so much worse within the VM compared to the host itself?

For most things it won't be a problem, but for our bigger and busier databases it certainly will be, so I would like to get this improved if possible.

Thanks
 
Welcome to the Proxmox forum, Gareth Blades!

The sudden drop in performance in the Ubuntu VM seems indeed quite odd. Could you post the full Ubuntu VM config (with qm config <vmid>) and information about how the storage of the VM is set up (e.g. the output of cat /etc/pve/storage.cfg)? I'm assuming that the SSDs are set up with the hardware RAID controller; is the SAS3508 set up with any cache?
 
qm config
Code:
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;ide2;net0
cores: 2
cpu: x86-64-v2-AES
ide2: none,media=cdrom
memory: 2048
meta: creation-qemu=8.1.5,ctime=1729519937
name: offpmoxtest
net0: virtio=BC:24:11:38:74:58,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-100-disk-0,aio=threads,format=raw,iothread=1,size=50G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=83a0cc73-e6ff-468f-878a-ad77b5122218
sockets: 1
vmgenid: cc15543d-058f-46b3-a16a-727add82b068

/etc/pve/storage.cfg on the Proxmox server. I assume that is what you are after; the file doesn't exist on the Ubuntu VM.
Code:
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

Yes, the disks are set up with hardware RAID. There is a built-in cache on the RAID controller together with battery backup. This is the RAID configuration:
Screenshot 2024-10-23 143644.jpg
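If it helps, the controller's cache policy can also be queried from the CLI with Broadcom's storcli utility, assuming it is installed (the binary may be called storcli64, and the controller number below is only an example):
Code:
# show all properties, including cache policy, of the virtual drives on controller 0
storcli /c0/vall show all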

Thanks
 
I've tried the default io_uring async IO as well as the other async IO options, together with all the 'Cache' settings including the default 'No Cache'.
I have SSD emulation enabled on this host, but on the other Proxmox install running an older Proxmox 8.1 it wasn't enabled, and I have the same issue on that system as well.
 
Any more thoughts on this?
I've tried it on another server of the same make and model with the same RAID controller. It is running an older Proxmox 8.1.4 installation.

Sysbench on the host OS (Proxmox 8.1.4):
Throughput:
read, MiB/s: 111.16
written, MiB/s: 74.11

Sysbench in the Ubuntu 20.04 VM:
Throughput:
read, MiB/s: 23.02
written, MiB/s: 15.35

It's not quite as bad as on the other machine, but still over 4x slower.
 
Interested as well ;) I don't have any RAID, just a "default" setup with ZFS, and the performance differs as well. It's not a big deal currently because customers are fine, but I am just wondering why the difference is so high.



Proxmox Host:
Code:
# sysbench fileio --file-total-size=4G --file-test-mode=rndrw run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 32MiB each
4GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      12259.64
    writes/s:                     8173.09
    fsyncs/s:                     26154.20

Throughput:
    read, MiB/s:                  191.56
    written, MiB/s:               127.70

General statistics:
    total time:                          10.0027s
    total number of events:              465907

Latency (ms):
         min:                                    0.00
         avg:                                    0.02
         max:                                   24.06
         95th percentile:                        0.08
         sum:                                 9876.88

Threads fairness:
    events (avg/stddev):           465907.0000/0.00
    execution time (avg/stddev):   9.8769/0.00

Inside the VM:
Code:
# sysbench fileio --file-total-size=4G --file-test-mode=rndrw run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 32MiB each
4GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!   


File operations:
    reads/s:                      5054.37
    writes/s:                     3369.58
    fsyncs/s:                     10791.45

Throughput:
    read, MiB/s:                  78.97
    written, MiB/s:               52.65

General statistics:
    total time:                          10.0064s
    total number of events:              192164

Latency (ms):
         min:                                    0.00
         avg:                                    0.05
         max:                                   22.16
         95th percentile:                        0.23
         sum:                                 9939.47

Threads fairness:
    events (avg/stddev):           192164.0000/0.00
    execution time (avg/stddev):   9.9395/0.00

My Storage Config on PVE:
Code:
root@virtual:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content iso,vztmpl
        shared 0

zfspool: local-zfs
        pool rpool/data
        content rootdir,images
        sparse 1

cifs: storagebox-fsn1
        disable
        path /mnt/pve/storagebox-fsn1
        server u123456.your-storagebox.de
        share backup
        content backup
        prune-backups keep-all=1
        username u123456

pbs: PBS-Backup
        datastore backup
        server pbs.example.com
        content backup
        prune-backups keep-all=1
        username root@pam!pve-server

The virtual disk has default settings with discard/thin provisioning:

1748415121505.png
 
As said on many other threads before:

If you compare 4K reads/writes on the host with 4K reads/writes in a zvol-based VM, you get read and write amplification by a factor of 4 (the 16K volblocksize of the zvol in ZFS versus 4K on your host). So every number you see in your VM must be multiplied by 4 to reflect the actual IO that hits the storage in the background. The same is true for block sizes much greater than 4K, e.g. 4M: on the host, when you test the filesystem directly, the 128K recordsize is the maximum block size (so to speak), which can be compressed well and written in one go. On the zvol, it is chunked into 16K blocks.
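To see this in practice, you can check the volblocksize of an existing VM disk and change the default used for newly created disks on the storage. The dataset and storage names below are only examples and have to match the actual setup; an existing zvol keeps its volblocksize, so a disk has to be recreated or moved to pick up a new value, and whether a smaller value is a good idea also depends on the pool layout (especially with RAIDZ):
Code:
# show the volblocksize of an existing VM disk (zvol name is an example)
zfs get volblocksize rpool/data/vm-100-disk-0

# change the block size used for newly created disks on this ZFS storage
pvesm set local-zfs --blocksize 8k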