PVE 6.0 slow SSD RAID1 performance in Windows VM

Thank you for your response. I am talking about the low IO performance under ZFS RAID1 or RAID10.
That is not the problem here.
Here we are talking about a disk passthrough.
 
Correct me if I'm wrong, but these benchmarks are done on the host (PVE), not in the VM.
 
Exactly, but just as the thread mentions, the issue persists inside the VM too.

After trying 5 trillion things, I assume there is some issue in the IO path like you mentioned, but I just can't figure it out.

8x Samsung PM883 behind a Dell PERC H730P Mini in HBA mode, set up as ZFS RAID10 in Proxmox, are choking on IO performance, while
RAID10 on the hardware controller without Proxmox (Windows Server 2016) delivers maximum performance :(
 
One thing to test is "physical_block_size=4096,logical_block_size=512" on the disk; it is not available in the Proxmox code currently.

It seems to help with Windows + Ceph RBD, so maybe it can also help on a local SSD:
https://review.opendev.org/#/c/658283/

"Ceph performs much better when I/O is 4k-aligned otherwise it has to read the 4k from disk, modify and write it again. Linux guests generally submit 4k-aligned I/O, however Windows guests generally submit 512b-aligned I/O. When hinted with physical_block_size=4096 Windows guests will switch to submitting most I/O as 4k-aligned based on both testing and information from Microsoft KB 2510009."

You can edit /usr/share/perl5/PVE/QemuServer.pm, and after
Code:
        if ($drive->{ssd} && ($devicetype eq 'block' || $devicetype eq 'hd')) {
            $device .= ",rotation_rate=1";
        }
add
Code:
        $device .= ",physical_block_size=4096,logical_block_size=512";

Then restart the pvedaemon service and restart the VM.

You can also try 4096,4096, but that requires reformatting the drive in the VM.
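If you want to check that the change is actually picked up, a minimal sketch (VM ID 100 is only an example; adjust it to your setup) is to dump the KVM command line PVE generates and look for the block-size properties on the disk's -device argument; inside the Windows guest, "fsutil fsinfo ntfsinfo C:" should then report a physical sector size of 4096.
Code:
# reload the modified QemuServer.pm
systemctl restart pvedaemon

# VM ID 100 is just a placeholder; grep the generated -device string for the block sizes
qm showcmd 100 | grep -oE '(physical|logical)_block_size=[0-9]+'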
 
I would say you are testing things that are not comparable.
1.) A PM883 can theoretically read/write at most 600 MB/s; SATA3 is not capable of more.
Minus the overhead, you get about 550 MB/s per disk.
Now you tell us that in Windows Server 2016 you get about 5 GB/s read in a RAID10.
But even if the RAID read from all disks simultaneously, you would get at most 4.4 GB/s of bandwidth (8 x 550 MB/s).
That is roughly 15% above the theoretical speed, so there must be a cache involved.

2.) You compare dd with fio.
The problem is that you do not use the same settings:
with fio you disable the cache and do sync IO, with dd you do not (see the sketch after this post for a comparable pair of commands).

3.) I don't know whether the /dev/sda benchmarks were run on the PM883.
If so, those are good values for sync direct 4k writes on a single disk.

So read this thread from the beginning, make comparable benchmarks, and do not just test things at random.
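To illustrate what "comparable" means here, a rough sketch of a dd call that at least matches fio's direct, synchronous 4k writes (the device path is only a placeholder, and writing to it destroys its data):
Code:
# fio: 4k random writes, direct and synchronous, queue depth 1
fio --name=benchmark --filename=/dev/sdX --direct=1 --sync=1 --rw=randwrite --blocksize=4k --numjobs=1 --iodepth=1 --runtime=30 --time_based

# dd: 4k writes, also direct and synchronous; note dd is sequential and single-threaded,
# so even with matching flags it is not a random-write benchmark
dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync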
 

Hi wolfgang,

thank you for your response.

I agree with your points and I am sorry if I posted something too quickly and hastily. The bottom line is, that the 4K IO performance is very low e.g. (30mb/s instead of 90mb/s). Well I am still trying to test 5 trillion other things including what spirit posted. Seems like the degraded IO performance is only under Windows 10 or Windows Server 2016.
 
Hi,
I have done some tests on local LVM, on one Intel S3610 SATA drive:
4k randwrite with fio on the host and in a Windows 2019 guest with virtio-scsi and default options.
(I have also tested with CrystalDiskMark; I get almost the same results as with fio.)


host result
---------------


fio --time_based --name=benchmark --size=5G --runtime=30 --filename=/dev/pve/vm-1705-disk-0 --direct=1 --ioengine=libaio --numjobs=X --iodepth=X --rw=randwrite --blocksize=4k --group_reporting


iodepth=8,numjob=8
bw=208477KB/s, iops=52119,

iodepth=32,numjob=1
bw=237537KB/s, iops=59384


iodepth=1,numjob=1
bw=69831KB/s, iops=17457


windows 2019 result
------------------------------
fio.exe --time_based --name=benchmark --size=5G --runtime=30 --filename=\\.\PhysicalDrive2 --ioengine=windowsaio --numjobs=X --iodepth=X --rw=randwrite --blocksize=4k --group_reporting


thread8,iodepth=8
write: IOPS=59.9k, BW=234MiB/s

thread1, iodepth=32
IOPS=59.5k, BW=232MiB/s

thread1,iodepth=1
write : IOPS=9224, BW=36.0MiB/s


So it's almost the same with a big iodepth, and twice as slow with iodepth=1 (expected with virtualization overhead).
I have attached the CrystalDiskMark results too.

(I'm using a 3.2 GHz Intel CPU; I think that also helps with low iodepth.)
 

Attachment: localssd.png (CrystalDiskMark results)

Windows 2019 result with your fio settings:

thread8,iodepth=8
write: IOPS=44.6k, BW=174MiB/s

thread1, iodepth=32
IOPS=9.8k, BW=38.4MiB/s <= ?????

thread1,iodepth=1
write: IOPS=7249, BW=28.3MiB/s
 
Hi, any update on this? Did you find anything to improve the performance here?

For your case,

Host:

Code:
iodepth=1,numjob=1
bw=69831KB/s, iops=17457

VM:
Code:
thread1,iodepth=1
write : IOPS=9224, BW=36.0MiB/s

In my case,
Host:
Code:
iodepth=1,numjob=1
write: IOPS=20.4k, BW=79.6MiB/s (83.4MB/s)

VM (Windows 10 Pro):
Code:
thread1,iodepth=1
write: IOPS=4917, BW=19.2MiB/s (20.1MB/s)

VM (Ubuntu Server 16.04 LTS):
Code:
thread1,iodepth=1
write: bw=17579KB/s, iops=4394
 
Hi,

for iodepth=1, the latency is what matters (network latency, but also CPU time on client/server), and that latency is paid for every single IO: at queue depth 1 the achievable IOPS is roughly 1/latency, so for example ~100 µs per request already caps you near 10k IOPS.

The only way to improve it is to enable writeback, as it groups small IOs into bigger ones.

The only problem with writeback used to be that it slowed down reads.

But since Ceph Octopus that is no longer a problem; reads are not slowed down with writeback anymore.


So, try to update to Ceph Octopus and enable writeback.

Repo is available here: http://download.proxmox.com/debian/ceph-octopus/dists/buster/test/

(I'm already using it in production)
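If it helps, the apt source line matching that URL would be something like the following (a sketch; add it on the PVE host, and keep in mind it is the test repository):
Code:
echo "deb http://download.proxmox.com/debian/ceph-octopus buster test" > /etc/apt/sources.list.d/ceph-octopus.list
apt update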


With a simple

fio --time_based --name=benchmark --size=5G --runtime=30 --filename=/dev/sdd --direct=1 --ioengine=libaio --numjobs=1 --iodepth=1 --rw=randwrite --blocksize=4k

I'm jumping from 2000 iops to 12000 iops.
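For reference, a minimal sketch of switching a VM disk to writeback from the host (VM ID 100, scsi0 and the volume name are placeholders; re-specify any other options that disk line already had, such as ssd=1 or discard=on):
Code:
# enable writeback cache on the disk of VM 100 (names are examples only)
qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=writeback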
 
Here are some IOPS results with 1 VM, 1 disk, 4k blocks, iodepth=64, librbd, no iothread.

Code:
                        nautilus-cache=none     nautilus-cache=writeback          octopus-cache=none     octopus-cache=writeback
          
randread 4k                  62.1k                     25.2k                            61.1k                     60.8k
randwrite 4k                 27.7k                     19.5k                            34.5k                     53.0k
seqwrite 4k                  7850                      37.5k                            24.9k                     82.6k
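For reproducibility, the randwrite row should roughly correspond to a command along these lines, run against the RBD-backed disk (the device path is assumed):
Code:
fio --time_based --name=benchmark --size=5G --runtime=30 --filename=/dev/sdX --direct=1 --ioengine=libaio --numjobs=1 --iodepth=64 --rw=randwrite --blocksize=4k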

windows octopus benchmark:
 

Attachments: ceph-cache-none.JPG, ceph-cache-writeback.JPG (Windows benchmark screenshots with cache=none vs cache=writeback)
