Incredibly slow I/O on installation drive of Proxmox across multiple servers

Hey there,

I'm not sure if this is expected with Proxmox or if I'm just the only one who has noticed it (I've googled and can't find any threads about this), but for some reason, on any Proxmox installation drive I get massive drops in I/O, while VMs on the exact same drives get full speed.

An example is my SSD, which generally got around 500 MB/s under a normal Linux OS, versus Proxmox:

(Command in use: dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync)
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.4488 s, 93.8 MB/s

I then test on my NVMe server in RAID 1:
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.1319 s, 343 MB/s

The NVMe server gets results of around 1.5 GB/s inside of a virtual machine.

So I'm basically just wondering whether this is expected, or whether there's some sort of misconfiguration on my end?

Thanks!
 
As you're testing with of=<a file>, what filesystem are you using?

And what is the SSD drive model?
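
If you're not sure, something like this on the host should show both (the device name below is just a placeholder):

Code:
# filesystems and which block devices back them
lsblk -f
# drive models, sizes, and any LVM/RAID layers in between
lsblk -o NAME,MODEL,SIZE,TYPE,MOUNTPOINT
# SMART identity for one drive, if smartmontools is installed
smartctl -i /dev/sda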

I'm not entirely sure what the flags in the command do; it's just a command I found online that always seems to be used when testing I/O. The SSD is an Integral P 120 GB, capable of 500+ MB/s read and write, but this isn't a model issue, as it has happened on two completely different models across different hardware. It's being run on the main Proxmox installation, however; inside the VM is where I see full speeds.
 
dd is not a good benchmark, because it doesn't do any parallelism (only one sequential stream); you need to do a real benchmark with something like "fio" (have a look in the forum).
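
For reference, here is the same dd command annotated, just to show what the flags mean:

Code:
# writes 16k blocks of 64 KiB zeros = 1 GiB total, as one sequential stream,
# then flushes the data to disk once at the end (conv=fdatasync) before reporting the speed
dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync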

Also, consumer SSDs are pretty bad at writes with fsync (I have seen some consumer SSDs slower than HDDs); only enterprise SSDs can be good for fsync.
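
For a quick idea of the fsync rate on the host, Proxmox also ships pveperf, which reports an FSYNCS/SECOND figure for a given path, among other things (the path here is just an example):

Code:
# fsync benchmark for the filesystem holding the local storage
pveperf /var/lib/vz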

And what is your filesystem (ZFS, or classic LVM + xfs/ext4)? Because if it's ZFS, it'll write the data twice and fsync each write in its journal.
(You should benchmark a block device directly instead of a file.)
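
For example, something like this against a spare, empty disk (WARNING: destructive, it overwrites whatever is on the device; /dev/sdX is a placeholder):

Code:
# random 4k writes straight to the raw device, no filesystem in the path
# WARNING: destroys any data on /dev/sdX
fio --name=raw-randwrite --filename=/dev/sdX --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=4k --iodepth=64 --numjobs=4 --runtime=60 --time_based --group_reporting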
 
My second server has enterprise NVMe drives, two 480 GB ones in fact, and the filesystem on both servers is ext4. I understand where you're coming from, but the exact same test works fine inside a VM - why wouldn't it work on the host itself, and why am I seeing the problem across two servers?

I am running fio now and will post an update in a few minutes.

EDIT: The results aren't good.

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75


test: (groupid=0, jobs=1): err= 0: pid=1522: Wed Dec 5 06:12:27 2018
read : io=3067.9MB, bw=11115KB/s, iops=2778 , runt=282627msec
write: io=1028.2MB, bw=3725.3KB/s, iops=931 , runt=282627msec
cpu : usr=1.48%, sys=7.28%, ctx=212966, majf=0, minf=499
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=785361/w=263215/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
READ: io=3067.9MB, aggrb=11115KB/s, minb=11115KB/s, maxb=11115KB/s, mint=282627msec, maxt=282627msec
WRITE: io=1028.2MB, aggrb=3725KB/s, minb=3725KB/s, maxb=3725KB/s, mint=282627msec, maxt=282627msec

Disk stats (read/write):
dm-5: ios=785159/263752, merge=0/0, ticks=15989172/2003008, in_queue=18052008, util=100.00%, aggrios=786438/263514, aggrmerge=26/311, aggrticks=16034160/1997108, aggrin_queue=18031808, aggrutil=100.00%
sda: ios=786438/263514, merge=26/311, ticks=16034160/1997108, in_queue=18031808, util=100.00%
 
Do you have the same bench from the VM to compare?
Is the VM drive on LVM?

I'm seeing dm-5 in your logs; do you use some kind of software RAID on your host?

If you have an empty drive, it would be great to run the bench on /dev/sdx directly on your host, to see the difference versus a file (and avoid the filesystem overhead).
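
Something like this would also show what dm-5 actually maps to (output layout may differ a bit between versions):

Code:
# full block device tree: disks, partitions, LVM volumes, RAID members
lsblk -o NAME,KNAME,TYPE,SIZE,MOUNTPOINT
# list device-mapper devices with their major:minor numbers (dm-5 = minor 5)
dmsetup ls
ls -l /dev/mapper/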
 
From a single SSD:
Code:
Jobs: 1 (f=1): [m(1)] [97.8% done] [78518KB/26377KB/0KB /s] [19.7K/6594/0 iops] [eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=30053: Wed Dec  5 07:43:33 2018
  read : io=3070.4MB, bw=68841KB/s, iops=17210, runt= 45670msec
  write: io=1025.8MB, bw=22998KB/s, iops=5749, runt= 45670msec
  cpu          : usr=8.00%, sys=25.00%, ctx=208202, majf=0, minf=10
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=785996/w=262580/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64
So if you are using NVMe drives, then those drives must be deadly slow.
 
That fio result is from my standard SSD; my other server has the NVMe drives, for which I've already shown the result. There's a clear issue here one way or another: on the NVMe server I got 1.5 GB/s I/O inside an actual VPS, but only around 300 MB/s on the Proxmox OS itself.

It's not an issue I particularly care about, as I've run it like this for over a year and always thought it was just my server; but having moved to a new one with enterprise NVMe drives, I thought I may as well ask and see whether this is common.
 
