Unstable disk read performance on guest VM (NVMe + Ceph)

Talion

Member
Jun 19, 2018
Hello,

I have a 4-server Proxmox 5.2 cluster and a problem with a Linux guest VM: my disk read benchmark results are unstable and I can't really interpret them. What is the best way to test this properly, and where should I look to fix it?

Your help will be appreciated, thanks.

Guest VM (Ubuntu 14.04 LTS)

Code:
agent: 1
balloon: 49152
boot: c
bootdisk: scsi0
cores: 16
cpu: host,flags=+pcid
ide1: none,media=cdrom
memory: 81920
name: xxxxx
net0: virtio=xxxxxx,bridge=vmbr0,queues=8
net1: virtio=xxxxxx,bridge=vmbr1,queues=8
numa: 1
onboot: 1
ostype: l26
scsi0: vmstorages_vm:vm-101-disk-1,size=300G
scsihw: virtio-scsi-pci
smbios1: uuid=a6c0b706-00f0-4696-9446-c5d6b769aac5
sockets: 2

Code:
root@vm:~# dd if=/dev/zero of=/root/testfile bs=10G count=1 oflag=dsync
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 31.378 s, 68.4 MB/s
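(A side note on the output above: the `0+1 records` lines appear because dd truncates a single read at 2 GiB, exactly the 2147479552 bytes reported, so only ~2.1 GB was written despite `bs=10G`. A smaller block size with a higher count avoids this; sizes below are reduced for illustration and the path is just an example:)

```shell
# dd caps each read() at 2 GiB, so bs=10G silently degrades to one
# 2147479552-byte partial record. Use smaller blocks with a count instead;
# oflag=dsync still syncs after every block.
dd if=/dev/zero of=/tmp/dd-zero-test bs=64M count=4 oflag=dsync
```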

Code:
root@vm:~# hdparm -Tt /dev/sda1

/dev/sda1:
 Timing cached reads:   14932 MB in  1.99 seconds = 7492.03 MB/sec
 Timing buffered disk reads: 242 MB in  2.65 seconds =  91.33 MB/sec


Code:
iometer: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K, ioengine=libaio, iodepth=64
fio-2.1.3
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 3072MB)
Jobs: 1 (f=1): [m] [100.0% done] [24123KB/6162KB/0KB /s] [5706/1426/0 iops] [eta 00m:00s]
iometer: (groupid=0, jobs=1): err= 0: pid=1512: Sat Jun 30 01:10:51 2018
  Description  : [Emulation of Intel IOmeter File Server Access Pattern]
  read : io=2454.1MB, bw=82163KB/s, iops=13449, runt= 30596msec
    slat (usec): min=4, max=8443, avg=11.71, stdev=19.82
    clat (usec): min=158, max=220366, avg=2292.13, stdev=5849.43
     lat (usec): min=281, max=220398, avg=2304.09, stdev=5849.52
    clat percentiles (usec):
     |  1.00th=[  458],  5.00th=[  564], 10.00th=[  652], 20.00th=[  788],
     | 30.00th=[  924], 40.00th=[ 1064], 50.00th=[ 1224], 60.00th=[ 1384],
     | 70.00th=[ 1624], 80.00th=[ 2192], 90.00th=[ 4384], 95.00th=[ 7264],
     | 99.00th=[16192], 99.50th=[23424], 99.90th=[95744], 99.95th=[134144],
     | 99.99th=[201728]
    bw (KB  /s): min=22112, max=147151, per=100.00%, avg=83203.52, stdev=25891.59
  write: io=631875KB, bw=20652KB/s, iops=3370, runt= 30596msec
    slat (usec): min=4, max=7188, avg=14.05, stdev=24.65
    clat (msec): min=1, max=219, avg= 9.77, stdev=11.59
     lat (msec): min=1, max=219, avg= 9.78, stdev=11.59
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    4], 10.00th=[    5], 20.00th=[    5],
     | 30.00th=[    6], 40.00th=[    6], 50.00th=[    7], 60.00th=[    9],
     | 70.00th=[   12], 80.00th=[   14], 90.00th=[   17], 95.00th=[   20],
     | 99.00th=[   36], 99.50th=[  102], 99.90th=[  165], 99.95th=[  186],
     | 99.99th=[  210]
    bw (KB  /s): min= 6249, max=36060, per=100.00%, avg=20910.00, stdev=6495.43
    lat (usec) : 250=0.01%, 500=1.72%, 750=11.88%, 1000=14.71%
    lat (msec) : 2=33.83%, 4=10.45%, 10=18.15%, 20=7.87%, 50=1.09%
    lat (msec) : 100=0.12%, 250=0.18%
  cpu          : usr=6.84%, sys=26.98%, ctx=187467, majf=0, minf=6437
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=411489/w=103121/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=2454.1MB, aggrb=82162KB/s, minb=82162KB/s, maxb=82162KB/s, mint=30596msec, maxt=30596msec
  WRITE: io=631874KB, aggrb=20652KB/s, minb=20652KB/s, maxb=20652KB/s, mint=30596msec, maxt=30596msec

Disk stats (read/write):
    dm-0: ios=410902/102987, merge=0/0, ticks=916296/1002312, in_queue=1920508, util=99.82%, aggrios=411490/103141, aggrmerge=0/6, aggrticks=926312/1005664, aggrin_queue=1932100, aggrutil=99.71%
  sda: ios=411490/103141, merge=0/6, ticks=926312/1005664, in_queue=1932100, util=99.71%



Proxmox Server 5.2;
Code:
root@pmxn4:~# hdparm -Tt /dev/nvme0n1                                  

/dev/nvme0n1:
 Timing cached reads:   15638 MB in  1.99 seconds = 7845.94 MB/sec
 Timing buffered disk reads: 5372 MB in  3.00 seconds = 1790.02 MB/sec

Code:
root@pmxn4:~# dd if=/dev/zero of=/root/testfile bs=10G count=1 oflag=dsync
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 2.07058 s, 1.0 GB/s

Code:
root@pmxn4:~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 111.8G  0 disk
├─sda1        8:1    0  1007K  0 part
├─sda2        8:2    0 111.8G  0 part
└─sda9        8:9    0     8M  0 part
sdb           8:16   0 111.8G  0 disk
├─sdb1        8:17   0  1007K  0 part
├─sdb2        8:18   0 111.8G  0 part
└─sdb9        8:25   0     8M  0 part
zd0         230:0    0     8G  0 disk [SWAP]
nvme0n1     259:0    0 232.9G  0 disk
├─nvme0n1p1 259:4    0   100M  0 part /var/lib/ceph/osd/ceph-14
└─nvme0n1p2 259:5    0 232.8G  0 part
nvme2n1     259:1    0 232.9G  0 disk
├─nvme2n1p1 259:6    0   100M  0 part /var/lib/ceph/osd/ceph-12
└─nvme2n1p2 259:7    0 232.8G  0 part
nvme1n1     259:2    0 232.9G  0 disk
├─nvme1n1p1 259:8    0   100M  0 part /var/lib/ceph/osd/ceph-13
└─nvme1n1p2 259:9    0 232.8G  0 part
nvme3n1     259:3    0 232.9G  0 disk
├─nvme3n1p1 259:10   0   100M  0 part /var/lib/ceph/osd/ceph-15
└─nvme3n1p2 259:11   0 232.8G  0 part

Code:
root@pmxn4:~# pveperf /var/lib/ceph/osd/ceph-14
CPU BOGOMIPS:      134140.48
REGEX/SECOND:      1902499
HD SIZE:           0.09 GB (/dev/nvme0n1p1)
BUFFERED READS:    33.33 MB/sec
AVERAGE SEEK TIME: 0.00 ms
FSYNCS/SECOND:     524.12
DNS EXT:           16.94 ms
DNS INT:           14.91 ms (localx.club)

Hi,

dd and hdparm are not meaningful benchmark tools.
Use fio instead.
But as the pveperf output indicates, you are using consumer NVMe.
Consumer NVMe has the same problem as consumer SSDs: its sync write capability is poor.
Only use enterprise disks with Ceph.
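(For the sync behaviour specifically, which is roughly what FSYNCS/SECOND in pveperf reflects, a small fio job along these lines can be run on both host and guest; the filename, size, and runtime here are only placeholders:)

```ini
[fsync-test]
filename=/root/fio-fsync-test
size=256m
bs=4k
rw=write
ioengine=sync
fsync=1
runtime=30
time_based
```

Consumer drives without power-loss protection typically drop to a few hundred fsyncs per second under this load, which is consistent with the 524 FSYNCS/SECOND shown by pveperf above; enterprise drives usually sustain thousands.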
 
Hello Wolfgang,

Thank you very much for the reply. I ran fio tests on both the host and the VM; the results are below. But even with consumer NVMe, why is there such a big difference between host and VM in the dd and hdparm tests? I'm wondering whether I'm missing a setting on the host or VM side.

fio settings;

Code:
[iometer]
bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10
rw=randrw
rwmixread=80
direct=1
size=4g
ioengine=libaio
# IOMeter defines the server loads as the following:
# iodepth=1     Linear
# iodepth=4     Very Light
# iodepth=8     Light
# iodepth=64    Moderate
# iodepth=256   Heavy
iodepth=64

Proxmox Host;

Code:
iometer: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K, ioengine=libaio, iodepth=64
fio-2.16
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [43591KB/11111KB/0KB /s] [10.3K/2588/0 iops] [eta 00m:00s]
iometer: (groupid=0, jobs=1): err= 0: pid=1311178: Mon Jul  2 12:21:20 2018
  Description  : [Emulation of Intel IOmeter File Server Access Pattern]
  read : io=3279.8MB, bw=123468KB/s, iops=20035, runt= 27201msec
    slat (usec): min=2, max=4468, avg=26.53, stdev=11.28
    clat (usec): min=43, max=62935, avg=1319.17, stdev=2279.77
     lat (usec): min=121, max=62958, avg=1345.89, stdev=2279.77
    clat percentiles (usec):
     |  1.00th=[  169],  5.00th=[  247], 10.00th=[  318], 20.00th=[  442],
     | 30.00th=[  572], 40.00th=[  692], 50.00th=[  812], 60.00th=[  932],
     | 70.00th=[ 1096], 80.00th=[ 1336], 90.00th=[ 2512], 95.00th=[ 4192],
     | 99.00th=[10688], 99.50th=[14528], 99.90th=[30080], 99.95th=[37632],
     | 99.99th=[54528]
  write: io=835856KB, bw=30729KB/s, iops=5031, runt= 27201msec
    slat (usec): min=3, max=5915, avg=36.43, stdev=25.87
    clat (msec): min=1, max=63, avg= 7.31, stdev= 4.43
     lat (msec): min=1, max=63, avg= 7.35, stdev= 4.43
    clat percentiles (usec):
     |  1.00th=[ 2864],  5.00th=[ 3536], 10.00th=[ 3856], 20.00th=[ 4320],
     | 30.00th=[ 4640], 40.00th=[ 4960], 50.00th=[ 5344], 60.00th=[ 5920],
     | 70.00th=[ 7904], 80.00th=[10816], 90.00th=[14016], 95.00th=[16064],
     | 99.00th=[20608], 99.50th=[23168], 99.90th=[37632], 99.95th=[43776],
     | 99.99th=[56576]
    lat (usec) : 50=0.01%, 100=0.01%, 250=4.16%, 500=15.47%, 750=16.27%
    lat (usec) : 1000=15.75%
    lat (msec) : 2=18.84%, 4=7.59%, 10=16.41%, 20=5.09%, 50=0.41%
    lat (msec) : 100=0.02%
  cpu          : usr=7.40%, sys=22.52%, ctx=711257, majf=0, minf=1357
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=544974/w=136862/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=3279.8MB, aggrb=123467KB/s, minb=123467KB/s, maxb=123467KB/s, mint=27201msec, maxt=27201msec
  WRITE: io=835855KB, aggrb=30728KB/s, minb=30728KB/s, maxb=30728KB/s, mint=27201msec, maxt=27201msec

Disk stats (read/write):
  rbd0: ios=544845/136835, merge=0/5, ticks=593576/968272, in_queue=1564812, util=99.69%


Ubuntu 14.04 LTS VM;

Code:
iometer: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K, ioengine=libaio, iodepth=64
fio-2.1.3
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 3072MB)
Jobs: 1 (f=1): [m] [100.0% done] [63344KB/15779KB/0KB /s] [14.8K/3738/0 iops] [eta 00m:00s]
iometer: (groupid=0, jobs=1): err= 0: pid=14067: Mon Jul  2 12:21:56 2018
  Description  : [Emulation of Intel IOmeter File Server Access Pattern]
  read : io=2454.1MB, bw=128724KB/s, iops=21070, runt= 19529msec
    slat (usec): min=4, max=2022, avg=11.69, stdev= 9.35
    clat (usec): min=212, max=55130, avg=1488.41, stdev=2153.69
     lat (usec): min=243, max=55137, avg=1500.35, stdev=2153.77
    clat percentiles (usec):
     |  1.00th=[  378],  5.00th=[  466], 10.00th=[  532], 20.00th=[  636],
     | 30.00th=[  748], 40.00th=[  860], 50.00th=[  980], 60.00th=[ 1128],
     | 70.00th=[ 1272], 80.00th=[ 1528], 90.00th=[ 2672], 95.00th=[ 4320],
     | 99.00th=[10560], 99.50th=[14144], 99.90th=[28032], 99.95th=[35072],
     | 99.99th=[48384]
    bw (KB  /s): min=23299, max=196107, per=100.00%, avg=128846.90, stdev=36063.68
  write: io=631875KB, bw=32356KB/s, iops=5280, runt= 19529msec
    slat (usec): min=4, max=8325, avg=14.09, stdev=32.34
    clat (msec): min=1, max=54, avg= 6.11, stdev= 4.12
     lat (msec): min=1, max=54, avg= 6.12, stdev= 4.13
    clat percentiles (usec):
     |  1.00th=[ 2064],  5.00th=[ 2512], 10.00th=[ 2800], 20.00th=[ 3184],
     | 30.00th=[ 3472], 40.00th=[ 3792], 50.00th=[ 4320], 60.00th=[ 5152],
     | 70.00th=[ 6624], 80.00th=[ 9536], 90.00th=[12352], 95.00th=[14144],
     | 99.00th=[18560], 99.50th=[21120], 99.90th=[32384], 99.95th=[40192],
     | 99.99th=[50432]
    bw (KB  /s): min= 6349, max=49097, per=100.00%, avg=32384.67, stdev=9077.99
    lat (usec) : 250=0.01%, 500=6.05%, 750=18.29%, 1000=16.69%
    lat (msec) : 2=28.63%, 4=14.41%, 10=11.39%, 20=4.24%, 50=0.29%
    lat (msec) : 100=0.01%
  cpu          : usr=10.34%, sys=43.77%, ctx=148873, majf=0, minf=4997
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=411489/w=103121/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=2454.1MB, aggrb=128724KB/s, minb=128724KB/s, maxb=128724KB/s, mint=19529msec, maxt=19529msec
  WRITE: io=631874KB, aggrb=32355KB/s, minb=32355KB/s, maxb=32355KB/s, mint=19529msec, maxt=19529msec

Disk stats (read/write):
    dm-0: ios=411778/103288, merge=0/0, ticks=596288/627976, in_queue=1224876, util=99.59%, aggrios=411778/103258, aggrmerge=0/38, aggrticks=596600/627964, aggrin_queue=1225464, aggrutil=99.29%
  sda: ios=411778/103258, merge=0/38, ticks=596600/627964, in_queue=1225464, util=99.29%

 
Because dd and hdparm mostly test the cache, not the storage itself.
Also, dd from /dev/zero writes only zeros; many modern storage stacks compress those and end up having almost nothing to write.
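(To take compression out of the picture, incompressible data can be used as the source; a minimal sketch, with path and size purely illustrative:)

```shell
# /dev/urandom yields incompressible data, so the storage layer has to
# write every byte; oflag=dsync syncs after each block as before.
dd if=/dev/urandom of=/tmp/dd-rand-test bs=1M count=64 oflag=dsync
```

Note that at NVMe speeds /dev/urandom itself can become the bottleneck, which is one more reason to prefer fio: it pre-fills its I/O buffers with random data (see its `refill_buffers` option).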
 
In the fio tests there is also a difference between host and VM. How should I interpret that? I tried all the cache modes on the VM and it made no difference, and changing my ceph.conf made no difference either. Or should I just accept this ~30% performance loss as normal?

Thanks!
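(For reference, the cache mode being changed here is the per-disk option in the VM config, appended to the scsi0 line; e.g. for the disk above:)

```
scsi0: vmstorages_vm:vm-101-disk-1,size=300G,cache=writeback
```

Valid values are none (the default), writethrough, writeback, directsync, and unsafe; the VM needs a full power-cycle, not just a reboot, for the change to take effect.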

ceph.conf;
Code:
[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 172.29.244.0/24
         fsid = 0775e8a1-a920-47b3-aa55-1ec0c9a8948c
         keyring = /etc/pve/priv/$cluster.$name.keyring
         mon allow pool delete = true
         osd journal size = 5120
         osd pool default min size = 2
         osd pool default size = 3
         public network = 172.29.244.0/24

         debug asok = 0/0
         debug auth = 0/0
         debug buffer = 0/0
         debug client = 0/0
         debug context = 0/0
         debug crush = 0/0
         debug filer = 0/0
         debug filestore = 0/0
         debug finisher = 0/0
         debug heartbeatmap = 0/0
         debug journal = 0/0
         debug journaler = 0/0
         debug lockdep = 0/0
         debug mds = 0/0
         debug mds balancer = 0/0
         debug mds locker = 0/0
         debug mds log = 0/0
         debug mds log expire = 0/0
         debug mds migrator = 0/0
         debug mon = 0/0
         debug monc = 0/0
         debug ms = 0/0
         debug objclass = 0/0
         debug objectcacher = 0/0
         debug objectcatcher = 0/0
         debug objecter = 0/0
         debug optracker = 0/0
         debug osd = 0/0
         debug paxos = 0/0
         debug perfcounter = 0/0
         debug rados = 0/0
         debug rbd = 0/0
         debug rgw = 0/0
         debug throttle = 0/0
         debug timer = 0/0
         debug tp = 0/0

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring
         osd mkfs type = xfs
         osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
         osd mkfs options xfs = "-f -i size=2048"
         osd scrub load threshold = 2.5
         osd max backfills = 1
         osd recovery max active = 1
         osd op threads = 4
         osd disk threads = 1
         osd enable op tracker = false
         osd_op_num_threads_per_shard = 1 # default 2
         osd_op_num_shards = 10 # default 5
         osd_disk_thread_ioprio_class  = idle
         osd_disk_thread_ioprio_priority = 7

[mon.pmxn3]
         host = pmxn3
         mon addr = 172.29.244.13:6789

[mon.pmxn1]
         host = pmxn1
         mon addr = 172.29.244.11:6789

[mon.pmxn2]
         host = pmxn2
         mon addr = 172.29.244.12:6789

[mon.pmxn4]
         host = pmxn4
         mon addr = 172.29.244.14:6789


storage.cfg;

Code:
dir: local
        path /var/lib/vz
        content iso,backup,vztmpl

zfspool: local-zfs
        pool rpool/data
        content rootdir,images
        sparse 1

rbd: vmstorages_vm
        content images
        krbd 0
        pool vmstorages

rbd: vmstorages_ct
        content rootdir
        krbd 1
        pool vmstorages

rbd: lxcstorages_vm
        content images
        krbd 0
        pool lxcstorages

rbd: lxcstorages_ct
        content rootdir
        krbd 1
        pool lxcstorages

rbd: rbd_vm
        content images
        krbd 0
        pool rbd

rbd: rbd_ct
        content rootdir
        krbd 1
        pool rbd


