Improve VM guest disk performance (Ceph, 10 GbE, QEMU, VirtIO)

Discussion in 'Proxmox VE: Installation and configuration' started by Mario Minati, Jun 23, 2019.

  1. Mario Minati

    Mario Minati New Member

    Hello @all,
    we are running a Proxmox cluster with five nodes. Three of them run Ceph, providing two pools, one backed by HDDs, the other by SSDs. The other two nodes are used for virtualization with QEMU.
    We have redundant 10 GbE storage networks and redundant 10 GbE Ceph networks.
    The nodes are equipped with dual CPUs and between 96 and 128 GB of RAM. The three Ceph nodes are completely identical.
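
    To rule out the storage network itself as a bottleneck, a quick throughput test between a virtualization node and a Ceph node can be run with iperf3 (a minimal sketch; <ceph-node> is a placeholder for one of the real node hostnames):
    Code:
    # iperf3 -s                          (run on one Ceph node)
    # iperf3 -c <ceph-node> -P 4 -t 30   (run on one virtualization node)
    With a few parallel streams this should come close to line rate on a healthy 10 GbE link.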

    We have read a lot of the Proxmox docs and this forum and spent hours googling, but we haven't found a solution for our performance troubles yet.

    We are using the latest Proxmox:
    Code:
    # pveversion -v
    proxmox-ve: 5.4-1 (running kernel: 4.15.18-14-pve)
    pve-manager: 5.4-6 (running version: 5.4-6/aa7856c5)
    pve-kernel-4.15: 5.4-2
    pve-kernel-4.15.18-14-pve: 4.15.18-39
    pve-kernel-4.15.18-11-pve: 4.15.18-34
    pve-kernel-4.15.17-1-pve: 4.15.17-9
    ceph: 12.2.12-pve1
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-10
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-52
    libpve-guest-common-perl: 2.0-20
    libpve-http-server-perl: 2.0-13
    libpve-storage-perl: 5.0-43
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-3
    lxcfs: 3.0.3-pve1
    novnc-pve: 1.0.0-3
    proxmox-widget-toolkit: 1.0-28
    pve-cluster: 5.0-37
    pve-container: 2.0-39
    pve-docs: 5.4-2
    pve-edk2-firmware: 1.20190312-1
    pve-firewall: 3.0-21
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-9
    pve-i18n: 1.1-4
    pve-libspice-server1: 0.14.1-2
    pve-qemu-kvm: 3.0.1-2
    pve-xtermjs: 3.12.0-1
    qemu-server: 5.0-51
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.13-pve1~bpo2
    We ran rados benchmarks from our virtualization host against our Ceph HDD pool and got the following results.
    Write:
    Code:
    # rados -p pub.hdd.bench bench -b 4M 60 write -t 16 --no-cleanup
    [...]
    Total time run: 60.571563
    Total writes made: 1715
    Write size: 4194304
    Object size: 4194304
    Bandwidth (MB/sec): 113.254
    Stddev Bandwidth: 40.2683
    Max bandwidth (MB/sec): 176
    Min bandwidth (MB/sec): 0
    Average IOPS: 28
    Stddev IOPS: 10
    Max IOPS: 44
    Min IOPS: 0
    Average Latency(s): 0.564394
    Stddev Latency(s): 0.343622
    Max latency(s): 2.84305
    Min latency(s): 0.0969665
    Read:
    Code:
    # rados -p pub.hdd.bench bench 60 seq -t 16 --no-cleanup
    [...]
    Total time run: 17.727840
    Total reads made: 1715
    Read size: 4194304
    Object size: 4194304
    Bandwidth (MB/sec): 386.962
    Average IOPS: 96
    Stddev IOPS: 21
    Max IOPS: 135
    Min IOPS: 48
    Average Latency(s): 0.163484
    Max latency(s): 1.54406
    Min latency(s): 0.0274371
    The maximum latency is a little high, but that shall not be the focus of this conversation.
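
    (Should the latency spikes become interesting later: per-OSD commit/apply latency can be checked on any Ceph node with the following command, which may help to spot a single slow HDD.)
    Code:
    # ceph osd perf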

    The OSD tree is in sync on all nodes:
    Code:
    # ceph osd tree
    ID CLASS WEIGHT   TYPE NAME                 STATUS REWEIGHT PRI-AFF
    -1       35.36691 root default                                   
    -3       11.78897     host pub-ceph-node-01                       
     0   hdd  5.45789         osd.0                 up  1.00000 1.00000
     1   hdd  5.45789         osd.1                 up  1.00000 1.00000
     8   ssd  0.87320         osd.8                 up  1.00000 1.00000
    -5       11.78897     host pub-ceph-node-02                       
     2   hdd  5.45789         osd.2                 up  1.00000 1.00000
     3   hdd  5.45789         osd.3                 up  1.00000 1.00000
     7   ssd  0.87320         osd.7                 up  1.00000 1.00000
    -7       11.78897     host pub-ceph-node-03                       
     4   hdd  5.45789         osd.4                 up  1.00000 1.00000
     5   hdd  5.45789         osd.5                 up  1.00000 1.00000
     6   ssd  0.87320         osd.6                 up  1.00000 1.00000 
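    As a sanity check (not part of the original post), one can verify that each pool is really pinned to the intended device class by looking at its CRUSH rule; pub.hdd.vm is the VM pool used further below:
    Code:
    # ceph osd pool get pub.hdd.vm crush_rule
    # ceph osd crush rule dump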
    On our first virtualization server we have eight Linux guests and two Windows guests. The QEMU guest agent is activated on all guests. All guest disks are created as VirtIO drives and are stored on our HDD pool.

    A Linux guest configuration looks like this:
    Code:
    # qm config 402
    agent: 1
    balloon: 0
    boot: cdn
    bootdisk: virtio0
    cores: 2
    ide2: none,media=cdrom
    memory: 16384
    name: hbm-srv-02
    net0: virtio=52:54:00:6a:24:0a,bridge=vmbr0
    net1: virtio=A2:64:0E:18:02:27,bridge=vmbr1
    numa: 0
    ostype: l26
    scsihw: virtio-scsi-pci
    smbios1: uuid=c1587fd0-0b8a-4a84-9d4a-b9b1b919d3c5
    sockets: 2
    virtio0: pub.hdd.vm:vm-402-disk-0,cache=writeback,iothread=1,size=30G
    virtio1: pub.hdd.vm:vm-402-disk-1,cache=writeback,iothread=1,size=500G
    vmgenid: bbdb6d92-959f-41fc-951e-442c4cdf3626
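    (For reference, the disk options shown above can also be set or changed from the CLI with qm set; a sketch, reusing the existing volume name:)
    Code:
    # qm set 402 --virtio1 pub.hdd.vm:vm-402-disk-1,cache=writeback,iothread=1,size=500G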
    Running a fio benchmark inside the guest with the configuration above, while there was almost no traffic from the other guests, gives the following results:
    Code:
    # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/var/fio.tmp --bs=4k --iodepth=64 --size=8G --readwrite=randrw --rwmixread=75
    test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
    fio-2.16
    Starting 1 process
    Jobs: 1 (f=1): [m(1)] [99.7% done] [56044KB/19020KB/0KB /s] [14.2K/4755/0 iops] [eta 00m:03s]
    test: (groupid=0, jobs=1): err= 0: pid=17921: Sun Jun 23 21:21:35 2019
      read : io=6142.3MB, bw=6373.6KB/s, iops=1593, runt=986843msec
      write: io=2049.8MB, bw=2126.1KB/s, iops=531, runt=986843msec
      cpu          : usr=1.45%, sys=4.28%, ctx=1218785, majf=0, minf=9
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
         issued    : total=r=1572409/w=524743/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=64
    
    Run status group 0 (all jobs):
       READ: io=6142.3MB, aggrb=6373KB/s, minb=6373KB/s, maxb=6373KB/s, mint=986843msec, maxt=986843msec
      WRITE: io=2049.8MB, aggrb=2126KB/s, minb=2126KB/s, maxb=2126KB/s, mint=986843msec, maxt=986843msec
    
    Disk stats (read/write):
      vdb: ios=1572293/525175, merge=0/16, ticks=62668876/392852, in_queue=65241904, util=100.00%
    This looks like we are losing quite a bit of disk performance. But why?
    We tried switching the guests to SCSI disk access, but that did not improve anything over VirtIO.
    We have activated the extra I/O thread for each disk and set the caching strategy to writeback for best performance.
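
    One way to see how much of the gap is caused by the virtualization layer would be to run the same 4k random workload from the host directly against a throwaway RBD image, assuming the host's fio build includes the rbd engine (the image name fio-test and the client name admin are just placeholders):
    Code:
    # rbd -p pub.hdd.vm create fio-test --size 10240
    # fio --ioengine=rbd --clientname=admin --pool=pub.hdd.vm --rbdname=fio-test --bs=4k --iodepth=64 --rw=randrw --rwmixread=75 --size=8G --name=rbd-test
    # rbd -p pub.hdd.vm rm fio-test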

    What else can we do to improve the disk performance?

    How much of the bandwidth measured on the host should one expect within a guest?

    Why is the %util value 100% while running the fio test? Is this a hint at the source of the problem?


    Any help or ideas are welcome.

    Best regards,

    Mario Minati
     
    #1 Mario Minati, Jun 23, 2019
    Last edited: Jun 23, 2019
  2. sb-jw

    sb-jw Active Member

    What about your pools, PGs, replication rules, etc.?

    Normally I would recommend using only SSDs instead of HDDs. Your results don't look too bad to me; they are roughly what I would expect from the hardware behind them. A bigger network doesn't help you when the disks are not able to deliver that performance.
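
    (The pool, PG and replication settings being asked about can be dumped with something like the following; the pool name is taken from the first post:)
    Code:
    # ceph osd pool ls detail
    # ceph df
    # ceph osd pool get pub.hdd.vm pg_num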
     
  3. dcsapak

    dcsapak Proxmox Staff Member

    AFAICS you are comparing apples and oranges.

    The rados bench tests above use a 4M block size (for both the write and the read run), while your fio test uses a 4k block size. With a smaller block size you get less bandwidth, but more IOPS.
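
    (For a closer apples-to-apples comparison the rados bench could be repeated with a 4k block size against the same pool, e.g.:)
    Code:
    # rados -p pub.hdd.bench bench 60 write -b 4096 -t 16 --no-cleanup
    # rados -p pub.hdd.bench bench 60 rand -t 16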
     
  4. Alwin

    Alwin Proxmox Staff Member
