Very slow speed between disks in VMs

RobBM

New Member
Jan 26, 2024
Hello,

I've recently bought a cheap dedicated server at OVH to host a couple of VMs.

The server is an Adv-STOR-2:
AMD Ryzen 7 Pro 3700 - 8c/16t - 3.6 GHz/4.4 GHz
2x 480 GB SATA SSD, Enterprise Class, Soft RAID
64 GB DDR4 ECC 2933 MHz
4x 14 TB SAS HDD, Soft RAID (WUH721414AL5201)
Proxmox is installed on the SSDs, and I created a ZFS RAIDZ pool on the HDDs.

I created a couple of Ubuntu VMs (10 at most), each with 2 vCPUs and 4 GB RAM; they mostly sit idle (testing). I also have one VM with 12 GB RAM, 4 vCPUs and two attached ~5 TB disks that already hold some data. I wanted to move data from disk A to disk B (both reside on the ZFS pool), first with mv and then with rsync to see the actual transfer speeds. They start around 100 MB/s but quickly drop to around 30 MB/s, sometimes as low as 5 MB/s, and keep jumping between these values. I know this is a cheap server, but it seems a bit slow nonetheless.

Previously I was also seeing messages like "task xxx blocked for more than 120 seconds", but these no longer show up (for now) after following the advice in this thread: https://forum.proxmox.com/threads/virtio-task-xxx-blocked-for-more-than-120-seconds.25835/

Is this normal behaviour/speed?

VM settings:
SCSI controller: VirtIO SCSI single
2x 5 TB hard disks with IO thread enabled and Async IO set to threads, no cache, qcow2

Please let me know what else I can provide besides arc_summary - I'm new to Proxmox and not all that familiar with Linux.

pveperf /LOCAL
CPU BOGOMIPS: 115202.56
REGEX/SECOND: 4078602
HD SIZE: 25629.98 GB (LOCAL)
FSYNCS/SECOND: 91.71
DNS EXT: 26.43 ms
DNS INT: 0.35 ms (local)


dd if=/dev/zero of=/LOCAL/bigfile bs=1M count=8192 conv=fdatasync (Proxmox)
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.97819 s, 2.9 GB/s

VM:
dd if=/dev/zero of=/hdd5tb/bigfile bs=1M count=8192 conv=fdatasync
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 63.36 s, 136 MB/s

Thank you
 

Please post the output of zpool status and qm config <vmid>. Please put it in [CODE]...[/CODE]-Tags.

Pure suspicion: you are copying (= reading and writing data at the same time) from rotating rust with the performance of a single drive (because of RaidZ). 30 MByte/s seems low but it is not really bad...
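To watch what the individual disks are actually doing while such a copy runs (assuming the pool is the LOCAL from your dd path), something like this prints per-disk throughput every 5 seconds:
Code:
zpool iostat -v LOCAL 5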

dd if=/dev/zero of=/LOCAL/bigfile bs=1M count=8192 conv=fdatasync (Proxmox)
Using zeros as data for benchmarking gives zero useful results. Please search for "fio" in this forum.
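For example, something along these lines (just a sketch - adjust the file path and size to your setup) measures sequential writes with fio's default data pattern, which does not compress away to nothing like /dev/zero does:
Code:
# sequential 1M writes, synced at the end, using (mostly incompressible) random buffers
fio --name=seqwrite --rw=write --filename=/LOCAL/fio-test --size=4g \
    --bs=1M --ioengine=psync --iodepth=1 --numjobs=1 --end_fsync=1
rm /LOCAL/fio-test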
 
Hi, thank you for taking the time to answer. I really appreciate it. While I know these are spinning disks, 30 MB/s feels really bad to me - I assume the fluctuating speed (30-250 MB/s) is caused by the cache/RAM filling up?

Would you recommend switching to a different RAID layout for better performance? I can live with 30 TB of space if it means better performance.

qm config:
Code:
agent: 1,fstrim_cloned_disks=1
balloon: 0
boot: order=scsi0;net0
cores: 4
cpu: host
memory: 12000
meta: creation-qemu=8.1.2,ctime=1706049442
name: vm1
net0: virtio=BC:24:11:58:D8:2C,bridge=vmbr1,firewall=1,tag=104
numa: 0
onboot: 1
ostype: l26
scsi0: local:211/vm-211-disk-0.qcow2,iothread=1,size=30G
scsi1: LOCAL:vm-211-disk-0,aio=threads,backup=0,iothread=1,size=6000G
scsi2: LOCAL:vm-211-disk-1,aio=threads,iothread=1,size=5000G
scsihw: virtio-scsi-single
smbios1: uuid=fbff64dd-02b1-46b6-9b1b-1b977e95aea4
sockets: 1
startup: up=240
vcpus: 4
vmgenid: 41837197-c3f1-472f-9024-557a3876e46a
Code:
pool: LOCAL
 state: ONLINE
config:

        NAME                        STATE     READ WRITE CKSUM
        LOCAL                       ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            scsi-35000cca28f61435c  ONLINE       0     0     0
            scsi-35000cca28f6169a0  ONLINE       0     0     0
            scsi-35000cca28f6199d0  ONLINE       0     0     0
            scsi-35000cca28f60467c  ONLINE       0     0     0

errors: No known data errors
fio - I'm not familiar with this tool, so I took some sample commands I found on the forum; I hope that's OK:
Code:
fio --name PROXMOX --rw=randwrite --filename=/LOCAL/test1 --size=4g --blocksize=4k --iodepth=1 --numjobs=1 --ioengine=posixaio
PROXMOX: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
PROXMOX: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][98.7%][w=57.5MiB/s][w=14.7k IOPS][eta 00m:02s]
PROXMOX: (groupid=0, jobs=1): err= 0: pid=114884: Sat Jan 27 09:50:47 2024
  write: IOPS=6921, BW=27.0MiB/s (28.3MB/s)(4096MiB/151501msec); 0 zone resets
    slat (nsec): min=400, max=942794, avg=1000.33, stdev=1964.10
    clat (nsec): min=410, max=34470k, avg=143022.01, stdev=128374.04
     lat (usec): min=13, max=34471, avg=144.02, stdev=128.43
    clat percentiles (usec):
     |  1.00th=[   14],  5.00th=[   15], 10.00th=[   23], 20.00th=[   81],
     | 30.00th=[  133], 40.00th=[  141], 50.00th=[  153], 60.00th=[  169],
     | 70.00th=[  182], 80.00th=[  192], 90.00th=[  210], 95.00th=[  229],
     | 99.00th=[  265], 99.50th=[  289], 99.90th=[  465], 99.95th=[  553],
     | 99.99th=[ 1549]
   bw (  KiB/s): min=14984, max=158752, per=98.82%, avg=27358.75, stdev=16289.59, samples=302
   iops        : min= 3746, max=39688, avg=6839.69, stdev=4072.40, samples=302
  lat (nsec)   : 500=0.01%
  lat (usec)   : 10=0.01%, 20=9.76%, 50=9.17%, 100=1.51%, 250=77.24%
  lat (usec)   : 500=2.24%, 750=0.05%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=1.30%, sys=1.92%, ctx=1056478, majf=0, minf=31
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=27.0MiB/s (28.3MB/s), 27.0MiB/s-27.0MiB/s (28.3MB/s-28.3MB/s), io=4096MiB (4295MB), run=151501-151501msec
VM:
Code:
VM: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process

VM: (groupid=0, jobs=1): err= 0: pid=73405: Sat Jan 27 09:55:33 2024
  write: IOPS=37.2k, BW=145MiB/s (152MB/s)(4096MiB/28204msec); 0 zone resets
    slat (nsec): min=350, max=1166.4k, avg=2747.72, stdev=5552.23
    clat (nsec): min=180, max=1978.4k, avg=23590.66, stdev=22058.62
     lat (usec): min=5, max=2147, avg=26.34, stdev=23.50
    clat percentiles (usec):
     |  1.00th=[   16],  5.00th=[   17], 10.00th=[   18], 20.00th=[   18],
     | 30.00th=[   19], 40.00th=[   19], 50.00th=[   20], 60.00th=[   22],
     | 70.00th=[   23], 80.00th=[   24], 90.00th=[   29], 95.00th=[   40],
     | 99.00th=[   97], 99.50th=[  137], 99.90th=[  285], 99.95th=[  383],
     | 99.99th=[  816]
   bw (  KiB/s): min=69216, max=196040, per=99.89%, avg=148543.00, stdev=31985.30, samples=56
   iops        : min=17304, max=49010, avg=37135.75, stdev=7996.32, samples=56
  lat (nsec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 10=0.02%, 20=55.47%, 50=40.92%, 100=2.63%
  lat (usec)   : 250=0.80%, 500=0.11%, 750=0.02%, 1000=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=9.58%, sys=20.97%, ctx=1049686, majf=0, minf=31
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=145MiB/s (152MB/s), 145MiB/s-145MiB/s (152MB/s-152MB/s), io=4096MiB (4295MB), run=28204-28204msec

Disk stats (read/write):
  sdb: ios=50/429009, merge=0/394211, ticks=4/116029, in_queue=116032, util=22.54%
 
Would you recommend switching to different RAID for better performance?
Well... yes.

You would have to destroy(!) LOCAL, as there is no way to live-convert the vdev type. You will probably also need to go to <node> --> Disks --> Wipe Disk to make the drives officially reusable, as creating a new ZFS pool will only accept actually empty drives.

Then re-create it as "striped mirrors", aka "RAID10"; see the sketch below. The usable capacity would drop to (below) that of two drives. (And keep in mind that it should not be filled above 80% to 90%; if you store VMs there, this becomes 50% to 70%.) But the IOPS would double, from same-as-one-drive to same-as-TWO-drives.
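On the CLI that would look roughly like this (only a sketch - it destroys everything on LOCAL, the device paths are the ones from your zpool status, and ashift=12 assumes 4K sectors; the GUI under <node> --> Disks --> ZFS can do the same):
Code:
# WARNING: destroys all data on LOCAL - back up first
zpool destroy LOCAL
zpool create -o ashift=12 LOCAL \
    mirror /dev/disk/by-id/scsi-35000cca28f61435c /dev/disk/by-id/scsi-35000cca28f6169a0 \
    mirror /dev/disk/by-id/scsi-35000cca28f6199d0 /dev/disk/by-id/scsi-35000cca28f60467c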

Whether that's worth it is for you to decide. Copying large amounts of data on the same physical disks is not a usual task for me. (In my use cases I would move or link them.)

Good luck :-)
 
Thank you for the advice. I'll decide whether or not to move to a different RAID layout. Striped mirrors sound good, but having only ~50% of two drives' capacity is a pretty steep price to pay for performance. I tried finding some info about the need to not fill the ZFS pool above a certain percentage but came up empty. Can you point me in the right direction to find some info about this?

Just to clarify: I initially transferred some data to a VM but had to change the underlying OS, so I attached the disk with the data to a second VM and wanted to consolidate the data from both of them onto a single disk - which is why I wanted to move it locally from disk A to disk B. I assumed it would be a quick process (1 TB) since it's all on the same ZFS pool, and was surprised by speeds averaging 30-40 MB/s.
 
I tried finding some info about the need to not fill up the ZFS pool above certain % but came up empty. Can you point me to a right direction where I can find some info about this?

Well... a quick search gave me no definitive links for my "50%" statement, sorry. (Also, we cannot search for "50%" in this forum...)

"For good IOPS/block storage performance, keeping below 50% full is a recommendation." -- www.truenas.com/community/threads/benchmarking-zfs-performance-on-truenas.91684/post-635063

"As a general rule of thumb, at about 50% capacity your pool will be noticeably slower than it was when it was 10% capacity. At about 80%-96% capacity, your pool starts to become very slow,..." -- https://www.servethehome.com/an-introduction-to-zfs-a-place-to-start/

"Keep pool capacity below 80% for best performance" -- https://docs.oracle.com/cd/E23823_01/html/819-5461/zfspools-4.html

Of course a ZFS pool will still work with usage above 90%. But the speed will drop exponentially, starting somewhere far below that...
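To keep an eye on it, something like this shows how full (and how fragmented) the pool currently is:
Code:
zpool list -o name,size,allocated,free,capacity,fragmentation LOCAL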