VMs: very slow transfer speed between disks

RobBM

New Member
Jan 26, 2024
Hello,

I've recently bought a cheap dedicated server at OVH to host a couple of VMs.

Server is: Adv-STOR-2:
AMD Ryzen 7 Pro 3700 - 8c/16t - 3.6GHz/4.4GHz
2x SSD SATA 480GB Enterprise Class Soft RAID
64GB DDR4 ECC 2933MHz
4× 14TB HDD SAS Soft RAID (WUH721414AL5201)
Proxmox is installed on the SSDs, and I created a ZFS RAIDZ pool on the four HDDs.

I created a couple of Ubuntu VMs (10 at most), all with 2 vCPUs and 4 GB RAM; they mostly sit idle (testing). I also have one other VM with 12 GB RAM, 4 vCPUs and two attached ~5 TB disks which already hold some data. I wanted to move data from disk A to disk B (both reside on the ZFS pool), using mv initially and then rsync to see the actual transfer speed. It starts around 100 MB/s but quickly drops to around 30 MB/s, sometimes even as low as 5 MB/s, and keeps jumping between these values. I know this is a cheap server, but this still seems a bit slow.
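For reference, the move was done with something along these lines inside the VM (the paths here are placeholders, not the exact ones I used):

Code:
# copy from disk A to disk B with overall progress/throughput shown
rsync -a --info=progress2 /mnt/diskA/data/ /mnt/diskB/data/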

Previously I was also seeing messages like "task xxx blocked for more than 120 seconds", but these no longer show up (for now) after following the advice in this thread: https://forum.proxmox.com/threads/virtio-task-xxx-blocked-for-more-than-120-seconds.25835/
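The usual advice for those hangs is to make the kernel start writeback earlier, so the dirty-page cache never grows huge before it is flushed to the slow pool. A typical (purely illustrative) tweak looks like this:

Code:
# /etc/sysctl.d/90-dirty.conf  (values are examples, not a recommendation)
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

# apply without reboot
sysctl --system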

Is this a normal behaviour/speed?

VM settings:
SCSI controller: VirtIO SCSI single
2x 5 TB hard disks with IO thread enabled and Async IO set to threads, no cache, qcow2

Please let me know what else I can provide besides arc_summary - I'm new to Proxmox and not all that familiar with Linux.

pveperf /LOCAL
CPU BOGOMIPS: 115202.56
REGEX/SECOND: 4078602
HD SIZE: 25629.98 GB (LOCAL)
FSYNCS/SECOND: 91.71
DNS EXT: 26.43 ms
DNS INT: 0.35 ms (local)


dd if=/dev/zero of=/LOCAL/bigfile bs=1M count=8192 conv=fdatasync (Proxmox)
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.97819 s, 2.9 GB/s

VM:
dd if=/dev/zero of=/hdd5tb/bigfile bs=1M count=8192 conv=fdatasync
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 63.36 s, 136 MB/s

Thank you
 

Attachments

  • arc_summary.txt (32.3 KB)
Please post the output of zpool status and qm config <vmid>. Please put it in [CODE]...[/CODE]-Tags.

Pure suspicion: you are copying (= reading and writing data at the same time) from rotating rust with the performance of a single drive (because of RaidZ). 30 MByte/s seems low but it is not really bad...

dd if=/dev/zero of=/LOCAL/bigfile bs=1M count=8192 conv=fdatasync (Proxmox)
Using zeros as data for benchmarking gives zero useful results. Please search for "fio" in this forum.
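As a starting point, something along these lines gives more meaningful numbers than dd with zeros (path and sizes are only examples - adjust them to your pool):

Code:
# illustrative starting point; fio fills buffers with non-zero data by default
fio --name=zfs-randwrite --filename=/LOCAL/fio-test --size=4G \
    --rw=randwrite --bs=4k --ioengine=psync --iodepth=1 --numjobs=1 \
    --end_fsync=1
rm /LOCAL/fio-test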
 
Hi, thank you for taking the time to answer. I really appreciate it. While I know these are spinning disks, 30 MB/s feels really bad to me - I assume the fluctuating speed (30-250 MB/s) is caused by the cache/RAM filling up?

Would you recommend switching to different RAID for better performance? I can live with 30 TB of space if it means better performance.

qm config:
Code:
agent: 1,fstrim_cloned_disks=1
balloon: 0
boot: order=scsi0;net0
cores: 4
cpu: host
memory: 12000
meta: creation-qemu=8.1.2,ctime=1706049442
name: vm1
net0: virtio=BC:24:11:58:D8:2C,bridge=vmbr1,firewall=1,tag=104
numa: 0
onboot: 1
ostype: l26
scsi0: local:211/vm-211-disk-0.qcow2,iothread=1,size=30G
scsi1: LOCAL:vm-211-disk-0,aio=threads,backup=0,iothread=1,size=6000G
scsi2: LOCAL:vm-211-disk-1,aio=threads,iothread=1,size=5000G
scsihw: virtio-scsi-single
smbios1: uuid=fbff64dd-02b1-46b6-9b1b-1b977e95aea4
sockets: 1
startup: up=240
vcpus: 4
vmgenid: 41837197-c3f1-472f-9024-557a3876e46a
zpool status:
Code:
pool: LOCAL
 state: ONLINE
config:

        NAME                        STATE     READ WRITE CKSUM
        LOCAL                       ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            scsi-35000cca28f61435c  ONLINE       0     0     0
            scsi-35000cca28f6169a0  ONLINE       0     0     0
            scsi-35000cca28f6199d0  ONLINE       0     0     0
            scsi-35000cca28f60467c  ONLINE       0     0     0

errors: No known data errors
fio - I'm not familiar with this tool so I took some sample commands I found on the forum, hope it's OK:
Code:
fio --name PROXMOX --rw=randwrite --filename=/LOCAL/test1 --size=4g --blocksize=4k --iodepth=1 --numjobs=1 --ioengine=posixaio
PROXMOX: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
PROXMOX: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][98.7%][w=57.5MiB/s][w=14.7k IOPS][eta 00m:02s]
PROXMOX: (groupid=0, jobs=1): err= 0: pid=114884: Sat Jan 27 09:50:47 2024
  write: IOPS=6921, BW=27.0MiB/s (28.3MB/s)(4096MiB/151501msec); 0 zone resets
    slat (nsec): min=400, max=942794, avg=1000.33, stdev=1964.10
    clat (nsec): min=410, max=34470k, avg=143022.01, stdev=128374.04
     lat (usec): min=13, max=34471, avg=144.02, stdev=128.43
    clat percentiles (usec):
     |  1.00th=[   14],  5.00th=[   15], 10.00th=[   23], 20.00th=[   81],
     | 30.00th=[  133], 40.00th=[  141], 50.00th=[  153], 60.00th=[  169],
     | 70.00th=[  182], 80.00th=[  192], 90.00th=[  210], 95.00th=[  229],
     | 99.00th=[  265], 99.50th=[  289], 99.90th=[  465], 99.95th=[  553],
     | 99.99th=[ 1549]
   bw (  KiB/s): min=14984, max=158752, per=98.82%, avg=27358.75, stdev=16289.59, samples=302
   iops        : min= 3746, max=39688, avg=6839.69, stdev=4072.40, samples=302
  lat (nsec)   : 500=0.01%
  lat (usec)   : 10=0.01%, 20=9.76%, 50=9.17%, 100=1.51%, 250=77.24%
  lat (usec)   : 500=2.24%, 750=0.05%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=1.30%, sys=1.92%, ctx=1056478, majf=0, minf=31
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=27.0MiB/s (28.3MB/s), 27.0MiB/s-27.0MiB/s (28.3MB/s-28.3MB/s), io=4096MiB (4295MB), run=151501-151501msec
VM:
Code:
VM: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process

VM: (groupid=0, jobs=1): err= 0: pid=73405: Sat Jan 27 09:55:33 2024
  write: IOPS=37.2k, BW=145MiB/s (152MB/s)(4096MiB/28204msec); 0 zone resets
    slat (nsec): min=350, max=1166.4k, avg=2747.72, stdev=5552.23
    clat (nsec): min=180, max=1978.4k, avg=23590.66, stdev=22058.62
     lat (usec): min=5, max=2147, avg=26.34, stdev=23.50
    clat percentiles (usec):
     |  1.00th=[   16],  5.00th=[   17], 10.00th=[   18], 20.00th=[   18],
     | 30.00th=[   19], 40.00th=[   19], 50.00th=[   20], 60.00th=[   22],
     | 70.00th=[   23], 80.00th=[   24], 90.00th=[   29], 95.00th=[   40],
     | 99.00th=[   97], 99.50th=[  137], 99.90th=[  285], 99.95th=[  383],
     | 99.99th=[  816]
   bw (  KiB/s): min=69216, max=196040, per=99.89%, avg=148543.00, stdev=31985.30, samples=56
   iops        : min=17304, max=49010, avg=37135.75, stdev=7996.32, samples=56
  lat (nsec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 10=0.02%, 20=55.47%, 50=40.92%, 100=2.63%
  lat (usec)   : 250=0.80%, 500=0.11%, 750=0.02%, 1000=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=9.58%, sys=20.97%, ctx=1049686, majf=0, minf=31
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=145MiB/s (152MB/s), 145MiB/s-145MiB/s (152MB/s-152MB/s), io=4096MiB (4295MB), run=28204-28204msec

Disk stats (read/write):
  sdb: ios=50/429009, merge=0/394211, ticks=4/116029, in_queue=116032, util=22.54%
 
Would you recommend switching to different RAID for better performance?
Well.., yes.

You would have to destroy(!) LOCAL, as there is no way to live-convert the type of a vdev. You probably also need to go to <node> --> Disks --> Wipe Disk to make the drives officially reusable, since creating a new ZFS pool will only accept actually empty drives.

Then re-create it as "striped mirrors", aka "RAID10". The usable capacity would drop to two drives' worth. (And keep in mind that it should not get filled above 80% to 90%; if you store VMs there this becomes 50% to 70%.) But the IOPS would double, from same-as-ONE-drive to same-as-TWO-drives.
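Purely as a sketch of what that looks like on the command line, using the disk IDs from your zpool status (double-check them - this destroys all data on the pool):

Code:
# DESTROYS the pool and everything on it - make sure backups exist
zpool destroy LOCAL

# wipe old labels/partitions per disk (the GUI "Wipe Disk" does the same);
# repeat for all four disks
wipefs -a /dev/disk/by-id/scsi-35000cca28f61435c

# re-create as two striped mirrors ("RAID10")
zpool create LOCAL \
    mirror /dev/disk/by-id/scsi-35000cca28f61435c /dev/disk/by-id/scsi-35000cca28f6169a0 \
    mirror /dev/disk/by-id/scsi-35000cca28f6199d0 /dev/disk/by-id/scsi-35000cca28f60467c

The creation part should also be possible in the GUI under <node> --> Disks --> ZFS --> Create: ZFS, selecting RAID10.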

Whether that's worth it is for you to decide. Copying large amounts of data on the same physical disks is not a usual task for me. (In my use cases I would move or link the files instead.)
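A quick illustration with placeholder mount points: within one filesystem a move is just a rename and finishes instantly; across two filesystems - your disk A to disk B case - it becomes a full read plus write, which is exactly the workload that hits the RaidZ IOPS limit.

Code:
# same filesystem: metadata-only rename, no data gets rewritten
mv /mnt/diskA/dir1/bigfile /mnt/diskA/dir2/

# different filesystems (disk A -> disk B): full copy + delete
mv /mnt/diskA/dir1/bigfile /mnt/diskB/dir1/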

Good luck :)
 
Thank you for the advice. I'll decide whether to move to a different RAID layout or not. Striped mirrors sound good, but getting only ~50% of two drives' capacity is a pretty steep price to pay for performance. I tried finding some info about the need to keep a ZFS pool below a certain fill level but came up empty. Can you point me in the right direction to find some info about this?

Just to clarify: I initially transferred some data to a VM but had to change the underlying OS, so I attached the disk with the data to a second VM and wanted to consolidate the data from both of them on a single disk - which is why I wanted to move it locally from disk A to disk B. I assumed it'd be a quick process (1 TB) since it's all on the same ZFS pool, and was surprised by speeds averaging 30-40 MB/s.
 
I tried finding some info about the need to not fill up the ZFS pool above certain % but came up empty. Can you point me to a right direction where I can find some info about this?

Well... a quick search gave me no definitive link for my "50%" statement, sorry. (Also, we cannot search for "50%" in this forum...)

"For good IOPS/block storage performance, keeping below 50% full is a recommendation." -- www.truenas.com/community/threads/benchmarking-zfs-performance-on-truenas.91684/post-635063

"As a general rule of thumb, at about 50% capacity your pool will be noticeably slower than it was when it was 10% capacity. At about 80%-96% capacity, your pool starts to become very slow,..." -- https://www.servethehome.com/an-introduction-to-zfs-a-place-to-start/

"Keep pool capacity below 80% for best performance" -- https://docs.oracle.com/cd/E23823_01/html/819-5461/zfspools-4.html

Of course a ZFS pool will still work at a usage above 90%. But the speed drops off sharply, starting somewhere well below that...
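You can keep an eye on how full (and how fragmented) the pool is with, for example:

Code:
zpool list LOCAL
zfs list -o space LOCAL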
 
