Moved Windows VM ZFS volume from spinners to SSD storage and lost performance, why?

Hi,

I have a (new) Proxmox system with:
  1. 2 x 2 TB Samsung PM9A3 M.2 SSDs and
  2. 6 x Seagate Exos 7E8 4 TB spinning HDDs.
ZFS is set up like this:
  1. The SSDs hold two ZFS mirror pools: rpool (with Proxmox itself installed) and spool.
  2. The HDDs form a ZFS raidz3 pool, dpool (intended mostly as archive and backup storage).
Before setting things up, I ran memtest86+ for 24 hours (yes, it is ECC memory), completely filled every disk (all enterprise class) by writing constantly for about 2 days (with a script that wrote and crypto-checksummed files), and did other tests to ensure the hardware is OK.
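A minimal sketch of that kind of fill-and-verify test could look like the following (paths, chunk size and file names here are just placeholders, not my actual script):

Bash:
# Fill a mounted test filesystem with random data and record a checksum
# for every chunk; the loop ends when the filesystem is full.
TARGET=/mnt/burnin            # placeholder mount point
i=0
while dd if=/dev/urandom of="$TARGET/chunk.$i" bs=1M count=1024 status=none; do
    sha256sum "$TARGET/chunk.$i" >> "$TARGET/checksums.sha256"
    i=$((i + 1))
done
# After the disks have been exercised for a while, verify everything reads back intact:
sha256sum -c "$TARGET/checksums.sha256"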

My teammate accidentally installed Windows Server 2019 Standard Edition on the disks (dpool) instead of the SSDs (spool). At first we did not notice the mistake, because a bandwidth test they did in Windows reported 450 MB/sec throughput, which I think is very good and speaks for a working setup (I did not expect that much from the disks). But today I noticed that it is not on SSD and decided to change that.

I moved the virtual disk from dpool to spool (with [x] delete source) using the Proxmox GUI.
It was my first live storage migration and I have to admit it left me very excited, with a big smile on my face: not only is this possible at all, it was also easy and fast. It took a couple of seconds to migrate 100 GB from disk to SSD, and the Windows Server guest did not notice. It is really fun using Proxmox, I really like it :) But back to topic.
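(For reference, I believe the same move can also be done from the CLI with qm move-disk; something like the line below, syntax from memory, so please double-check against your PVE version.)

Code:
# CLI equivalent of the GUI "Move disk" dialog with "Delete source" checked:
$ qm move-disk 101 ide0 spool --delete 1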

After the migration, they redid the bandwidth test in the Windows VM:

It now reported only 370 MB/sec throughput, instead of the 450 MB/sec from before.

So it got slower! I was surprised, and I am failing to find an explanation, let alone a fix.

I did some basic tests:

Bash:
$ zfs create -V 100G dpool/st1
$ hdparm -t /dev/zvol/dpool/st1
Timing buffered disk reads: 10570 MB in  3.00 seconds = 3522.90 MB/sec

$ fio --ioengine=libaio --filename=/dev/zvol/dpool/st1 --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
Jobs: 1 (f=1): [W(1)][100.0%][w=376KiB/s][w=94 IOPS][eta 00m:00s]
write: IOPS=88, BW=354KiB/s (362kB/s)(20.7MiB/60042msec); 0 zone resets
   bw (  KiB/s): min=  128, max=  480, per=100.00%, avg=354.02, stdev=69.90, samples=119
   iops        : min=   32, max=  120, avg=88.50, stdev=17.47, samples=119
WRITE: bw=354KiB/s (362kB/s), 354KiB/s-354KiB/s (362kB/s-362kB/s), io=20.7MiB (21.7MB), run=60042-60042msec




$ zfs create -V 100G spool/st1
$ hdparm -t /dev/zvol/spool/st1
Timing buffered disk reads: 6578 MB in  3.00 seconds = 2192.51 MB/sec

$ fio --ioengine=libaio --filename=/dev/zvol/spool/st1 --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
Jobs: 1 (f=1): [W(1)][100.0%][w=53.8MiB/s][w=13.8k IOPS][eta 00m:00s]
write: IOPS=15.6k, BW=60.0MiB/s (63.0MB/s)(3660MiB/60001msec); 0 zone resets
   bw (  KiB/s): min=38240, max=67768, per=100.00%, avg=62528.98, stdev=6330.90, samples=119
   iops        : min= 9560, max=16942, avg=15632.24, stdev=1582.73, samples=119
WRITE: bw=60.0MiB/s (63.0MB/s), 60.0MiB/s-60.0MiB/s (63.0MB/s-63.0MB/s), io=3660MiB (3838MB), run=60001-60001msec
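Since the bandwidth test in Windows presumably measures large sequential transfers rather than 4K sync writes, a fio run closer to that pattern might be more comparable; something along these lines (same test zvol as above):

Bash:
# Large-block sequential read, closer to what a Windows disk benchmark does (I think):
$ fio --ioengine=libaio --filename=/dev/zvol/spool/st1 --direct=1 \
      --rw=read --bs=1M --numjobs=1 --iodepth=8 \
      --runtime=60 --time_based --name=fio-seqread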

In fio, the SSD pool volume is faster in every respect, yet the Windows VM reports a slower speed.
Interestingly, hdparm shows the opposite (yes, I know it is usually not a good test, but here it roughly matches HD Tune, I think). The disks come in at ~230 MB/sec each, and on the RAID-Z3 hdparm reports ~3 GB/sec (the theoretical maximum should be 6 x 230 MB/sec = ~1.4 GB/sec, so I think I am measuring some caching here).
For the SSDs, hdparm reports 2.1 GB/sec on the mirror and 1.7 GB/sec per SSD, so at least not more than the theoretical maximum.
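To check the caching suspicion, I guess one could temporarily tell ZFS to cache only metadata for the test zvol and rerun hdparm; if the number then drops towards the raw disk speed, it was the ARC (assumption on my side, not verified here):

Code:
# Cache only metadata (not data) in the ARC for the test zvol, then re-test:
$ zfs set primarycache=metadata dpool/st1
$ hdparm -t /dev/zvol/dpool/st1
# Restore the default afterwards:
$ zfs set primarycache=all dpool/st1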

What am I doing wrong?

Any tip, hint, link, FAQ pointer or anything appreciated!

Should I now really move back from M.2 SSD to spinning rust to get the performance back?


Background information:

I created dpool and spool with the Proxmox GUI and its defaults (compression on, ashift=12).
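(To double-check those defaults, I believe these commands show them:)

Code:
$ zpool get ashift dpool spool
$ zfs get compression dpool spool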
On spool, I additionally tried:
Code:
$ zpool set autotrim=on rpool
$ zfs set compression=off spool
This seems to have no significant effect on performance here. I even tried disabling sync, but even that had no visible effect in the Windows VM(!); in fio, however, it made a big difference, as expected:
Code:
 zfs set sync=disabled spool
  WRITE: bw=214MiB/s (225MB/s), 214MiB/s-214MiB/s (225MB/s-225MB/s), io=1286MiB (1349MB), run=6003-6003msec
zfs set sync=standard spool
  WRITE: bw=62.9MiB/s (65.0MB/s), 62.9MiB/s-62.9MiB/s (65.0MB/s-65.0MB/s), io=378MiB (396MB), run=6001-6001msec
The Windows tool (HD Tune) reports just 370 MB/sec, in contrast to the 450 MB/sec it got on the spinning disks, regardless of whether sync is disabled or not.
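One more thing that might matter when reproducing this: the 4K writes hit the zvol's block size and sync setting, which can be inspected like this (the dataset name is the one from my VM config below):

Code:
$ zfs get volblocksize,sync,compression spool/vm-101-disk-0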

Some system information:

Code:
$ zpool list -v -L
NAME            SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
dpool          21.8T   200M  21.8T        -         -     0%     0%  1.00x    ONLINE  -
  raidz3-0     21.8T   200M  21.8T        -         -     0%  0.00%      -    ONLINE
    sdf        3.64T      -      -        -         -      -      -      -    ONLINE
    sde        3.64T      -      -        -         -      -      -      -    ONLINE
    sdd        3.64T      -      -        -         -      -      -      -    ONLINE
    sdb        3.64T      -      -        -         -      -      -      -    ONLINE
    sdc        3.64T      -      -        -         -      -      -      -    ONLINE
    sda        3.64T      -      -        -         -      -      -      -    ONLINE
rpool           508G  13.5G   495G        -         -     1%     2%  1.00x    ONLINE  -
  mirror-0      508G  13.5G   495G        -         -     1%  2.64%      -    ONLINE
    nvme0n1p3   511G      -      -        -         -      -      -      -    ONLINE
    nvme1n1p3   511G      -      -        -         -      -      -      -    ONLINE
spool          1.24T  17.1G  1.23T        -         -     0%     1%  1.00x    ONLINE  -
  mirror-0     1.24T  17.1G  1.23T        -         -     0%  1.34%      -    ONLINE
    nvme0n1p4  1.25T      -      -        -         -      -      -      -    ONLINE
    nvme1n1p4  1.25T      -      -        -         -      -      -      -    ONLINE

Bash:
$ qm config 101
boot: order=ide0;ide2;net0;ide1
cores: 2
ide0: spool:vm-101-disk-0,size=100G
ide1: local:iso/virtio-win-0.1.225.iso,media=cdrom,size=519590K
ide2: local:iso/SSS_X64FRE_DE-DE_DV9.iso,media=cdrom,size=5217128K
machine: pc-i440fx-7.0
memory: 4096
meta: creation-qemu=7.0.0,ctime=1669974340
name: [...]
net0: virtio=EE:A7:57:CD:73:27,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=9e975beb-fa6d-4ef7-a57d-f466cddeb286
sockets: 1
vga: std
vmgenid: 4e3246f2-c6ca-4f14-b61f-ac591752d0c2
root@relipve:~#
 
Did you reboot the Windows guest, and also mark the virtual drive as an SSD in the VM config?
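For the SSD flag, something like this should do it (from memory, so double-check the syntax), followed by a guest reboot:

Code:
# Mark the existing IDE disk as an SSD and enable discard:
$ qm set 101 --ide0 spool:vm-101-disk-0,ssd=1,discard=on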
 
