Windows VM transfer speeds drop to 0 to/from HDD ZFS mirrored pool

debsque

Renowned Member
Sep 28, 2016
Hello,

When I use Windows 10/11 VMs, stored on an SSD ZFS mirror pool, to transfer files to/from a ZFS HDD mirrored pool on the same host, speeds frequently drop to 0 and hang there for a few minutes.

I don't have this issue with Ubuntu or other Linux VMs, so I'm guessing it has something to do with the Windows virtio drivers and the ZFS cache on the HDDs, but I can't put my finger on it. My workaround at the moment is to rate limit the virtio network device to 180 MB/s and the virtio disk (for disks on the HDD pool) to 100 MB/s.
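For anyone wanting to replicate the workaround, the same limits can also be set from the host CLI; a rough sketch with a placeholder VM ID, MAC and volume name (adjust to your own config):

```bash
# Placeholder VM ID 100; rate= is in MB/s, so this caps the virtio NIC at ~180 MB/s
qm set 100 -net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,rate=180

# Cap reads and writes of the disk stored on the HDD pool at ~100 MB/s
# (placeholder volume name; re-specifying the disk replaces its whole option string)
qm set 100 -scsi1 tank:vm-100-disk-0,mbps_rd=100,mbps_wr=100
```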

There are several forum and reddit posts on the topic, but my situation seems a bit different.

I'm testing two scenarios:

(1) Transfers inside Windows VM:
  • Boot (C drive) on SSD pool
  • first test drive on HDD pool
  • second test drive on SSD pool
I'm transferring about 50 GB consisting of several Windows and Linux ISOs (literally), mostly files between 1 and 10 GB.

Transfers between the disks stored on the SSDs work mostly within expected parameters. Transfers between anything SSD and HDD hang. When I rate limit the HDD test drive to 100 MB/s via the Proxmox interface, speeds fluctuate between 50 and 80 MB/s, rarely drop to 0, and even when they do drop, they recover quickly.

(2) Using the Windows VM to copy from a Samba LXC, which serves files (via bind mounts) from the HDD pool (on the same host)

This works even worse. VMs start at 200-400 MB/s, then drop to 0 and hang there for over a minute. Sometimes the transfer completes with fluctuating speeds above 150 MB/s, but other times it just bounces between 0 and 400 MB/s and hangs frequently. My workaround is to rate limit the virtio network controller to about 180 MB/s.

Hardware setup:
  • 2 x WD SSD - rpool mirror
  • 2 x Samsung EVO 860 2TB - ZFS mirror for the VM storage pool (called vmstorage)
  • 2 x 8TB WD Red Pro - ZFS mirror data storage pool (called tank)
  • CPU: Intel Xeon Silver 4314 @ 2.4 GHz
  • 128 GB RAM (ZFS ARC limited to 16 GB; see the sketch after this list)
  • Dell X520-DA2 10 Gbit optical transceiver
  • Proxmox 8.3.4 - VMs and everything are up to date
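For reference, the ARC cap from the list above is set the usual way; a sketch with the 16 GiB value (these are the standard paths, nothing specific to my setup):

```bash
# Cap the ZFS ARC at 16 GiB (16 * 1024^3 = 17179869184 bytes)
echo "options zfs zfs_arc_max=17179869184" > /etc/modprobe.d/zfs.conf
update-initramfs -u -k all   # rebuild the initramfs so the limit applies at boot

# Or apply it immediately without a reboot:
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
```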
The logical setup is something like this:
  • data folders stored on tank (mirrored WD Red 8TB)
  • VMs stored on vmstorage (mirrored EVO 2 TB)
  • a Debian LXC container has bind mounts to folders on tank (see the sketch after this list)
  • the LXC container is a Samba server
  • VMs access the LXC container via Samba => VMs access folders on tank via Samba
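The bind-mount part of this layout boils down to something like the following, with a placeholder container ID and placeholder paths:

```bash
# Bind-mount a folder from the tank pool into the Samba LXC (placeholder container ID 101)
pct set 101 -mp0 /tank/data,mp=/srv/data
# Inside the container, /srv/data is then exported as a normal Samba share
```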
The VMs are fresh Windows 10 LTSC and Windows 11 LTSC installs:
  • CPU = x86-64-v2-AES with flags md-clear, pcid, spec-ctrl, ssbd, aes
  • RAM = 16 GB, no ballooning
  • Machine = q35, OVMF (UEFI) BIOS
  • SCSI controller = VirtIO SCSI single
  • HDD = scsi0
I've tried every combination I found on the forum, without results, such as:
  • changing the CPU type to host, kvm64, different flags, different architectures
  • changing various disk flags such as SSD emulation, native async IO, discard, and IO thread (didn't modify cache though); see the sketch after this list
  • changing the machine type
  • reinstalling Windows and using older virtio drivers
  • VirtIO SCSI, VirtIO SCSI single, and virtio block storage instead of SCSI
  • NIC multiqueues
  • storing the VM on LVM-Thin instead of ZFS (detached one disk from the vmstorage mirror just to test this)
  • storing the VM on the rpool mirror
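For completeness, the disk-flag and multiqueue attempts from the list above were along these lines (placeholder VM ID, volume name and MAC):

```bash
# One of the disk-flag combinations tried: IO thread, native AIO, discard, SSD emulation
qm set 100 -scsi1 tank:vm-100-disk-0,iothread=1,aio=native,discard=on,ssd=1

# NIC multiqueue (queues usually matches the vCPU count)
qm set 100 -net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,queues=4
```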
The only setting that had a visible effect was zfs sync=disabled. The speeds still fluctuated a lot, but transfers didn't hang at 0 anymore.
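For anyone who wants to repeat that test, this is the only knob that changed the behaviour here; note that disabling sync is a diagnostic, not a fix, since it trades data safety on power loss for speed:

```bash
# Diagnostic only: disable synchronous writes on the HDD pool
zfs set sync=disabled tank

# Revert to the default behaviour afterwards
zfs set sync=standard tank
```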

Some other observations:
  • set up a new TrueNAS VM on this host, with 2 mirrored virtio disks stored on the host's tank ZFS pool - no issues, but speeds never went above 150 MB/s
  • transfers from another TrueNAS host to the Windows VM work normally; in the opposite direction the Windows VM pushes only 20 MB/s
  • an Ubuntu 24 VM sustained transfers over 200 MB/s without stalling, with only occasional slowdowns
  • other file transfers work without interruption (e.g. via Linux guests) while the Windows ones stall
  • I can't find hints in the logs, and neither CPU nor RAM looks anywhere near 100% utilization. Windows Task Manager shows 100% active time while the transfer stalls.
So, to summarise: the Windows VMs generally act up while copying files.
 
Be sure to use Windows virtio SCSI drivers v266, as prior versions have problems.
 
Thanks for the suggestions.

I've tried both the newest virtio drivers and several older ones.

The dirty_cache thing doesn't do anything and it also doesn't apply in my case: I don't have issues with transfers between SSDs, only while transferring to/from the HDD-based ZFS pool. I'll probably try an LSI card next weekend, but I doubt it's that, since I have no issues at all under Linux and none when using disks from the SSD pool under Windows.
 
It seems that creating a SATA disk (instead of SCSI or virtio) stored on the tank pool (the HDD-based ZFS mirror) eliminates the hangs when copying files between different disks inside the same Windows VM.

In scenario (2) (copying files from a Samba server in an LXC container with bind mounts pointing to folders on tank), the problem is still there even when copying to the SATA disk. I've changed the ethernet device to e1000 and the controller to LSI, but it didn't make a difference. The only way to avoid drops and hangs while copying over the "network" is to set the network limit to 200 MB/s.
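For reference, the SATA test disk was added roughly like this (placeholder VM ID and size; this assumes tank is also defined as a ZFS storage in Proxmox):

```bash
# Allocate a new 64 GB disk on the 'tank' storage and attach it as a SATA device
qm set 100 -sata1 tank:64
```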

This also resonates with another post from your thread regarding better SATA controller stability:
In the meantime I've finished 3 other tests.

Created/installed 3 new VMs, identically configured except for the storage controller.

IDE controller (slow, but no issue)

SATA controller (slow, but no issue)

SCSI controller, but with older Windows drivers (fast, but with the issue)

(benchmark screenshots attached for each test)

In conclusion, it seems that when the "not ideal" controller limits the VM's I/O, the issue is not present because the disks are not saturated.

The hypothesis and suggestion of eclipse10000 in post #8 seem more and more likely. I'll test soon and let you know

Thanks!
 
Double-check the driver version of the VirtIO SCSI storage controller in Windows Device Manager when you test it.
Versions after 208 and before 266 are known to hang, especially with CrystalDiskMark.
Upgrading with the .msi can fail, and updating the driver from Windows Device Manager may be required.

 
So far, it seems to have something to do with multiple networks on the same LXC container (I'm using several VLANs and bridges). The logs don't show anything meaningful and it's fairly easy to reproduce. When I remove the network interfaces from the config, or use "Disconnect" from the GUI, the speeds remain constant (Linux and BSD guests still don't show this issue).
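The CLI equivalent of the "remove the extra interfaces" test, with a placeholder container ID and placeholder interface names:

```bash
# Drop the extra VLAN interfaces (here net1 and net2) from a placeholder container 101
pct set 101 --delete net1,net2
```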

I went through several tests to arrive at this conclusion.

(1) connected 2 identical but new HDDs via a different HBA
  • created a new ZFS pool on the new HDDs via the Proxmox WebUI (CLI equivalent sketched after this list)
  • created a folder on the new pool and bind-mounted it into the LXC container
  • => issue manifests as usual
(2) passed the HBA controller through to a fresh TrueNAS VM
  • imported pool, created shares
  • => issue gone
(3) created a new Ubuntu 24.04 LXC (the original is Debian-based)
  • copied over the same bind mounts from the other container, standard Samba configuration, but only 1 network card
  • => no issues
  • joined the Samba server to the domain and copied smb.conf from the other LXC => still no issues
    • wanted to rule out a Samba server config issue
  • added the rest of the virtio network interfaces => issue manifests
(4) removed the network cards from the original LXC container => issues gone
(5) disconnecting the extra network interfaces of the LXC container via the WebUI, while the Windows VM transfers files, makes the problem go away in real time
(6) added multiple network interfaces to the fresh TrueNAS VM, no slowdowns or issues
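The CLI equivalent of what the WebUI did for the new pool in test (1), with placeholder disk IDs and a placeholder pool name:

```bash
# Create a mirrored pool on the two new HDDs (use your actual /dev/disk/by-id paths)
zpool create -o ashift=12 newtank mirror \
    /dev/disk/by-id/ata-DISK_A /dev/disk/by-id/ata-DISK_B

# Register the pool as a Proxmox storage so it can hold guest volumes
pvesm add zfspool newtank -pool newtank -content images,rootdir
```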

I think I'll save myself some time now and in the future and just pass an HBA with the disks to an Ubuntu VM. I was never very comfortable with managing the storage on Proxmox to begin with, but the solution seemed elegant and worked for a few years.
Edit: I've found several threads with similar problems. My searches didn't lead me there because I thought it was a ZFS issue, but it looks like an LXC/Samba issue. A Reddit comment described the exact initial symptom (speeds dropping to 0 and hanging there).
https://forum.proxmox.com/threads/very-poor-lxc-network-performance.139701
https://forum.proxmox.com/threads/slow-smb-speed-beetween-windows-machines.42814/
https://forum.proxmox.com/threads/linux-samba-server-or-lxc-container.141177
https://forum.proxmox.com/threads/samba-under-pve-8-sometimes-extremely-slow.130116
 