Hello,
When I'm using Windows 10/11 VMs, stored on an SSD ZFS mirror pool, to transfer files to/from a ZFS HDD mirror pool on the same host, speeds frequently drop to 0 and hang there for a few minutes.
I don't have this issue with Ubuntu and other Linux VMs, so I'm guessing it has something to do with the Windows VirtIO drivers and the ZFS cache on the HDDs, but I can't put my finger on it. My workaround at the moment is to rate limit the VirtIO network device to 180 MB/s and the VirtIO disk (for disks on the HDD pool) to 100 MB/s.
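For reference, this is roughly the CLI equivalent of the limits I set through the GUI (VM ID 101, disk volume, bridge, and MAC are placeholders):

```
# Cap the VirtIO NIC at ~180 MB/s ('rate' is in MB/s)
qm set 101 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,rate=180

# Cap reads and writes on the HDD-pool disk at 100 MB/s
qm set 101 --scsi1 tank:vm-101-disk-1,mbps_rd=100,mbps_wr=100
```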
There are several forum and Reddit posts on the topic, but my situation seems a bit different.
I'm testing two scenarios:
(1) Transfers inside Windows VM:
- Boot (C drive) on SSD pool
- First test drive on HDD pool
- Second test drive on SSD pool
Transfers between the disks stored on the SSDs work mostly within expected parameters. Transfers between anything on the SSDs and the HDDs hang. When I rate limit the HDD test drive to 100 MB/s via the Proxmox interface, speeds fluctuate between 50 and 80 MB/s but rarely drop to 0, and even when they do drop, they recover quickly.
(2) Using the Windows VM to copy from a Samba LXC that serves files (via bind mounts) from the HDD pool (on the same host):
This works even worse. Transfers start at 200-400 MB/s, then drop to 0 and hang there for over a minute. Sometimes the transfer completes with speeds fluctuating above 150 MB/s, but other times it just bounces between 0 and 400 MB/s and hangs frequently. My workaround is to rate limit the VirtIO network device to about 180 MB/s, as above.
Hardware setup:
- 2 x WD SSD - rpool mirror
- 2 x Samsung 860 EVO 2TB - ZFS mirror for VM storage pool (called vmstorage)
- 2 x 8TB WD Red Pro - zfs mirror data storage pool (called tank)
- CPU: Intel Xeon Silver 4314 @ 2.4 GHz
- 128 GB RAM (ZFS ARC limited to 16 GB; set as shown after this list)
- Dell X520-DA2 10 Gbit NIC with optical transceivers
- Proxmox 8.3.4; VMs and everything else are up to date
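The ARC limit is set the standard way via module options (the value is 16 GiB in bytes):

```
# /etc/modprobe.d/zfs.conf -- persistent ARC limit (16 GiB = 16 * 1024^3 bytes)
options zfs zfs_arc_max=17179869184
```

applied with `update-initramfs -u` and a reboot (or live via /sys/module/zfs/parameters/zfs_arc_max).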
Storage layout:
- Data folders stored on tank (mirrored WD Red 8TB)
- VMs stored on vmstorage (mirrored EVO 2TB)
- A Debian LXC container has bind mounts to the folders on tank
- The LXC container is the Samba server
- VMs access the LXC container via Samba => VMs access folders on tank via Samba (setup sketched below)
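The bind mount and share look roughly like this (container ID 200, dataset and paths are placeholders):

```
# Bind-mount a folder on tank into the container
pct set 200 -mp0 /tank/data,mp=/mnt/data

# Matching share in the container's /etc/samba/smb.conf:
# [data]
#     path = /mnt/data
#     read only = no
#     browseable = yes
```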
VM configuration:
- CPU: x86-64-v2-AES with flags md-clear, pcid, spec-ctrl, ssbd, aes
- RAM: 16 GB, no ballooning
- Machine: q35, OVMF BIOS
- SCSI controller: VirtIO SCSI Single
- HDD = scsi0
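In /etc/pve/qemu-server/<vmid>.conf that comes out roughly as follows (VM ID, disk volume, size, and MAC are placeholders):

```
bios: ovmf
machine: q35
cpu: x86-64-v2-AES,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+aes
memory: 16384
balloon: 0
scsihw: virtio-scsi-single
scsi0: vmstorage:vm-101-disk-0,size=100G
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
```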
What I've tried so far (no change):
- Changing CPU to host, kvm64, different flags, different architectures
- Changing disk flags such as SSD emulation, Async IO (native), Discard, IO Thread (didn't modify cache, though)
- Changing machine type
- Reinstalling Windows and using older VirtIO drivers
- VirtIO SCSI, VirtIO SCSI Single, VirtIO Block instead of SCSI
- NIC multiqueue
- Storing the VM on LVM-Thin instead of ZFS (detached one disk from the vmstorage mirror just to check)
- Storing the VM on the rpool mirror
Some other observations:
- Set up a new TrueNAS VM on this host, with 2 mirrored VirtIO disks stored on the host's tank ZFS pool: no issues, but speeds never went above 150 MB/s
- Transfers from another TrueNAS host to the Windows VM work normally; switching direction, the Windows VM pushes only 20 MB/s
- An Ubuntu 24 VM sustains transfers over 200 MB/s without stalling, with only occasional slowdowns
- Other file transfers (i.e. via Linux guests) work without interruption while the Windows ones stall
- I can't find hints in the logs, and neither CPU nor RAM looks anywhere near 100% utilization on the host; Windows Task Manager, however, shows 100% active time on the disk while the transfer stalls
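For completeness, these are the host-side views that can catch the stall while it happens (all stock OpenZFS tooling; tank is the HDD pool):

```
# Per-vdev throughput on the HDD pool, 1 s interval
zpool iostat -v tank 1

# I/O latency histograms
zpool iostat -w tank 5

# ARC size and hit rate
arcstat 1

# Current ZFS dirty-data ceiling (the write buffer I suspect is involved)
cat /sys/module/zfs/parameters/zfs_dirty_data_max
```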