I recently re-set up my workstation (Ryzen 5900X, 64 GB RAM) with Proxmox and a primary Windows 11 VM with GPU (GTX 1080) passthrough. While most things work fine overall, I have been having trouble getting reliable throughput to the underlying disks.
There are three disks: one NVMe and two SATA SSDs. The NVMe hosts both Proxmox and the Windows 11 VM, while the two SATA SSDs form a ZFS mirror that is passed to the VM as a VirtIO SCSI disk.

When transferring data to the mirrored vdev, the disk sustains full bandwidth for a while, then Windows reports 0 MB/s for some time, and this repeats on and off for the rest of the transfer. During the stalls, the Windows performance monitor shows the disk at 100% load despite the 0 MB/s transfer rate, and trying to pause the transfer does nothing until Windows sees data moving again. Proxmox, however, appears to keep reporting full bandwidth the entire time.

I have added some pictures from both systems. The Windows screenshot shows the transfer stalling near the end (it happens to read 21 MB/s, but it sits there with no progress; other times it drops all the way to 0 MB/s) after having written about 5 GB, while the disk reports 100% utilisation. The corresponding Proxmox graph shows no sign of any slowdown. That was a small test transfer; the second Proxmox graph (the one that includes RAM usage) is from a larger transfer I did yesterday. There the bandwidth is reported at full speed until there is a sudden spike of several GB/s (the NVMe might have had some traffic at that exact point, but nothing else was happening on the system).
Some quick background on the transfers above, in case the rates seem low: I have been copying large batches of RAW images off a couple of SD cards. The cards are capped at about 250 MB/s, which is why the transfer rates are not higher than that. The SSDs are SATA drives, but they should be able to keep up with that kind of speed. I have also seen the same behaviour with some 90 MB/s SD cards.
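In case it helps with diagnosis, something like the sketch below is roughly how I could log the host-side write bandwidth per second during a transfer, to line it up against what Windows reports. This is just a rough sketch; `tank` is a placeholder for whatever the mirror pool is actually named here.

```python
#!/usr/bin/env python3
"""Log per-second pool write bandwidth from the Proxmox host.

Rough sketch for comparing host-side throughput with what the
Windows VM reports. Assumes OpenZFS and a pool named 'tank'
(placeholder -- substitute the real pool name).
"""
import subprocess
import time

POOL = "tank"  # placeholder pool name

# -H: scripted output (tab-separated, no headers), -p: exact byte
# values, trailing 1: report every second. The first sample is an
# average since pool import, so it is skipped below.
proc = subprocess.Popen(
    ["zpool", "iostat", "-Hp", POOL, "1"],
    stdout=subprocess.PIPE,
    text=True,
)

first = True
for line in proc.stdout:
    if first:
        first = False  # skip the since-import average
        continue
    fields = line.split("\t")
    # Scripted columns: name, alloc, free, read ops, write ops,
    # read bandwidth, write bandwidth (bytes/s with -p).
    write_bw = int(fields[6])
    print(f"{time.strftime('%H:%M:%S')}  write {write_bw / 1e6:8.1f} MB/s",
          flush=True)
```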
I have not been able to pin down where the problem lies, but it feels like there is either a caching issue or a problem with how I/O completions are reported back to the Windows VM. Either there is a cache/buffer that fills up first and is then drained, at which point Windows sees 0 MB/s because nothing new is being accepted; or something stalls at the I/O acknowledgement layer, where the VM does not receive a successful write confirmation and waits until it arrives. That is just a feeling, though; in reality I have probably just misconfigured something. Any thoughts on what I might be missing?
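If the fill-then-drain theory is at all plausible, I assume the ZFS write-throttle tunables on the host would be the place to look. Here is a small sketch to dump the ones I believe are relevant, assuming they are exposed under /sys/module/zfs/parameters as OpenZFS on Linux normally does; names may differ slightly between versions, so anything missing is just reported as absent.

```python
#!/usr/bin/env python3
"""Dump OpenZFS write-throttle tunables that could explain a
fill-then-drain write pattern. Rough sketch; parameter names can
vary between OpenZFS versions, so missing ones are just flagged."""
from pathlib import Path

PARAM_DIR = Path("/sys/module/zfs/parameters")

# Dirty-data limits control how much async write data ZFS buffers
# before it starts throttling incoming writes; the txg timeout
# controls how often buffered data is flushed out to disk.
PARAMS = [
    "zfs_dirty_data_max",          # absolute cap on dirty data (bytes)
    "zfs_dirty_data_max_percent",  # cap as a percentage of system RAM
    "zfs_txg_timeout",             # seconds between transaction groups
]

for name in PARAMS:
    path = PARAM_DIR / name
    try:
        value = path.read_text().strip()
    except FileNotFoundError:
        value = "<not present on this ZFS version>"
    print(f"{name} = {value}")
```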
Thanks in advance!