[SOLVED] Windows VM becomes unresponsive when copying files from TrueNAS VM

elnino54

Oct 29, 2024
Hi All,
I have an unusual problem with a Windows VM copying files from my TrueNAS VM.

Rough Config:
Xeon E5-2650 v4
64GB RAM
LSI 9211-8i SAS controller (HBA/IT mode, passed through directly to TrueNAS)
6 x 1.2TB SAS SSD in ZFS pool
2 x 250GB SATA SSD in RAID 1 (R1)

The NAS VM has been allocated 16GB of RAM. I have tried leaving the ARC at its default (which was consuming almost all of the RAM in the VM) as well as setting the zfs_arc_max value to 8GB in TrueNAS, with no change.

The VMs themselves are just on LVM, on a single M.2 NVMe drive.

If I try to copy a file *from* the NAS to the Windows 11 VM, the VM immediately freezes and stays frozen until the end of the copy. If I am connected via RDP, that also drops and reconnects.

Stats on the NAS show a transfer of ~700MB/s while the VM is 'frozen'.

A different VM that is using the R1 array via NFS continues on fine during this freeze.

The same Windows VM can copy the same file back *to* the NAS with no issue at all, albeit a bit slower.

This is similar to a lot of other posts, but those seem to describe the Proxmox server itself having issues and needing a reboot afterwards. For me, the Windows VM is just 'stunned' and then goes back to normal. The zfs_arc_max suggestion seemed really close to the mark, but it has not helped in my situation.

Any ideas?
 
I have an unusual problem with a Windows VM copying files from my TrueNAS VM.
Nothing is unusual when you virtualize stuff, that should run bare metal instead ;)
as well as tweaked the zfs_arc_max value to 8Gb in TrueNAS with no change.
I am not sure I understand what you wrote, but 50% of RAM, or 8GB in your case, is already the default for SCALE installations, even without tweaks.
The same Windows VM can copy the same file back *to* the NAS with no issue at all, albeit a bit slower.
I think this here explains it well. Write performance of your TrueNAS isn't as good as your read performance.
Unlike your write, your read will run into a network bottleneck.
That explains why the Windows VM "freezes" (more likely, only your network connection to Proxmox webGUI is congested), RDP drops and why this does not happen for writes. The VM itself was probably never frozen.

I would suggest a bandwidth limit, or trying to find out why your Linux bridge(?) can "only" do 5.6 Gbit/s.
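
For anyone wondering where that figure comes from: it is just the ~700MB/s reported by the NAS converted to bits. A quick sketch of the arithmetic, assuming decimal megabytes (10^6 bytes); the function name is only for illustration:

```python
def mb_per_s_to_gbit_per_s(mb_per_s: float) -> float:
    """Convert megabytes per second (10^6 bytes) to gigabits per second."""
    return mb_per_s * 8 / 1000


observed_mb_per_s = 700  # rate shown in the NAS stats during the "freeze"
gbit = mb_per_s_to_gbit_per_s(observed_mb_per_s)
print(f"{observed_mb_per_s} MB/s is about {gbit:.1f} Gbit/s")
```

That is well below what a 10 Gbit/s link could carry, which is why the limit is more likely somewhere in the virtual path than in the physical NIC.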
 
I am not sure if I understand what you wrote, but 50% or 8GB in your case, is already the default for SCALE installations, even without tweaks.
Well, it was at the default, which left only about 100MB of RAM free, so I manually set it to 8GB as per some other recommendations, but it didn't help.
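
In case it helps anyone else reading: zfs_arc_max is specified in bytes, so "8GB" has to be converted before it is set. A minimal sketch of that conversion, assuming binary gigabytes (GiB); the helper name is hypothetical and not part of TrueNAS or ZFS:

```python
GIB = 1024 ** 3  # bytes in one GiB


def arc_max_bytes(gib: int) -> int:
    """Return the byte value to use for zfs_arc_max for a cap given in GiB."""
    return gib * GIB


vm_ram_gib = 16  # RAM allocated to the TrueNAS VM
cap_gib = 8      # ARC cap tried in this thread

print(arc_max_bytes(cap_gib))
# The 50% default mentioned above works out to the same number for this VM.
print(arc_max_bytes(vm_ram_gib) // 2 == arc_max_bytes(cap_gib))
```

So if the default really was 50%, setting the cap to 8GB on a 16GB VM changes nothing, which fits with it making no difference here.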

Unlike your write, your read will run into a network bottleneck.
I tend to agree, but at what point in the virtual infrastructure does it actually bottleneck (genuine question)? The data is technically not going out and back in through the network interface (which is 10Gb BTW), as both VMs are on the same Linux bridge on the same host, so it is not technically limited by the NIC. That's certainly not how VMware behaves, but I understand it's apples and oranges.


more likely, only your network connection to Proxmox webGUI is congested
Maybe. I can test that theory: at the moment the management interface is bound to the same NIC, but I can split it out to the onboard 1Gb port that is currently unused. But the copy also drops an RDP connection to the Windows VM.

As I typed this, I had an interesting thought that stemmed from my VMware comment above. I checked the virtual NIC type on the VM, and it was E1000 (which I don't think I ever selected specifically, and it's only on my single Windows VM). Changing it from that to the VirtIO NIC and installing the drivers has actually resolved the issue. Well, umm, OK I guess! I can now copy at full speed, without losing connection to the VM.
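
For anyone who wants to check their own VMs for the same thing, here is a rough sketch that scans the text of a Proxmox VM config for network devices that are not VirtIO. The sample config, MAC addresses and function names are made up for illustration; it only assumes the usual `netN: model=MAC,bridge=...` layout of those lines.

```python
import re

# Hypothetical excerpt of a VM config; values are invented for the example.
SAMPLE_CONFIG = """\
cores: 4
memory: 8192
net0: e1000=BC:24:11:00:00:01,bridge=vmbr0
net1: virtio=BC:24:11:00:00:02,bridge=vmbr0
"""


def nic_model(net_line: str) -> str:
    """Extract the NIC model from a line like 'net0: e1000=MAC,bridge=vmbr0'."""
    _, _, value = net_line.partition(":")
    return value.strip().split(",")[0].split("=")[0].lower()


def non_virtio_nics(config_text: str) -> dict:
    """Map each netN key to its model when that model is not VirtIO."""
    found = {}
    for line in config_text.splitlines():
        if re.match(r"net\d+:", line):
            key = line.split(":", 1)[0]
            model = nic_model(line)
            if model != "virtio":
                found[key] = model
    return found


print(non_virtio_nics(SAMPLE_CONFIG))
```

Anything this reports is a candidate for switching to VirtIO in the VM's hardware settings. As in my case, Windows needs the VirtIO drivers installed before it will see the new NIC.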

This also reminds me of some of the TCP/IP offload stuff that gave me nightmares on Broadcom NICs back in the early 2000s.
 
Well, it was default, and only leaving about 100mb of RAM free, so I manually set it to 8Gb as per some other recommendations but it didn't help.
100MB of free RAM is a good thing!
Empty RAM is wasted RAM!

All modern OSes use empty RAM as cache in some form or another. They just hide it better. MS uses white for cache in the task manager, macOS only talks about "memory pressure".
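
To illustrate the general idea on Linux, here is a small sketch that compares "free" with "available" memory from /proc/meminfo-style text. The numbers in the sample are invented, and as far as I know the ZFS ARC is accounted separately from the normal page cache, so treat this as a picture of the principle rather than of TrueNAS specifically.

```python
# Invented sample in the format of /proc/meminfo (values in kB).
SAMPLE_MEMINFO = """\
MemTotal:       16777216 kB
MemFree:          102400 kB
MemAvailable:    9437184 kB
Cached:          8912896 kB
"""


def parse_meminfo(text: str) -> dict:
    """Parse 'Key:   value kB' lines into a dict of integers (kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key.strip()] = int(rest.split()[0])
    return fields


info = parse_meminfo(SAMPLE_MEMINFO)
print(f"free:      {info['MemFree'] // 1024} MiB")
print(f"available: {info['MemAvailable'] // 1024} MiB")
```

A tiny "free" figure next to a large "available" figure just means the RAM is being used as cache that can be handed back when something needs it.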

I tend to agree, but at what point in the virtual infrastructure does it actually bottleneck (genuine question)?
My guess would be the Linux (?) bridge in Proxmox? And that bridge is probably (just another guess) CPU limited?
Ahh, the E1000 emulation was the bottleneck :)

I checked the virtual NIC type on the VM, and it was E1000 (Which I don't think I ever selected specifically, and it's only on my single windows VM), but changing it from that to the VirtIO nic and installing the drivers has actually resolved the issue, well, umm, OK I guess! I can now copy at full speed, without losing connection to the VM.
Ahh, that makes sense. Great that you resolved it that way.
FYI: Proxmox often defaults to very conservative settings. IMHO they are extremely conservative, but you never know whether someone will come around the corner with some exotic setup, so they take a safety-first approach. That is why they use an Intel E1000 by default, which of course performs really badly.

So next time I suggest you follow the best practices guides.
https://pve.proxmox.com/wiki/Windows_11_guest_best_practices
This would have led you to a VirtIO NIC and guest tools to begin with.
And even then, it is still conservative in other parts, like the CPU.
But that would be a whole other topic :)
 