[SOLVED] Hyper-Threading vs No Hyper-Threading; Fixed vs Variable Memory

Hi. I have a question about Proxmox 4.4.

There is a "Ballooning" option that is available together with the "Use fixed size memory" option. The help manual says:

Even when using a fixed memory size, the ballooning device gets added to the VM, because it delivers useful information such as how much memory the guest really uses. In general, you should leave ballooning enabled, but if you want to disable it (e.g. for debugging purposes), simply uncheck Ballooning.

I'm configuring a Windows Server 2012 VM. I want a fixed memory size, but I'm confused about this option. Should I enable it?
 
So it appears no one has said anything about this so I'll make a comment. If more people can back this up, even better.
I turned off memory ballooning on all VMs on our weakest node. After doing so, those VMs had performance like what they'd have on the stronger nodes. This may seem anecdotal but I used the recommendation and came up with better performance. It certainly would seem that the recommendations from 2014 are still valid.
 
Interesting findings. Do you have more information on how the performance was improved? What % difference are we talking?
 
Honestly, I know everyone always wants to see some sort of metric but without context, that would be no better than standard anecdotal evidence. I've now taken all of our KVM VMs off of ballooning and they are all noticeably faster. I know this is anecdotal and perceptual but believe you me, that matters greatly to our customers and therefore to us. We use RDP to sign in through the cloud to all these servers and the sign in is now twice as fast as it was. As far as metrics, I'd say the ones originally posted to this thread years ago would probably still be valid from the performance increase I'm feeling.
 
Perhaps some helpful numbers:
We have a Dell server with dual 18-core processors that normally runs at a CPU load of 18 to 22 at this time of the morning (8:40 am EST); right now it is only running at 14.
We have a couple of other Dell servers with dual 12-core processors that usually run at a CPU load of about 12 to 14 at this time of the morning; right now they are only running at 5 and 7.
That is a significant return of CPU power to the hardware, and nothing else changed except getting rid of the ballooning.
 
Thanks for the info. If I ever get some time I might do some testing myself; so far I haven't really seen a difference.
 
Is setting balloon=0 the correct way to deactivate it?

This feature is very interesting, but if it really consumes that many resources, it becomes impractical.
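
For reference, a minimal sketch of how balloon=0 is typically set from the host shell with qm set. VMID 123 is a placeholder, and the DRY_RUN switch is a hypothetical convenience added here so the command is only printed unless explicitly enabled. The device change may only take effect after the VM has been fully stopped and started again.

```shell
#!/bin/sh
# Sketch: set balloon=0 for one VM from the PVE host.
# VMID 123 is a placeholder; DRY_RUN is a hypothetical safety switch.

# Build the qm command line for the given VMID.
disable_balloon_cmd() {
    printf 'qm set %s --balloon 0\n' "$1"
}

# Print the command by default; run it only when DRY_RUN=0.
disable_balloon() {
    cmd="$(disable_balloon_cmd "$1")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

disable_balloon 123
```

Running it as-is only prints the command. With DRY_RUN=0 on a PVE host it would call qm, after which qm config 123 should list balloon: 0.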
 
As @thiagotgc asked: for Windows guest VMs, is it sufficient to set balloon=0 on the host?
With that setting I no longer see the VirtIO Balloon Driver in Device Manager inside the VMs, but the Balloon Service is still running.
Do I need to stop/disable the Balloon Service, or even uninstall it, inside the VM?
 
I benchmarked the boot-up time of 15 Win10 VMs, including auto-login and a small PowerShell workload test script that records the benchmark completion time for each VM. The VMs are set to auto-boot, and the PVE node is rebooted to kick off each test. The benchmark completion times of all 15 VMs are averaged together. The results are so close that you could argue the difference is within the margin of error.
  • Ballooning Device enabled: average of 721 seconds to complete benchmark
  • Ballooning Device disabled: average of 737 seconds to complete benchmark
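
To put the two averages above in proportion, here is a small sketch that computes the relative change between them. The helper name pct_change is made up for this illustration; only the two timings come from the post.

```shell
#!/bin/sh
# Relative change from a baseline value to a new value, in percent.
# pct_change is a hypothetical helper name used only in this sketch.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Ballooning enabled (721 s) versus disabled (737 s).
pct_change 721 737
```

The 16-second gap works out to roughly 2% of the total run time, which fits the reading that the two results are effectively the same.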
VMs are hosted on a ZFS pool with l2arc_rebuild=0, to eliminate persistent L2ARC variability within the test. This causes a lot of IO delay on boot. Since the VMs are clearly IO bottlenecked, this may explain why I'm not seeing much benefit to CPU performance. I wasn't able to uninstall the balloon driver from within Windows, since I'm rolling-back snapshots before each reboot. I would need to boot up the VM, which would increase boot-to-boot variability. So the ballooning service is still running on all VMs in the second "balloon=0" test. This could be another reason I didn't see improvement.

As usual, CPU metrics are green, and IO Delay is blue. The first bump is with ballooning enabled, and the second is with balloon=0.
[Attached graph: node CPU usage and IO delay]

[Attached graph: memory metrics]

Virtual Machine Info:
  • Windows 10 21H2
  • No network connection
  • 1 socket, 2 cores
  • 8 GB ram
  • Virtio SCSI HDD, iothread=1, aio=io_uring
  • zVol volblocksize=64k
  • Balloon DriverVer=07/19/2017,100.74.104.14100
PVE Test Node:
  • PVE 7.0-11
  • RAM: 251.8 GiB
  • CPUs: 40 x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (2 Sockets)
  • Kernel Version: Linux 5.11.22-4-pve #1 SMP PVE 5.11.22-8
  • PVE Manager Version: pve-manager/7.0-11/63d82f4e
 
I upgraded to PVE 7.4-3 and enabled persistent L2ARC and IO threads. With the disk IO bottleneck gone, I could see some decent CPU utilization. I just had to run every test twice to ensure repeatability now that L2ARC is persistent. All 10 VMs finished booting and benchmarking after about 5 minutes, so I averaged the node's CPU usage over the first 10 minutes. Tests 1 & 2 had ballooning enabled. Tests 3 & 4 had ballooning disabled. For Tests 5 & 6 I upgraded to the latest drivers in virtio-win-0.1.229.iso.
  • Test 1 & 2 Average % CPU usage: 11.30%
    • Balloon Driver Date: 8/30/2021
    • Balloon Driver Version: 100.85.104.20800
  • Test 3 & 4 Average % CPU usage: 10.23%
    • VM Memory -> Balloon=0
    • VirtIO Balloon Device disabled
    • BalloonService disabled
  • Test 5 & 6 Average % CPU usage: 11.21%
    • Balloon Driver Date: 11/15/2022
    • Balloon Driver Version: 100.92.104.22900
Later on, I discovered that the qm guest exec command can be used to toggle the balloon driver on and off in real time (no restart required!). I disabled both the VirtIO Balloon Device and the "BalloonService" Windows service just to be safe, but disabling either one would result in the VM's memory usage spiking up.

Disable Ballooning on VMID 123:
Bash:
qm guest exec 123 "pnputil.exe" "/disable-device" "PCI\VEN_1AF4&DEV_1002&SUBSYS_00051AF4&REV_00\3&267A616A&1&18"
qm guest exec 123 "net" "stop" "balloonservice"

Enable Ballooning on VMID 123:
Bash:
qm guest exec 123 "pnputil.exe" "/enable-device" "PCI\VEN_1AF4&DEV_1002&SUBSYS_00051AF4&REV_00\3&267A616A&1&18"
qm guest exec 123 "net" "start" "balloonservice"

Next, I looked at individual VM CPU usage while idle. I toggled on the balloon driver for 5 hours, and toggled it off for 5 hours. This was done for 4 VMs. The idle average % CPU usage increase was 1.12% when the balloon device was enabled.
[Attached graph: idle VM CPU usage with the balloon device toggled on and off]

Finally, I looked at VM CPU usage while under load. PowerShell was used to max out a single vCPU, while writing and releasing about 1.5GB of RAM over and over. This caused a pretty consistent 61% - 62% load. This test was smaller, ran on only 2 VMs over the course of 7 hours (3.5 hours on, 3.5 hours off). The under-load average % CPU usage increase was 0.35% when the balloon device was enabled.
[Attached graph: VM CPU usage under load with the balloon device toggled on and off]

Here's the PowerShell one-liner that I used:
Code:
do{$str = "01234567";while(($str.length/1024) -lt (512*1024)){$str+=$str};Clear-Variable -Name "str"}while($true)
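
As a rough check on the "about 1.5GB" figure, the doubling in the one-liner can be traced without allocating anything. This sketch assumes 2 bytes per character (.NET strings are UTF-16) and ignores object overhead. The peak estimate further assumes that the old and new strings briefly coexist during the last concatenation, which is a plausible reading and not something measured in this thread.

```shell
#!/bin/sh
# Trace the string doubling from the PowerShell loop using lengths only.
# Assumptions: 2 bytes per character (UTF-16), no object overhead.
len=8                      # length of "01234567"
limit=$((512 * 1024))      # the loop runs while len/1024 < limit
prev=0
while [ $((len / 1024)) -lt "$limit" ]; do
    prev=$len              # string length before this doubling
    len=$((len * 2))       # $str += $str
done

final_mib=$((len * 2 / 1024 / 1024))
# Old and new string held at the same time during the last doubling.
peak_mib=$(((len + prev) * 2 / 1024 / 1024))
echo "final string: ${final_mib} MiB, peak during last concat: ${peak_mib} MiB"
```

Under these assumptions the final string is 1024 MiB and the transient peak is 1536 MiB, which lines up with the roughly 1.5 GB written and released per iteration.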

My conclusion was that the VirtIO balloon device had a negligible impact on performance (about 1% CPU utilization of a 2.3 GHz Xeon). By using the balloon device, I can save 2-3 GB of RAM per VM. Caveats: I only tested Windows 10 VMs, not Windows Server. I'm also not using PCIe passthrough, which is known to conflict with the balloon device.
 
I think it likely that the balloon driver helps Proxmox differentiate between used memory and memory used for filesystem cache (which is often counted as unused memory because it can be freed and reused instantly). Therefore, I think ballooning does not save you memory; it just counts it differently. It is still a good thing, though.
 
I would definitely like to get smarter on this subject, because I observed behavior similar to what you're describing. When looking at the VM summary, I can see the 2-3 GB of savings. But when looking at the PVE node summary, the RAM savings don't translate. I thought this might be ZFS eating the spare RAM, but even after a reboot (when the ARC is being completely rebuilt) the utilization looked very similar to when ballooning is disabled.

Somebody took the time to develop and maintain the ballooning device, so I have to assume it has a purpose. The common understanding is that it's used to increase VM density by effectively "thin-provisioning" RAM. It would be nice if the PVE node summary page added two more RAM usage stats, to show whether this is actually happening:
  • VM RAM usage: the amount of RAM actually being consumed by VMs.
    • Helps differentiate between VM and system process RAM usage (like ZFS)
  • VM RAM allocated: the total amount of RAM allocated to VMs in their virtual hardware configuration.
    • Would show the benefit of enabling the balloon device, and help admins not to go too crazy with over-allocating RAM
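
Until something like that exists in the GUI, the "VM RAM allocated" figure could be approximated from the host shell. The sketch below is only an illustration under several assumptions: the VM configs live in /etc/pve/qemu-server/ on the node, each config carries an explicit memory: line in MiB, and snapshot sections start with a [name] header (parsing stops there so snapshots are not counted twice). It counts stopped VMs as well, and a VM without a memory: line contributes 0.

```shell
#!/bin/sh
# Sum the "memory:" value (MiB) from the main section of each VM config.
# Assumptions: configs are <dir>/<vmid>.conf, and snapshot sections begin
# with a "[name]" header, which is where parsing of a file stops.
sum_allocated_mib() {
    dir="$1"
    total=0
    for conf in "$dir"/*.conf; do
        [ -f "$conf" ] || continue
        mib=$(awk '/^\[/ { exit } $1 == "memory:" { print $2; exit }' "$conf")
        total=$((total + ${mib:-0}))
    done
    echo "$total"
}

# Default to the local node's config directory (an assumed path).
sum_allocated_mib "${1:-/etc/pve/qemu-server}"
```

Comparing this total against the node's physical RAM gives a rough over-allocation ratio, though it says nothing about how much of that memory the guests actually touch.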
 
Regarding IO threads: any specific reason you had them disabled in the first place? I noticed that if you choose the "single" variant of the SCSI Controller, IO threads are enabled by default.
I've also read that the single variant performs better, though without any explanation of why.
 
For all of you running Windows VMs (especially servers), which cache option do you use: write back, write through, or none?
 
"None". This gives maximum reliability, saves RAM by not storing data in RAM twice, and does not lie to the guest OS.

My storage is ZFS. It has its own optimization for writing async data.
 
