Trouble migrating from VMware

supercilious

New Member
Jan 1, 2019
Hi,

I'm trying out proxmox in an effort to ditch VMware ESXi as my little home-lab hypervisor. Everything works fine, but I can't seem to figure out the settings that allow memory overcommit ratios like those on ESXi.

I'm running on an HP Z420 workstation as my 24x7 server at home with 64GB of ECC RAM. Under ESXi I can typically run a little over 100GB worth of vRAM in my virtual machines without swapping, thanks to ESXi's built-in memory ballooning, deduplication and compression. Since almost all my virtual machines are clones of the same image, they dedup at very high ratios. Under Proxmox, however, I start swapping heavily anywhere near that kind of workload, so much so that it is unusable, even with fast NVMe SSD swap. My workload is very bursty and most of the VMs are idle most of the time, so VMware's ballooning, memory dedup and compression work perfectly for me.

How can I get similar results on Proxmox?
 
In the memory settings of each guest you can configure an upper and a lower RAM value. Those are used to auto-balloon when RAM pressure reaches 80% on the host. You should also check that KSM is running, and maybe tune it a bit (ksmtuned.conf).
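
For example, a more aggressive /etc/ksmtuned.conf might look like the sketch below. The parameter names are the standard ksmtuned ones, but the values are purely illustrative, not recommendations:

  # /etc/ksmtuned.conf -- illustrative values only
  KSM_MONITOR_INTERVAL=10   # re-evaluate every 10s instead of the default 60s
  KSM_SLEEP_MSEC=20         # ksmd wakes up more often
  KSM_NPAGES_MIN=128        # lower bound on pages scanned per wake-up
  KSM_NPAGES_MAX=2500       # allow bigger scan batches under memory pressure
  KSM_THRES_COEF=30         # start merging when free memory drops below 30%

Restart the daemon afterwards with: systemctl restart ksmtuned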
 

Thanks for responding. I had tried all of those before posting here. Ballooning works fine with the VirtIO drivers in my Windows guests, pushing idle VMs down to their minimum memory, and [ksmd] is running on the host. However, I cannot get memory compression working; as far as I can tell, there is no proper support for it in Proxmox? (The zram/zswap results on Google are very outdated, and "proxmox + zram" yields only one result, and that one is about LXC, which I'm not using.)
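
For reference, this is how I check how much KSM is actually merging. The files are the standard kernel sysfs interface; the 4096 assumes 4 KiB pages:

  # per the kernel docs, pages_sharing is roughly "how much saved"
  grep . /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing
  echo "$(( $(cat /sys/kernel/mm/ksm/pages_sharing) * 4096 / 1048576 )) MiB saved"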

My machine is only usable until I reach around 80-85GB of vRAM; after that it's in swap hell. Despite tuning KSM, the ksmd kthread sits mostly idle and memory pressure pushes the machine into a swap storm. ESXi just chugs along perfectly fine, and even Hyper-V didn't suffer such a huge performance degradation. :(

Is there anything I've missed?
 
Yes, KSM is working and tuned to be extremely aggressive (I have plenty of CPU cycles to spare). If I leave the machine idle for a couple of hours, memory usage goes down a little, but as far as I can tell, KSM just isn't as effective as it is supposed to be. A perf trace of the kernel clearly shows that KSM is working, but memory usage stops shrinking beyond a certain point.
 
There were a lot of memory allocation calls in the kernel perf trace, so I dug deeper, and it turns out that KSM itself is the memory hog!

A look at https://github.com/torvalds/linux/blob/master/mm/ksm.c shows why... KSM finds identical pages using red-black trees (a stable and an unstable tree) plus per-page tracking structures, instead of a simple hash table. There are gigabytes of KSM tree nodes allocated inside the kernel :(
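
As a rough sanity check on that claim: assume ~64 bytes of KSM metadata (rmap_item plus tree node) per scanned 4 KiB page; the real struct sizes vary by kernel version, so this is only an estimate:

  vram_gib=100
  pages=$(( vram_gib * 1024 * 1024 * 1024 / 4096 ))     # ~26.2M guest pages
  echo "$(( pages * 64 / 1048576 )) MiB of metadata"    # ~1600 MiB

So at ~100GB of scanned vRAM, the bookkeeping alone plausibly runs into the gigabytes.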
 
I can't see what your problem is here. I'm currently running multiple PVE clusters; one of them has 3 Dell R610 nodes with 96GB of RAM and around 70 VMs per node, all with different workloads (customer VMs). All of these nodes show around 16-20GB of KSM sharing, and I haven't had any problems with KSM yet.

AFAIK KSM will only start at around 80% memory usage, and it needs some time to merge the pages; if you ramp it up too hard, KSM may not keep up.

Do you have the QEMU Guest Agent enabled?
 
AFAIK KSM will only start at around 80% memory usage, and it needs some time to merge the pages; if you ramp it up too hard, KSM may not keep up.

No, ksmd always runs, at the rate set by /sys/kernel/mm/ksm/pages_to_scan and /sys/kernel/mm/ksm/sleep_millisecs (it's ksmtuned that throttles it based on free memory). Ballooning starts at 80% host memory usage, from what I can tell. There is no memory compression in Proxmox.
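
These are the knobs in question; dumping them shows exactly what ksmd has been told to do:

  grep . /sys/kernel/mm/ksm/run \
         /sys/kernel/mm/ksm/pages_to_scan \
         /sys/kernel/mm/ksm/sleep_millisecs   # run=1 means ksmd is scanning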

Yes, I have ballooning working just fine with the VirtIO drivers installed; it just isn't enough on its own to keep physical memory usage down when several VMs are active.

On my tiny host with only 64GB of RAM, the KSM trees occupy loads of memory inside the kernel and fail to trim my VMs much (Windows does aggressive ASLR and the like, which doesn't help matters), and there is no memory compression. On top of that, the swap I/O isn't compressed either, so once Proxmox starts swapping there is a lot more I/O than under ESXi.
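
For completeness: if the PVE kernel is built with zswap (the mainline compressed swap cache), this is the experiment I would try next. Untested on Proxmox as far as I know, and the knobs are the standard zswap module parameters, nothing PVE-specific:

  echo 1  > /sys/module/zswap/parameters/enabled           # compress pages on their way to swap
  echo 20 > /sys/module/zswap/parameters/max_pool_percent  # cap the compressed pool at 20% of RAM
  grep -r . /sys/kernel/debug/zswap/ 2>/dev/null           # runtime stats (needs debugfs mounted)

Making it persistent would mean adding zswap.enabled=1 to the kernel command line.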

Basically, it looks like Proxmox can't overcommit memory as far as ESXi can, so I'm stuck on VMware :(
 
No, ksmd always runs, at the rate set by /sys/kernel/mm/ksm/pages_to_scan and /sys/kernel/mm/ksm/sleep_millisecs.
The wiki says KSM kicks in when you are "using at least 80% of your physical memory on the host":
https://pve.proxmox.com/wiki/Dynamic_Memory_Management#KSM_in_action

There is no memory compression in Proxmox.
AFAIK you are correct here.

On my tiny host with only 64GB of RAM, the KSM trees occupy loads of memory inside the kernel and fail to trim my VMs much (Windows does aggressive ASLR and the like, which doesn't help matters), and there is no memory compression.
I work in a datacenter where we have a PVE cluster with 256GB of RAM per node. We run Windows VMs heavily, and KSM sharing of around 180GB was possible there, so I really can't see what your problem is; my experience has been different.
But yes, Windows doesn't do a great job with memory management; normally, though, that's really not a big problem.

You can't really compare VMware with PVE; both are great platforms, each with its own feature set and advantages. PVE is open source and usable in any environment; VMware is not open source, is aimed more at SMBs and datacenters, and is not free.
But sure, VMware has some very nice features, like vMotion, that I miss in PVE.

But you haven't answered my question: do you have the QEMU guest agent installed and enabled? :)
 

Yes, I've seen that, but it's not correct. ksmd always scans pages marked MADV_MERGEABLE (which QEMU sets on guest memory), limited only by the sleep interval and page-scan rate, as I said in my previous post. I have disabled the auto-tuning and set aggressive values manually for testing purposes. It hasn't helped much.
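
Concretely, "aggressive manual settings" for me means something along these lines (the values are just what I happened to test, not recommendations, and they reset on reboot):

  systemctl stop ksmtuned                          # stop the auto-tuner
  echo 10   > /sys/kernel/mm/ksm/sleep_millisecs   # near-continuous scanning
  echo 2000 > /sys/kernel/mm/ksm/pages_to_scan     # large batch per wake-up
  echo 1    > /sys/kernel/mm/ksm/run               # make sure ksmd is running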

Yes, I have the guest agent installed in some of my guest VMs. But many are running Server Core 2016, so those only have the VirtIO drivers injected into the WIM image before deployment, since the guest add-ons require the Windows GUI to be installed, AFAIK.

I'm very keen to move away from VMware, as they keep crippling the free-tier hypervisor and moving features into their vSphere offering, which is insanely expensive. I have been using ESX since the 3.x days and have been quite happy with it, but the time has come to move on, which is why I'm evaluating alternatives. At the moment, though, the alternatives seem unable to cope with my existing workload :(
 
