Running out of RAM? VM Sluggish

John0212

Feb 2, 2024
We've recently deployed a new host with the following specs relevant to this issue:
30TB ZFS RAIDZ1 (8x 3.84TB Drives)
512GB DDR5-4800

I believe we have a RAM issue somewhere. We have 20 Windows VMs, each assigned 16GB of RAM, and occasionally a VM will hang, stop accepting input, and become extremely sluggish. When this happens, our swap usage shoots up to 100% of its 8GB size. Powering down a VM seems to alleviate the issue, and the affected machine returns to normal. What I find interesting, and probably due to my limited knowledge of certain functions, is that we're only using about 70% of our RAM; it never seems to go higher than that. My understanding is that the host shouldn't touch swap until RAM usage approaches its ceiling, but I could be mistaken.
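To see which processes actually hold the swapped-out pages when a VM hangs, one option is to read the `VmSwap` field from each `/proc/<pid>/status` file. The sketch below assumes a Linux host. On Proxmox each VM normally runs as its own `kvm` process, so the VMs should show up there by PID, but that mapping is an assumption to verify on your host. The function name `swap_by_process` is hypothetical.

```shell
#!/bin/sh
# List processes by swap usage (VmSwap, in kB), largest first.
# Argument: how many rows to show (default 10).
swap_by_process() {
    for status in /proc/[0-9]*/status; do
        # Processes can exit while we iterate; skip entries we cannot read.
        [ -r "$status" ] || continue
        awk '
            /^Name:/   { name = $2 }   # first word of the process name
            /^Pid:/    { pid = $2 }
            /^VmSwap:/ { swap = $2 }   # kernel threads have no VmSwap line
            END { if (swap > 0) printf "%s\t%s\t%s\n", swap, pid, name }
        ' "$status" 2>/dev/null
    done | sort -rn | head -n "${1:-10}"
}

printf 'kB_swap\tpid\tname\n'
swap_by_process 10
```

If one or two `kvm` processes dominate this list during a hang, that narrows the problem to specific VMs rather than the host as a whole.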

I've applied the settings below and restarted the host. Is there anything else I should investigate? I'm a little lost at this point and not sure where to look next.
vm.swappiness = 10
zfs_arc_max = 40GB
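For reference, a minimal sketch of how these two settings are commonly expressed on a Debian-based Proxmox host, assuming the usual `/etc/sysctl.d/` and `/etc/modprobe.d/zfs.conf` locations (the file name `99-swappiness.conf` and the helper names are hypothetical). `zfs_arc_max` takes a value in bytes, so 40GB has to be converted first. The script only prints the lines; it does not write any files.

```shell
#!/bin/sh
# Convert a size in GiB to bytes (zfs_arc_max expects a byte value).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the config lines. Review them, then place them manually:
#   sysctl line   -> e.g. /etc/sysctl.d/99-swappiness.conf (hypothetical name)
#   modprobe line -> /etc/modprobe.d/zfs.conf
print_settings() {
    swappiness=$1
    arc_gib=$2
    echo "vm.swappiness = ${swappiness}"
    echo "options zfs zfs_arc_max=$(gib_to_bytes "$arc_gib")"
}

print_settings 10 40
```

If the root filesystem is on ZFS, the Proxmox documentation notes that the initramfs has to be refreshed after changing `zfs.conf` (`update-initramfs -u -k all`). The ARC limit can also be changed at runtime by writing the byte value to `/sys/module/zfs/parameters/zfs_arc_max`.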
 

Attachments

  • Screenshot 2024-03-14 105715.png
Hey,

How do the stats look when you open htop, and how does the RAM usage there compare with the PVE web GUI?

Best
 
htop shows about 25GB more RAM usage than the PVE GUI.
 

Attachments

  • Screenshot 2024-03-14 112124.png
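One general reason memory figures disagree on ZFS hosts is that the ZFS ARC is not accounted as regular page cache, so some tools report it as used memory. Which figure htop and the PVE GUI each use is not established here. The sketch below simply prints the raw numbers side by side from `/proc/meminfo` and, if the ZFS module is loaded, the current ARC `size` from `/proc/spl/kstat/zfs/arcstats`, so both tools can be compared against the same baseline. The helper names are hypothetical.

```shell
#!/bin/sh
# Print a field from /proc/meminfo in MiB (values there are in kB).
meminfo_mib() {
    awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Current ZFS ARC size in MiB, or "n/a" if ZFS is not loaded.
arc_size_mib() {
    stats=/proc/spl/kstat/zfs/arcstats
    if [ -r "$stats" ]; then
        # arcstats rows are "name type data"; the "size" row is in bytes.
        awk '$1 == "size" { printf "%d\n", $3 / 1048576 }' "$stats"
    else
        echo "n/a"
    fi
}

echo "MemTotal:     $(meminfo_mib MemTotal) MiB"
echo "MemAvailable: $(meminfo_mib MemAvailable) MiB"
echo "SwapTotal:    $(meminfo_mib SwapTotal) MiB"
echo "SwapFree:     $(meminfo_mib SwapFree) MiB"
echo "ZFS ARC size: $(arc_size_mib) MiB"
```

If the ARC size is close to the 25GB gap between htop and the GUI, that would be consistent with the two tools counting the ARC differently, though it would not by itself explain the swapping.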
If it helps, I have an identical server whose only difference is less RAM (384GB), and it appears to be doing the same thing. We only have 9 Windows VMs running on this one, but memory usage is still fairly high and it is using swap.
 

Attachments

  • Screenshot 2024-03-14 112824.png
  • Screenshot 2024-03-14 112836.png
Just my 2 cents:

I notice you're using dual-socket servers.
I don't know your motherboard's RAM configuration (how much RAM is installed per socket), but this may be linked to the VMs' sluggishness.
You'll probably want to check the socket assignment for each VM, and the NUMA settings as well.

I don't personally use any dual-socket hardware, and I know (nearly) nothing about NUMA settings.
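To see how memory is spread across the two sockets, the kernel exposes per-NUMA-node counters in `/sys/devices/system/node/node*/meminfo`. A minimal sketch, assuming a Linux host with NUMA support, that prints total and free memory per node (`numactl --hardware`, if installed, shows similar information). The function name `numa_mem` is hypothetical. One hypothesis worth checking, not something established in this thread: if one node is nearly full while the other still has free memory, the kernel may start swapping even though overall usage looks like only 70%.

```shell
#!/bin/sh
# Print MemTotal and MemFree (in MiB) for each NUMA node.
# Per-node meminfo lines look like: "Node 0 MemTotal:  263892000 kB"
numa_mem() {
    for meminfo in /sys/devices/system/node/node[0-9]*/meminfo; do
        [ -r "$meminfo" ] || continue
        awk '
            $1 == "Node"      { node = $2 }
            $3 == "MemTotal:" { total = $4 }
            $3 == "MemFree:"  { free = $4 }
            END { printf "node%s\ttotal_mib=%d\tfree_mib=%d\n", node, total / 1024, free / 1024 }
        ' "$meminfo"
    done
}

numa_mem
```

On the Proxmox side, the per-VM NUMA flag is the `numa` option in the VM configuration (for example `qm set <vmid> --numa 1`, where `<vmid>` is a placeholder).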
 
Can you check on the command line?
Code:
free -h

Update: I have disabled swap. I read this weekend that it may be OK in my situation to simply disable it, so I've done that for now. The change won't persist after a reboot, so I can roll back if needed.
 
Smart thinking. I've heard that NUMA can play a role in this too. These are brand new Dell R760 servers, and the RAM appears to be populated correctly. Here is a screenshot from the iDRAC.
 
Try adding numa_balancing=disable to the kernel boot options.
Maybe first try this instead:
Code:
echo 0 > /proc/sys/kernel/numa_balancing
This applies only to the current boot (it does not persist after a reboot). If it works for you, add the option to the kernel boot options afterwards.
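A minimal sketch to confirm the state of automatic NUMA balancing before and after the change, reading `/proc/sys/kernel/numa_balancing` (0 means disabled; any nonzero value means some form of balancing is active). The function name is hypothetical, and the script only reads and prints; it does not change anything. As an alternative to the boot option, the same setting can typically be persisted with the sysctl line `kernel.numa_balancing = 0` in a file under `/etc/sysctl.d/`.

```shell
#!/bin/sh
# Report whether automatic NUMA balancing is currently active.
numa_balancing_state() {
    f=/proc/sys/kernel/numa_balancing
    if [ ! -r "$f" ]; then
        echo "unavailable"   # kernel without NUMA balancing support, or not Linux
    elif [ "$(cat "$f")" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

echo "numa_balancing: $(numa_balancing_state)"
# Sysctl line that would persist the change (place it under /etc/sysctl.d/ yourself):
echo "persistent form: kernel.numa_balancing = 0"
```

To use the boot option instead: on a GRUB-booted Proxmox host it usually goes into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub`; hosts booting via systemd-boot use `/etc/kernel/cmdline` followed by `proxmox-boot-tool refresh`.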
 
