Demystifying Load Averages

Jan 25, 2022
Ireland
Hi All,

I am hoping to get some insight into why my cluster load averages are fairly high despite low CPU usage and I/O wait.

Basic overview: 3x Dell R740xd, each with dual Xeon Gold 6138 (2x 20 cores/40 threads = 80 logical CPUs per node) and 10x 32GB 2666MHz DDR4
Storage type: Ceph (hyper-converged)

Here's a snapshot from my Grafana dashboard (60s time interval). It represents a typical load; lately, a 'high load' period would be only ~4-5% CPU and ~2-3% I/O wait.

[Screenshot: Grafana dashboard (pr.combine.JPG)]

I understand the basic concept of load averages: the number is read relative to the total number of system cores. But I seem to be rapidly approaching the ceiling of 80 that I don't want to surpass, while still having a ton of system resources available.
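To check my reading of the numbers, here is a minimal Python sketch of how the Linux kernel computes the 1/5/15-minute averages, as I understand it: every 5 seconds it counts tasks that are runnable (R) or in uninterruptible sleep (D) and folds that count into an exponentially damped moving average. The fixed-point constants are assumed to mirror the kernel's loadavg code; `simulate` is a hypothetical helper that feeds one task count per tick.

```python
# Minimal sketch of the Linux load-average calculation (assumption: constants
# mirror the kernel's include/linux/sched/loadavg.h; simulate() is hypothetical).
FSHIFT = 11
FIXED_1 = 1 << FSHIFT   # 1.0 in 11-bit fixed point
EXP_1 = 1884            # FIXED_1 / e^(5s/1min)
EXP_5 = 2014            # FIXED_1 / e^(5s/5min)
EXP_15 = 2037           # FIXED_1 / e^(5s/15min)


def calc_load(load: int, exp: int, active: int) -> int:
    """One 5-second step: decay the old average toward the current active count."""
    newload = load * exp + active * (FIXED_1 - exp)
    if active >= load:
        newload += FIXED_1 - 1  # round up while the load is rising
    return newload // FIXED_1


def simulate(active_per_tick: list[int]) -> tuple[float, float, float]:
    """Feed one R+D task count per 5-second tick; return the (1m, 5m, 15m) averages."""
    loads = [0, 0, 0]
    for n in active_per_tick:
        active = n * FIXED_1  # counted whether the task is on-CPU or blocked in D state
        loads = [calc_load(load, exp, active)
                 for load, exp in zip(loads, (EXP_1, EXP_5, EXP_15))]
    return tuple(load / FIXED_1 for load in loads)


if __name__ == "__main__":
    # 80 active tasks held for one hour = 720 ticks of 5 s each.
    print(simulate([80] * 720))
```

Holding 80 active tasks for an hour drives the 1-minute value to 80.0 whether those tasks are burning CPU or blocked waiting on storage, which would explain load climbing while CPU usage stays low.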

I've dug into htop, and only a couple of VMs are real heavy hitters in terms of process count and utilization.
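The kernel's load figure counts individual tasks (threads), not processes, and each VM's QEMU process runs several threads. A per-thread tally by state may therefore be more telling than a per-process view. Below is a minimal Python sketch, assuming a Linux host with /proc mounted; the helper names are hypothetical. It counts every thread by the state letter in its stat file; only R (runnable) and D (uninterruptible sleep, usually waiting on I/O) feed the load average.

```python
import os
from collections import Counter


def task_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/task/<tid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' instead of on whitespace.
    """
    return stat_line.rpartition(")")[2].split()[0]


def count_states(stat_lines) -> Counter:
    """Count tasks (threads) by state; R and D are the ones the load average counts."""
    return Counter(task_state(line) for line in stat_lines)


def read_all_task_stats(proc: str = "/proc"):
    """Yield the stat line of every thread on this host (Linux only)."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited mid-scan
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    yield f.read()
            except OSError:
                continue  # thread exited mid-scan


if __name__ == "__main__" and os.path.isdir("/proc/self/task"):
    states = count_states(read_all_task_stats())
    print(f"runnable (R): {states['R']}  uninterruptible (D): {states['D']}")
```

A large D count while CPU usage stays low would suggest tasks blocked on storage or network I/O rather than CPU pressure.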

Are there other factors to take into account? How can I better interpret these numbers? Or are load averages simply a defunct metric that I shouldn't worry too much about? I've read both sides of the argument, but am still not sure how to interpret/diagnose/improve based on these numbers.
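One common way to interpret the raw figure is to divide it by the number of logical CPUs. Here is a minimal Python sketch, assuming a Linux host; `parse_loadavg` and `per_cpu` are hypothetical helpers, and the /proc/loadavg field layout follows the proc(5) man page. The fourth field is an instantaneous count of runnable scheduling entities, which can serve as a sanity check against the smoothed averages.

```python
import os


def parse_loadavg(text: str) -> dict:
    """Parse /proc/loadavg: '<1m> <5m> <15m> <runnable>/<total> <last_pid>'."""
    one, five, fifteen, entities, _last_pid = text.split()
    runnable, total = (int(x) for x in entities.split("/"))
    return {"1m": float(one), "5m": float(five), "15m": float(fifteen),
            "runnable": runnable, "total": total}


def per_cpu(load: float, ncpu: int) -> float:
    """Load per logical CPU. 1.0 means, on average, as many R+D tasks as CPUs."""
    return load / ncpu


if __name__ == "__main__" and os.path.isfile("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        stats = parse_loadavg(f.read())
    ncpu = os.cpu_count() or 1  # logical CPUs (80 on each node described above)
    for window in ("1m", "5m", "15m"):
        print(f"{window}: {stats[window]:.2f} total, "
              f"{per_cpu(stats[window], ncpu):.2f} per CPU")
    print(f"runnable right now: {stats['runnable']} of {stats['total']} scheduling entities")
```

For example, a 1-minute load of 60 on one of the 80-thread nodes works out to 0.75 per CPU. Because D-state tasks are included, a per-CPU value near 1.0 does not by itself mean the CPUs are busy.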

Any help would be much appreciated, and I'm happy to supply more info if needed. Thanks!
 