Understanding CPU Usage and Server Load

uberdome

Member
Mar 19, 2019
21
1
8
I am looking at the graphs that show "CPU usage" and "Server load". It was my understanding that Server Load represents how many things are waiting to be processed. Conceptually, I pictured that as long as CPU usage is below 100%, Server Load would stay under 1. This is clearly not the case, though.

Can someone explain to me why my CPU usage would be below 50% while my server load peaks around 3? What does this mean, and what, if anything, do I need to do to improve it?

[I have a one-node Proxmox setup that I am reviewing and attempting to understand. I have been using Proxmox for almost 2 years, but in a limited capacity. I'm reviewing some of my setups and trying to get some possible issues sorted.]

Here is an image of "Week (maximum)" for the node:
[Screenshot not reproduced: "Week (maximum)" CPU usage and server load graphs, Annotation 2020-07-28 191426.png]

uberdome


Thank you, this is a very helpful resource. I have a better understanding of my previous misconceptions.

I thought Server Load was only related to CPU usage, with a value of 1 meaning the system was maxed out. I now understand that the CPU component of Server Load is effectively per core (i.e., with 4 cores, the CPU component of Server Load would max out at 4, not at 1). I also now understand that Server Load can be higher than the CPU count while the system still functions normally, since it takes other factors into account.
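To make the per-core idea concrete, here is a minimal Python sketch, assuming a Linux host where `os.getloadavg()` is available, that divides the load average by the number of logical CPUs. On a hypothetical 4-core host, a load of 3.0 works out to 0.75 per core, i.e. 25% below the core count. The helper name `load_per_core` is made up for illustration.

```python
import os


def load_per_core(load: float, cores: int) -> float:
    """Divide a load average by the number of logical CPUs.

    On Linux the load average counts runnable tasks plus tasks in
    uninterruptible sleep (D state), so 1.0 per core means roughly one
    task per CPU that is running, waiting to run, or blocked in D state.
    It does not mean "CPU 100% busy".
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load / cores


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # Unix-only; raises OSError if unavailable
    cores = os.cpu_count() or 1           # cpu_count() may return None
    for label, value in (("1m", one), ("5m", five), ("15m", fifteen)):
        print(f"{label}: load {value:.2f} -> {load_per_core(value, cores):.2f} per core")
```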

It is a bit hard to get useful information from the Server Load metric. I feel it does not do a good job of showing actual bottlenecks in the system (i.e., it does not reveal where a problem is, only that load has increased or decreased, or is trending up or down). I think it would be more useful to see individual system performance metrics to identify bottlenecks. Perhaps I am just inexperienced in this area, though.

Anyhow, my Server Load maximums are still 25% below my core count, so I am more comfortable knowing I am not currently exceeding the capacity of my system. I also know the load can go above my core count, but it is unclear how far, or at what point, that value represents a problem.

How can one identify when they are actually exceeding the capacity of their system?
 

LnxBil

Famous Member
Feb 21, 2015
6,059
740
133
Germany

For CPU, monitor the utilization of your cores. If that reaches 100%, you're CPU bound.
I/O bottlenecks are more critical, because your CPU will actively wait until a request is done, and processes can get stuck in D state (uninterruptible sleep) if there is a problem with your storage. The best thing to monitor here is I/O delay. Both of these metrics are already included in PVE.
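PVE's node summary already graphs overall CPU usage and IO delay. If you want a per-core breakdown on any Linux host, the kernel's cumulative counters in `/proc/stat` are enough: sample them twice and compare. The sketch below assumes the usual column order (user, nice, system, idle, iowait, irq, softirq, steal) and treats "busy" as everything except idle and iowait; the function names are hypothetical. Treat the per-core iowait split as approximate, since a task waiting on I/O is not actually running on any particular CPU.

```python
import time
from typing import Dict, List


def parse_proc_stat(text: str) -> Dict[str, List[int]]:
    """Map 'cpu0', 'cpu1', ... to [user, nice, system, idle, iowait, irq, softirq, steal]."""
    counters: Dict[str, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines are 'cpuN'; the aggregate line is plain 'cpu' and is skipped.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            # guest/guest_nice (columns 9-10) are already included in user/nice.
            counters[fields[0]] = [int(v) for v in fields[1:9]]
    return counters


def core_usage(before: str, after: str) -> Dict[str, Dict[str, float]]:
    """Percent busy and percent iowait per core between two /proc/stat snapshots."""
    b, a = parse_proc_stat(before), parse_proc_stat(after)
    result: Dict[str, Dict[str, float]] = {}
    for core, now in a.items():
        if core not in b:  # e.g. a CPU brought online between samples
            continue
        delta = [x - y for x, y in zip(now, b[core])]
        total = sum(delta)
        if total == 0:
            result[core] = {"busy": 0.0, "iowait": 0.0}
            continue
        idle, iowait = delta[3], delta[4]
        result[core] = {
            "busy": 100.0 * (total - idle - iowait) / total,
            "iowait": 100.0 * iowait / total,
        }
    return result


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(1)
    with open("/proc/stat") as f:
        second = f.read()
    usage = core_usage(first, second)
    for core, stats in sorted(usage.items(), key=lambda item: int(item[0][3:])):
        print(f"{core}: {stats['busy']:5.1f}% busy, {stats['iowait']:5.1f}% iowait")
```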

The load value is only a rough estimate and can be misleadingly high, e.g. with a lot of processes in D state that other processes are waiting on. You also need to monitor swap-in/swap-out to get a good memory metric. Some swap-in is OK, but a lot of swap-in/swap-out activity is bad: it means you have memory problems.
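Both points can be checked directly from `/proc` on a Linux host. The sketch below (function names hypothetical) counts processes whose state letter in `/proc/<pid>/stat` is `D`, and turns the cumulative `pswpin`/`pswpout` counters in `/proc/vmstat` (pages swapped in/out since boot) into per-second rates between two samples. It reads only each process's main stat entry, not every thread under `/proc/<pid>/task/`, so it can undercount compared with what the load average sees.

```python
import glob
import time
from typing import Dict


def process_state(stat_line: str) -> str:
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name is in parentheses and may itself contain spaces or ')',
    so split after the *last* ')'.
    """
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def count_d_state() -> int:
    """Count processes currently in uninterruptible sleep (state 'D')."""
    count = 0
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                if process_state(f.read()) == "D":
                    count += 1
        except OSError:
            continue  # process exited between listing and reading
    return count


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat ('name value' per line) into a dict."""
    return {k: int(v) for k, v in (line.split() for line in text.splitlines() if line.strip())}


def swap_rates(before: str, after: str, seconds: float) -> Dict[str, float]:
    """Pages swapped in/out per second between two /proc/vmstat snapshots."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    return {key: (a[key] - b[key]) / seconds for key in ("pswpin", "pswpout")}


if __name__ == "__main__":
    interval = 1.0
    with open("/proc/vmstat") as f:
        first = f.read()
    time.sleep(interval)
    with open("/proc/vmstat") as f:
        second = f.read()
    print("processes in D state:", count_d_state())
    print("swap pages/s:", swap_rates(first, second, interval))
```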

In general, Brendan Gregg's website and his book are always worth a read.
 

uberdome


Can you point out a method for monitoring utilization of cores within Proxmox? If that is not possible, am I just looking for some SNMP OIDs to connect to our network monitoring, or something else entirely?
 

LnxBil


PVE is no different from any other Linux (or Unix, for that matter).
I'd set up an external metric server and do the monitoring based on those values.
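Graphite is one of the backends PVE's external metric server can push to, alongside InfluxDB. To get a feel for what such a server receives, or to push extra values such as per-core numbers yourself, here is a minimal sketch of Graphite's plaintext protocol: one `<path> <value> <timestamp>` line per metric, sent over TCP, conventionally to port 2003. The metric prefix, metric names, and host name are placeholders, not anything PVE defines.

```python
import socket
import time
from typing import Dict, Optional


def graphite_lines(prefix: str, values: Dict[str, float], timestamp: Optional[int] = None) -> str:
    """Format metrics in Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    ts = int(time.time()) if timestamp is None else timestamp
    return "".join(f"{prefix}.{name} {value} {ts}\n" for name, value in sorted(values.items()))


def send_to_graphite(payload: str, host: str, port: int = 2003) -> None:
    """Send already-formatted lines to a Graphite (carbon) plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical prefix and values; replace with your own naming scheme.
    payload = graphite_lines("pve.node1.cpu", {"cpu0_busy": 42.5, "cpu0_iowait": 3.0})
    print(payload, end="")
    # send_to_graphite(payload, "graphite.example.internal")  # hypothetical host
```

Keeping formatting and sending separate makes it easy to inspect the lines before pointing them at a real collector.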
 
