Understanding CPU Usage and Server Load

uberdome

Member
Mar 19, 2019
I am looking at the graphs that show "CPU usage" and "Server load". It was my understanding that Server Load represents how many things are waiting to be processed. Conceptually, I pictured that as long as CPU usage is less than 100%, Server Load would be under 1. This is clearly not the case, though.

Can someone explain to me why my CPU usage would be <50% while my server load is peaking around 3? What does this mean and what, if anything, do I need to do to improve it?

[I have a one-node Proxmox setup that I am reviewing and attempting to understand. I have been using Proxmox for almost 2 years, but in a limited capacity. I'm reviewing some of my setups and trying to get some possible issues sorted.]

Here is an image of "Week (maximum)" for the node:
Annotation 2020-07-28 191426.png
 

Thank you, this is a very helpful resource. I have a better understanding of my previous misconceptions.

I thought Server Load was only related to CPU usage, and was a load relative to 1 being maxed out. I now understand that the CPU usage component of Server Load is effectively per core (i.e., if you have 4 cores, the CPU usage component of Server Load would be maxed out at 4, not at 1). I also now understand that the Server Load can be higher than the core count while the system is still functioning normally, as it takes other metrics into account.
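As a rough illustration of the per-core point, the load average can be divided by the number of logical CPUs. This is only a minimal sketch: `load_per_core` is a hypothetical helper name, and the figures in the comment reuse the peak of 3 and the 4-core example from this thread.

```python
import os


def load_per_core(load, cores):
    """Return the load average divided by the number of logical CPUs."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


# Example from the thread: a peak load of 3 on 4 cores is 0.75 per core.
if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # Unix only
    cores = os.cpu_count() or 1
    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        print(f"{label}: load {value:.2f}, per core {load_per_core(value, cores):.2f}")
```

A per-core value below 1 only says the run queue is not saturating the CPUs; it does not say where any waiting is coming from.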

It is a bit hard to gain information from the Server Load metric. I feel like it does not do a good job of representing actual bottlenecks in the system (i.e., it does not reveal where a problem is, just that load has gone up or down). I think it would be more useful to see individual system performance metrics to be able to identify bottlenecks. Perhaps I am just inexperienced in this area, though.

Anyhow, my Server Load maximums are still 25% lower than my core count, so I am more comfortable knowing I am not currently exceeding the capacity of my system. I also know this can go over my core count, but it is unclear by how much, or at what point this value represents a problem.

How can one identify when they are actually exceeding the capacity of their system?
 

For CPU, monitor the utilization of your cores. If that reaches 100%, you're CPU bound.
More critical are I/O bottlenecks: a process waiting on an I/O request blocks in D state (uninterruptible sleep) until the request is done, so a problem with your storage can stall processes even while the CPU is mostly idle. The best approach here is to monitor I/O delay. Both of these metrics are already included in PVE.
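For anyone who wants to see these two numbers outside the PVE graphs, here is a minimal sketch that samples `/proc/stat` twice and reports a busy share and an iowait share per core. The function names are hypothetical, it assumes a reasonably modern Linux kernel (at least eight counters per `cpuN` line), and it is an assumption on my part that the iowait share is roughly what the I/O delay graph reflects.

```python
import time


def parse_proc_stat(text):
    """Map each per-core 'cpuN' line to its list of tick counters."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-core lines are cpu0, cpu1, ...; the bare 'cpu' line is the aggregate.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            # Keep user..steal; guest time is already counted inside user/nice.
            cores[parts[0]] = [int(v) for v in parts[1:9]]
    return cores


def core_usage(before, after):
    """Return {core: (busy_fraction, iowait_fraction)} between two samples."""
    usage = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        delta = [n - o for n, o in zip(new, old)]
        total = sum(delta)
        if total <= 0:
            continue
        idle, iowait = delta[3], delta[4]
        usage[name] = ((total - idle - iowait) / total, iowait / total)
    return usage


def read_proc_stat():
    with open("/proc/stat") as f:
        return parse_proc_stat(f.read())


if __name__ == "__main__":
    first = read_proc_stat()
    time.sleep(1)
    second = read_proc_stat()
    for name, (busy, iowait) in sorted(core_usage(first, second).items()):
        print(f"{name}: busy {busy:.0%}, iowait {iowait:.0%}")
```

A core sitting near 100% busy points at a CPU limit; a low busy share with a high iowait share points at storage instead.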

The load value is only a rough estimate and can be misleadingly high, e.g. when a lot of processes are in D state, because Linux counts those in the load average even though they are not using any CPU. You also need to monitor swap-in/swap-out to have a good memory metric. Some swap-in is OK, but sustained swap-in/swap-out is bad; it means you have a memory problem.

In general, Brendan Gregg's website and his book are always worth a read.
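Both of these can be checked by hand on a Linux host. The sketch below is only an illustration with hypothetical function names: it counts processes currently in D state by reading `/proc/<pid>/stat`, and turns the cumulative `pswpin`/`pswpout` page counters from `/proc/vmstat` into a per-second rate between two samples.

```python
import glob
import time


def state_from_stat(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is in parentheses and may itself contain spaces or ')',
    # so split on the last ')' before reading the state field.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_d_state():
    """Count processes currently in uninterruptible sleep (state 'D')."""
    count = 0
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path, errors="replace") as f:
                if state_from_stat(f.read()) == "D":
                    count += 1
        except (OSError, IndexError):
            continue  # the process exited while we were reading it
    return count


def swap_counters(vmstat_text):
    """Pick the cumulative swap-in/swap-out page counters out of /proc/vmstat."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Pages swapped in/out per second between two counter samples."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


if __name__ == "__main__":
    with open("/proc/vmstat") as f:
        first = swap_counters(f.read())
    time.sleep(5)
    with open("/proc/vmstat") as f:
        second = swap_counters(f.read())
    print("processes in D state:", count_d_state())
    print("swap pages per second:", swap_rate(first, second, 5))
```

A high load with several processes in D state and idle cores suggests the load figure is being driven by I/O waits rather than CPU demand.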
 

Can you point out a method for monitoring utilization of cores within Proxmox? If that is not possible, am I just looking for some SNMP OIDs to connect to our network monitoring, or something else entirely?