Understanding CPU Usage and Server Load

uberdome · Jul 29, 2020

I am looking at the graphs that show "CPU usage" and "Server load". It was my understanding that Server Load represents how many things are waiting to be processed. Conceptually, I pictured that as long as CPU usage stays below 100%, Server Load would stay under 1. This is clearly not the case, though.

Can someone explain to me why my CPU usage would be below 50% while my server load is peaking around 3? What does this mean and what, if anything, do I need to do to improve it?

[I have a one-node Proxmox setup that I am reviewing and attempting to understand. I have been using Proxmox for almost 2 years, but in a limited capacity. I'm reviewing some of my setups and trying to get some possible issues sorted.]

Here is an image of "Week (maximum)" for the node: [image not preserved]

fabian · Jul 29, 2020

http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html

uberdome · Jul 29, 2020

fabian said:
http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html

Thank you, this is a very helpful resource. I have a better understanding of my previous misconceptions.

I thought Server Load was only related to CPU usage, and that it was a load figure relative to 1 being maxed out. I now understand that the CPU usage component of Server Load is effectively per core (i.e., if you have 4 cores, the CPU usage component of Server Load would be maxed out at 4, not at 1). I also now understand that Server Load can be higher than the core count while the system is still functioning normally, as it takes other metrics into account.
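As a rough illustration of the per-core reading described above, the sketch below divides a load average by the core count. The helper name `load_per_core` is hypothetical; `os.getloadavg()` and `os.cpu_count()` are standard-library calls. Because the Linux load average also counts tasks in uninterruptible sleep, a result above 1.0 does not by itself prove the CPUs are saturated.

```python
import os


def load_per_core(load, cores):
    """Return the load average divided by the number of cores.

    Hypothetical helper for illustration only: on Linux the load average
    also counts tasks in uninterruptible (D) state, so this is a rough
    indicator, not a CPU utilization figure.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


if __name__ == "__main__":
    # 1-, 5-, and 15-minute load averages as reported by the kernel.
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-minute load per core: {load_per_core(one, cores):.2f}")
```

With the numbers from this thread, a load of 3 on a 4-core node would come out at 0.75 per core.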

It is a bit hard to gain information from the Server Load metric. I feel like it does not do a good job of representing actual bottlenecks in the system (i.e., it does not reveal where a problem is, just that load is higher or lower than before, or is trending up or down). I think it would be more useful to see individual system performance metrics to be able to identify bottlenecks. Perhaps I am just inexperienced in this area, though.

Anyhow, my Server Load maximums are still 25% lower than my core count, so I am more comfortable knowing I am not currently exceeding the capacity of my system. I also know the load can go over my core count, but it is unclear by how much, or at what point this value represents a problem.

How can one identify when they are actually exceeding the capacity of their system?

LnxBil · Jul 30, 2020

uberdome said:
How can one identify when they are actually exceeding the capacity of their system?

For CPU, monitor the utilization of your cores. If that reaches 100%, you're CPU bound.
I/O bottlenecks are more critical, because the CPU has to wait until a request completes, and processes can get stuck in D state (uninterruptible sleep) if you have a problem with your storage. It is best to monitor I/O delay here. Both metrics are already included in PVE.
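For readers who want to see where such numbers come from on a plain Linux host, here is a minimal sketch that derives a CPU-busy percentage and an iowait percentage from two samples of the aggregate `cpu` line in `/proc/stat`. It assumes that the I/O delay figure corresponds roughly to the kernel's iowait share; the function names are hypothetical and this is not how PVE itself is confirmed to compute its graphs.

```python
def parse_cpu_line(line):
    """Return (total, idle, iowait) jiffies from a 'cpu ...' line of /proc/stat.

    Field order after the label: user nice system idle iowait irq softirq steal.
    Guest time is already included in user/nice, so it is not added again.
    """
    values = [int(x) for x in line.split()[1:]]
    values = (values + [0] * 8)[:8]  # pad in case a kernel reports fewer fields
    idle, iowait = values[3], values[4]
    return sum(values), idle, iowait


def cpu_and_iowait_percent(before, after):
    """Return (busy %, iowait %) over the interval between two samples."""
    total0, idle0, wait0 = parse_cpu_line(before)
    total1, idle1, wait1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0, 0.0
    waited = wait1 - wait0
    busy = elapsed - (idle1 - idle0) - waited
    return 100.0 * busy / elapsed, 100.0 * waited / elapsed
```

On a live system the two input strings would be the first line of `/proc/stat`, read a second or so apart. A low busy percentage combined with a high iowait percentage points at storage rather than CPU.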

The load value is only a rough estimate and can be misleadingly high, e.g. when many processes sit in D state and other processes are waiting on them. You also need to monitor swap-in/swap-out to get a good memory metric. Some swap-in is OK, but heavy swap-in/swap-out is bad and means you have a memory problem.
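One way to watch swap activity on Linux is to sample the `pswpin` and `pswpout` counters in `/proc/vmstat`, which count pages swapped in and out since boot. The sketch below turns two samples into pages-per-second rates; the function names and the sampling interval are assumptions made for illustration.

```python
def swap_counters(vmstat_text):
    """Return (pswpin, pswpout) page counts parsed from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_rates(before_text, after_text, interval_s):
    """Return (pages swapped in per second, pages swapped out per second)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    in0, out0 = swap_counters(before_text)
    in1, out1 = swap_counters(after_text)
    return (in1 - in0) / interval_s, (out1 - out0) / interval_s
```

Sustained non-zero rates in both directions are the pattern described above as a memory problem; an occasional burst of swap-in alone is less of a concern.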

In general, Brendan Gregg's website and his book are always worth a read.

uberdome · Aug 3, 2020

LnxBil said:
For CPU, monitor the utilization of your cores. If that reaches 100%, you're CPU bound.
I/O bottlenecks are more critical, because the CPU has to wait until a request completes, and processes can get stuck in D state (uninterruptible sleep) if you have a problem with your storage. It is best to monitor I/O delay here. Both metrics are already included in PVE.

The load value is only a rough estimate and can be misleadingly high, e.g. when many processes sit in D state and other processes are waiting on them. You also need to monitor swap-in/swap-out to get a good memory metric. Some swap-in is OK, but heavy swap-in/swap-out is bad and means you have a memory problem.

In general, Brendan Gregg's website and his book are always worth a read.

Can you point out a method for monitoring utilization of cores within Proxmox? If that is not possible, am I just looking for some SNMP OIDs to connect to our network monitoring, or something else entirely?

LnxBil · Aug 4, 2020

uberdome said:
Can you point out a method for monitoring utilization of cores within Proxmox? If that is not possible, am I just looking for some SNMP OIDs to connect to our network monitoring, or something else entirely?

PVE is no different from any other Linux, or Unix for that matter.
I'd go with an external metrics server and do the monitoring on the values it collects.
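Since the node behaves like any other Linux host, per-core utilization can be read from the `cpuN` lines of `/proc/stat`, which is also what most monitoring agents do. The following is a minimal sketch with a hypothetical function name, not a PVE API; it treats both idle and iowait time as "not busy".

```python
def per_core_busy(before_text, after_text):
    """Return {core label: busy %} between two snapshots of /proc/stat text."""

    def parse(text):
        cores = {}
        for line in text.splitlines():
            parts = line.split()
            # Keep only per-core lines such as 'cpu0', 'cpu1', ...
            if parts and parts[0].startswith("cpu") and parts[0][3:].isdigit():
                values = ([int(x) for x in parts[1:]] + [0] * 8)[:8]
                not_busy = values[3] + values[4]  # idle + iowait
                cores[parts[0]] = (sum(values), not_busy)
        return cores

    first, second = parse(before_text), parse(after_text)
    result = {}
    for core, (total1, not_busy1) in second.items():
        if core not in first:
            continue
        total0, not_busy0 = first[core]
        elapsed = total1 - total0
        if elapsed > 0:
            result[core] = 100.0 * (elapsed - (not_busy1 - not_busy0)) / elapsed
    return result
```

The resulting dictionary could then be pushed to whatever metrics server is in use. A single core pinned near 100% while the others idle is easy to miss in an averaged CPU graph.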
