Strange Load Issues

infinityM

Member
Dec 7, 2019
Hey Guys,

Can someone help me understand load issues on Proxmox? Normally I would check a couple of things (IO, CPU and memory) and find the cause.

With Proxmox, however, iotop reports very low IO, CPU usage is almost nonexistent, and less than 50% of memory is used, with 60GB free.
Yet every couple of minutes the load randomly jumps to 30-40 and then drops back down to 3...

All other servers in the cluster run smoothly. What might be causing this? It's super frustrating not being able to find the root cause... P.S. I am very familiar with hosting servers, but still learning Proxmox itself D:

Thanks in advance guys :)
 
Hi,

It is important to find out exactly which process "eats" your host's performance at the moment the bad event occurs. For such events I use monit, which can watch different metrics like load, CPU, or whatever, and when the event happens it can run any custom command like top/iotop/whatever, so I get a clue which process is behaving abnormally.
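If monit is not at hand, the same idea can be sketched as a plain shell loop. This is only a rough sketch, not a monit configuration: it polls /proc/loadavg and, once the 1-minute load crosses a threshold, logs the threads in state R (runnable) or D (uninterruptible sleep), since those are the states Linux counts towards the load average. The threshold of 10, the 5-second interval, the log path and the function names are all assumptions to adjust.

```shell
# Snapshot runnable (R) and uninterruptible (D) threads when the load spikes.
# THRESHOLD, INTERVAL and LOG are assumed placeholder values.
THRESHOLD=${THRESHOLD:-10}
INTERVAL=${INTERVAL:-5}
LOG=${LOG:-/var/log/load-spike.log}

# Succeeds when the load value in $1 is strictly above the threshold in $2.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load > max) }'
}

# Appends a timestamped list of R/D-state threads to the file named in $1.
snapshot() {
    {
        date
        cat /proc/loadavg
        ps -eLo state,pid,lwp,wchan:32,comm | awk 'NR == 1 || $1 ~ /^[RD]/'
    } >> "$1"
}

# Polls the 1-minute load average forever and snapshots on every spike.
watch_load() {
    while sleep "$INTERVAL"; do
        read -r load1 _ < /proc/loadavg
        if load_exceeds "$load1" "$THRESHOLD"; then
            snapshot "$LOG"
        fi
    done
}
```

Source the file and run `watch_load &` on the affected node, then read the log after the next spike. Many threads in state D with near-idle CPU would point at something blocking in the kernel rather than at a busy process.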

Good luck / Bafta !
 
That's the thing, I am monitoring it. But I don't see any strange processes; CPU, memory and IO are all calm, yet the load still jumps massively...
I can't figure out what it is...

Any tips on what setup might help me better understand the possible issue?
 
Check the open file descriptor limit for the kvm processes on your Proxmox hosts; you might be running into that limit. I had that yesterday and saw the same weird behavior (high load, but no CPU, disk IO or network activity).
The default limit is 1024, which seems to be way too low for server usage.
 
Hi,

You can check the open file descriptor limit with:

Code:
sysctl -a | grep fs.file-max
fs.file-max = 9223372036854775807

So if you have the same huge value, then the problem is not here!
If you have a low value, you can see a message like "too many open files" in your logs!


Good luck / bafta !
 
fs.file-max is NOT the per-process open files limit! fs.file-max is the system-wide maximum number of file descriptors, enforced at the kernel level; all processes together cannot exceed it unless it is raised.
See /proc/PID/limits, replacing PID with the PID of a running kvm instance. It is at 1024, and since this also includes network connections, this is a very low limit. Currently this needs changing in /etc/security/limits.conf (or a file in limits.d) and in systemd too (/etc/systemd/system.conf, or in the Proxmox service handling the VMs).
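To see how close each VM process actually is to its limit, something along these lines might help. It is a rough sketch: the function names are made up for illustration, and it assumes the VM processes show up under the command name kvm (pass another name if yours differ). It reads the soft "Max open files" value from /proc/PID/limits and counts the entries in /proc/PID/fd; run it as root so the fd directories are readable.

```shell
# Print the soft "Max open files" value from a /proc/<pid>/limits style file.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Count the descriptors currently open in a /proc/<pid>/fd style directory.
open_fd_count() {
    ls -A "$1" | wc -l | tr -d ' '
}

# Report usage for every process whose command name is exactly $1
# (default "kvm", which is an assumption about how the VMs are named).
report_fd_usage() {
    for pid in $(pgrep -x "${1:-kvm}"); do
        printf '%s: %s of %s descriptors in use\n' "$pid" \
            "$(open_fd_count "/proc/$pid/fd")" \
            "$(soft_nofile_limit "/proc/$pid/limits")"
    done
}
```

Calling `report_fd_usage` prints one line per VM process. A count sitting at or just below the soft limit during a load spike would support the fd-limit theory.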
Also, the OS never logs when a process runs into fd-limit trouble; you can only see that with strace. See also this thread: https://forum.proxmox.com/threads/open-files-issue-on-pve-node.69783
 
OK, then you can add these lines to your /etc/security/limits.conf (adjust the values for your needs):

Code:
*               soft    nofile           10240
*               hard    nofile           10240
root               soft    nofile           10240
root               hard    nofile           1048576


And as @liedekef said, you also need to edit /etc/systemd/system.conf and set the desired values, as in this example (see DefaultLimitNOFILE=10240:524288):

Code:
cat /etc/systemd/system.conf
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See systemd-system.conf(5) for details.

[Manager]
#LogLevel=info
#LogTarget=journal-or-kmsg
#LogColor=yes
#LogLocation=no
#DumpCore=yes
#ShowStatus=yes
#CrashChangeVT=no
#CrashShell=no
#CrashReboot=no
#CtrlAltDelBurstAction=reboot-force
#CPUAffinity=1 2
#RuntimeWatchdogSec=0
#ShutdownWatchdogSec=10min
#WatchdogDevice=
#CapabilityBoundingSet=
#NoNewPrivileges=no
#SystemCallArchitectures=
#TimerSlackNSec=
#DefaultTimerAccuracySec=1min
#DefaultStandardOutput=journal
#DefaultStandardError=inherit
#DefaultTimeoutStartSec=90s
#DefaultTimeoutStopSec=90s
#DefaultRestartSec=100ms
#DefaultStartLimitIntervalSec=10s
#DefaultStartLimitBurst=5
#DefaultEnvironment=
#DefaultCPUAccounting=no
#DefaultIOAccounting=no
#DefaultIPAccounting=no
#DefaultBlockIOAccounting=no
#DefaultMemoryAccounting=yes
#DefaultTasksAccounting=yes
#DefaultTasksMax=
#DefaultLimitCPU=
#DefaultLimitFSIZE=
#DefaultLimitDATA=
#DefaultLimitSTACK=
#DefaultLimitCORE=
#DefaultLimitRSS=
DefaultLimitNOFILE=10240:524288
#DefaultLimitAS=
#DefaultLimitNPROC=
#DefaultLimitMEMLOCK=
#DefaultLimitLOCKS=
#DefaultLimitSIGPENDING=
#DefaultLimitMSGQUEUE=
#DefaultLimitNICE=
#DefaultLimitRTPRIO=
#DefaultLimitRTTIME=



Good luck / Bafta !
 
Again: that is not sufficient for services launched/managed by systemd. And while it is nice to post this info, it is already discussed at length in the other thread.
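For a single systemd-managed service, the usual systemd mechanism is a per-unit drop-in with LimitNOFILE. The sketch below only illustrates that mechanism: the helper name and the unit name example-vm.service are hypothetical, the thread does not say which Proxmox unit would need it, and the values are simply the ones posted above.

```shell
# Write a LimitNOFILE drop-in for one unit.
# $1 = unit name (hypothetical in the usage below)
# $2 = limit, either a single value or soft:hard
# $3 = systemd config root (defaults to /etc/systemd/system)
write_nofile_dropin() {
    unit=$1
    limit=$2
    root=${3:-/etc/systemd/system}
    mkdir -p "$root/$unit.d" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$root/$unit.d/limits.conf"
}
```

Usage would look like `write_nofile_dropin example-vm.service 10240:524288`, followed by `systemctl daemon-reload` and a restart of that unit. Processes that are already running keep their old limit until they are restarted.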
 
