Load Average increasing

vtmikel

New Member
Jul 21, 2022
Hi-

I know that this is not, per se, a Proxmox question. I'm seeing a strange pattern in the reported load average of my server. My resources are not overcommitted, and you can see that the increase in load average does not correspond to my CPU usage. If I leave it climbing at this rate, I've seen the load average exceed 250, yet the server runs fine.

Nothing abnormal in the output of dmesg -T. When I reboot, the problem goes away for a few days, then reappears. I tried shutting down my VMs and LXCs one by one to see if the load average would start to bend downward, with no luck. Only a full reboot fixes it.

I keep everything up to date, so I'm running the latest packages (no subscription).

Anyone have any suggestions on where to look?

 
I do not see anything out of the ordinary. My most resource-intensive VM is the top user of CPU time, but that matches the CPU usage chart I shared. I have 16 cores and they are hardly being utilized.

Odd question - is there any chance the calculation for the server load is incorrect or erroneous? That's how it feels to me, as the load values I'm seeing do not correspond with the performance.
 
What is the output of uptime?

Any other processes? Maybe post the output of ps auxwf? It is probably best if you redirect the output to a file and upload that.
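For example (the /tmp path here is just an arbitrary choice):

```shell
# Capture the full process list (wide output, forest view) to a file
# that can be uploaded as an attachment.
ps auxwf > /tmp/processes.txt

# Quick sanity check that the capture worked:
wc -l /tmp/processes.txt
```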
 
Odd question - is there any chance the calculation for the server load is incorrect or erroneous? That's how it feels to me, as the load values I'm seeing do not correspond with the performance.

The GUI should just show the same numbers that you get through 'top', 'uptime' or similar (worth checking that first).
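As a quick cross-check (assuming a standard Linux /proc), all of these tools read the same kernel source:

```shell
# The kernel exposes the load averages in /proc/loadavg; the fields are
# the 1-, 5- and 15-minute load, runnable/total tasks, and the last PID.
cat /proc/loadavg

# uptime reports the same three values at the end of its line.
uptime
```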

On the topic of load averages, a few days ago someone posted a similar question and mira replied with a link to an interesting read:
https://forum.proxmox.com/threads/demystifying-load-averages.112303/#post-485014

The 'Load average' number also counts tasks that are in the TASK_UNINTERRUPTIBLE state, which usually happens when they are waiting for I/O. So the number is more a 'System load average' than a 'CPU load average'. Also you might be able to analyze the output of your top command:
You may have seen this state before: it shows up as the "D" state in the output of ps and top. The ps(1) man page calls it "uninterruptible sleep (usually IO)".
 
Here is the output:

Bash:
root@pve:~# uptime
 23:16:57 up 9 days, 41 min,  1 user,  load average: 56.12, 56.75, 57.16

As you can see, load average has doubled since yesterday.

The ps output is attached. Any thoughts on the number of cifsd processes running?
 

Attachments

  • processes.txt
    123.5 KB · Views: 15
Any thought on the number of cifs processes running?
Yeah, questions. ;) What is this? Are there additional packages for that installed on the node itself, or are there containers it is running in?

Quite a lot of them are in D state, waiting for IO to finish. Are they stuck in that state?
 
Quite a lot of them are in D state, waiting for IO to finish. Are they stuck in that state?
That will increase the load average, but it will not increase the CPU utilization. In the past I have worked on systems with a load of over 100 that were still very snappy, also with a lot of processes in D state. Why are there so many cifsd processes? That looks strange to me. What do you use CIFS for on the system?
 
No added packages on my node except a package for my old UPS.

The CIFS mount is used as the Proxmox backup storage location. Additionally, a few of my containers use an SMB/CIFS mount internally. I changed my NAS from a Synology to an Unraid server (physical), and the timing corresponds to when this problem emerged. I also used that upgrade to switch from NFS to SMB.

Any idea why the processes would behave like that? The NAS Server is up and running fine with no interruptions.
 
Here you go:

Code:
root@pve:~# cat /proc/mounts
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,relatime 0 0
udev /dev devtmpfs rw,nosuid,relatime,size=32816232k,nr_inodes=8204058,mode=755,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=6570084k,mode=755,inode64 0 0
rpool/ROOT/pve-1 / zfs rw,relatime,xattr,noacl 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=62727 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
sunrpc /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
configfs /sys/kernel/config configfs rw,nosuid,nodev,noexec,relatime 0 0
rpool /rpool zfs rw,noatime,xattr,noacl 0 0
rpool/ROOT /rpool/ROOT zfs rw,noatime,xattr,noacl 0 0
rpool/data /rpool/data zfs rw,noatime,xattr,noacl 0 0
rpool/data/subvol-106-disk-0 /rpool/data/subvol-106-disk-0 zfs rw,noatime,xattr,posixacl 0 0
rpool/data/subvol-102-disk-0 /rpool/data/subvol-102-disk-0 zfs rw,noatime,xattr,posixacl 0 0
rpool/data/subvol-100-disk-0 /rpool/data/subvol-100-disk-0 zfs rw,noatime,xattr,posixacl 0 0
data /data zfs rw,xattr,noacl 0 0
data/subvol-109-disk-0 /data/subvol-109-disk-0 zfs rw,xattr,posixacl 0 0
data/subvol-109-disk-2 /data/subvol-109-disk-2 zfs rw,xattr,posixacl 0 0
data/subvol-109-disk-1 /data/subvol-109-disk-1 zfs rw,xattr,posixacl 0 0
lxcfs /var/lib/lxcfs fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
//10.66.3.42/backups /mnt/pve/Unraid_Backup cifs rw,relatime,vers=3.1.1,cache=strict,username=XXXX,uid=0,noforceuid,gid=0,noforcegid,addr=10.66.3.42,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1 0 0
/dev/fuse /etc/pve fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
tmpfs /run/user/0 tmpfs rw,nosuid,nodev,relatime,size=6570080k,nr_inodes=1642520,mode=700,inode64 0 0
 
Okay, that output is not as big as I would have hoped.

Can you try to umount the CIFS share and remount it if needed? I still think that something is wrong with the number of cifsd kernel threads. Can you also correlate a high load average with a huge number of cifsd kernel processes? Are they both increasing?
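If it helps, here is a rough way to log both side by side (interval, sample count, and log path are arbitrary choices):

```shell
# Sample the 1-minute load average and the number of cifsd kernel threads
# together, once a minute for an hour, so the two can be correlated.
for i in $(seq 1 60); do
    load=$(cut -d' ' -f1 /proc/loadavg)
    cifsd=$(ps -eo comm= | grep -c cifsd || true)
    echo "$(date -Is) load1=$load cifsd=$cifsd" >> /root/load-vs-cifsd.log
    sleep 60
done
```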
 
