High load, low CPU usage (2 servers)

DavidDV

Active Member
Jul 31, 2019
Hi, I have a pretty curious problem.

I have a dedicated server at Hetzner (EX42-NVME) with an i7-6700, 64GB DDR4 and 2x 512GB NVMe drives in a ZFS RAID-1 mirror (set up by the Proxmox installer).

It runs Proxmox 6.0.7 with 6 LXC containers under extremely low usage (5-10% CPU at most). CPU usage is very low across the whole server, but the load average is always between 3 and 4, which seems too high given that CPU usage is minimal.

- CPU: minimal usage.
- Network: monitoring with nload shows very low or no traffic on both the Proxmox host and all containers.
- Disk: monitoring with iotop shows the disks are barely used.
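On Linux the load average counts tasks that are runnable (state R) as well as tasks in uninterruptible sleep (state D), so a high load with idle CPUs can come from blocked tasks rather than busy ones. A minimal sketch for checking this, assuming the procps version of ps; the helper name count_states is made up for illustration:

```shell
# Count tasks per state letter (R = runnable, D = uninterruptible sleep, S = sleeping, ...).
# Reads one state string per line on stdin and uses only its first character,
# so "Ss" and "S+" are both counted as "S".
count_states() {
    awk '{ n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage (per thread, because the load average counts threads, not processes):
#   ps -eLo state= | count_states
#   cat /proc/loadavg
```

If the number of D entries stays close to the load average while R stays near zero, the load is most likely coming from tasks blocked on I/O.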

The funny thing is that it also happens to me in my homelab!

It has an Intel Pentium Gold G5400, 8GB DDR4 and 2x WD Green 1TB in RAID-1. Without any containers running, also on Proxmox 6.0.7, the load is at least 1.5 - 2.0, with 0% usage all day. What could be the problem? Is it a Proxmox 6 bug?

Thank you!
 
Hi,

root@nodo:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool   476G  74.2G   402G        -         -    23%    15%  1.00x    ONLINE  -

root@nodo:~# zpool status -v
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:01:27 with 0 errors on Sun Sep 8 00:25:28 2019
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p3  ONLINE       0     0     0
            nvme1n1p3  ONLINE       0     0     0

errors: No known data errors
 
I think I found the problem:

Every night a backup is written to an external server via a CIFS storage configured in Proxmox. Some time ago the connection to the CIFS server was interrupted, and I think the backup has been failing ever since.

The IO load is very high:

ioload.jpg

The backup process is still running in the background:

root@nodo:~# ps faxl | grep " D "
1     0 20357     2  20   0      0     0 msleep D    ?          1:12  \_ [cifsd]
0     0 12029 11744  20   0   6072   896 pipe_w S+   pts/8      0:00  |   \_ grep D
0     0 30390     1  20   0      0     0 io_sch D    ?         16:36 [gzip]
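The grep above also matches its own command line. A slightly more selective variant filters on the state column instead. This is only a sketch, assuming procps ps and its state, wchan and etime output fields; only_d_state is a made-up helper name:

```shell
# Keep the header line plus every row whose first field (the state) starts with D.
# Expects input in the format produced by:
#   ps -eo state,pid,ppid,wchan:32,etime,args
only_d_state() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# Usage:
#   ps -eo state,pid,ppid,wchan:32,etime,args | only_d_state
```

The wchan column shows the kernel function the task is sleeping in, which here would point at the CIFS client or the I/O scheduler. As root, /proc/<pid>/stack may give more detail on kernels that expose it.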

I can't stop it, and it is generating a lot of load that is wearing out the disks:

nvme.jpg

The degradation has risen 5-8% in 1 month :(
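To track the wear as a number rather than from a graph, the "Percentage Used" line that smartctl reports for NVMe devices can be logged periodically. A rough sketch, assuming smartmontools is installed; nvme_percentage_used is a hypothetical helper, and the device name is taken from the zpool output earlier in the thread:

```shell
# Extract the "Percentage Used" value (without the % sign) from the output of
# `smartctl -a <nvme device>`, read on stdin. Prints nothing if the line is absent.
nvme_percentage_used() {
    awk -F: '/^Percentage Used/ { gsub(/[ %]/, "", $2); print $2 }'
}

# Usage (as root):
#   smartctl -a /dev/nvme0n1 | nvme_percentage_used
```

Comparing this value together with "Data Units Written" over a few days shows whether anything is actually still writing to the disks.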
 
I can't stop it, and it is generating a lot of load that is wearing out the disks:

It can't generate anything, because it is stuck in D state, which is an uninterruptible sleep. Tasks in D state still count toward the load average, though, which is why the load is high while CPU usage stays low. Samba lockups are always bad and can normally be resolved by getting a working "backend server" again or by trying to forcefully unmount the mountpoint. Does df also block if you run it? What about entries in dmesg?
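A way to run the df check without locking up the shell is to start it in the background and stop waiting after a few seconds. This is only a sketch: mount_responds is a made-up helper, and /mnt/pve/backup is a hypothetical mount point (Proxmox normally mounts CIFS storages under /mnt/pve/<storage id>, so substitute the real one).

```shell
# Return 0 if `df` on the path finishes successfully within the limit (seconds),
# non-zero if it fails or is still blocked when the limit is reached.
mount_responds() {
    path=$1
    limit=${2:-5}
    result=$(mktemp) || return 2
    # Probe in the background: on a dead CIFS mount df may block in D state and
    # cannot be killed, so the shell must not wait on it directly.
    ( df "$path" > /dev/null 2>&1; echo "$?" > "$result" ) &
    i=0
    while [ ! -s "$result" ] && [ "$i" -lt "$limit" ]; do
        sleep 1
        i=$((i + 1))
    done
    status=$(cat "$result")   # stays empty if the probe never finished
    rm -f "$result"
    [ "$status" = "0" ]
}

# Usage (as root), with a hypothetical mount point:
#   mount_responds /mnt/pve/backup 5 || echo "mount is hung or unavailable"
#   dmesg | grep -i cifs          # look for CIFS errors and reconnect attempts
#   umount -f /mnt/pve/backup     # force unmount
#   umount -l /mnt/pve/backup     # lazy unmount if forcing does not work
```

If neither unmount releases the stuck tasks, restoring connectivity to the CIFS server or rebooting the node are the remaining options.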