Proxmox VE 8.1.3 abnormal load

Odmin
Jan 3, 2024
Hi all,
After upgrading several servers with different configurations from Proxmox VE 7 to the latest Proxmox VE 8.1.3, I see an abnormally high load on them even though nothing is running:
Server1 before and after the upgrade:
[Screenshots: Server1 load graphs before and after the upgrade]
The load stays permanently elevated while the server is idle, which looks like a bug.
kernel: Linux pve01 6.5.11-7-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-7 (2023-12-05T09:44Z) x86_64 GNU/Linux
pve-manager/8.1.3/b46aac3b42da5d15 (running kernel: 6.5.11-7-pve)

I tried to debug it with sysstat:
CPU utilization
sar -u 1 5
Code:
root@pve01:~# sar -u 1 5
Linux 6.5.11-7-pve (pve01)      01/03/2024      _x86_64_        (48 CPU)

04:38:09 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:38:10 PM     all      0.00      0.00      0.00      0.00      0.00    100.00
04:38:11 PM     all      0.00      0.00      0.00      0.00      0.00    100.00
04:38:12 PM     all      0.00      0.00      0.00      0.00      0.00    100.00
04:38:13 PM     all      0.02      0.00      0.10      0.00      0.00     99.87
04:38:14 PM     all      0.00      0.00      0.00      0.00      0.00    100.00
Average:        all      0.00      0.00      0.02      0.00      0.00     99.97

Check queue lengths and CPU load averages
sar -q 1 10
Code:
04:39:49 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
04:39:50 PM         0      1085      5.00      4.99      3.94         0
04:39:51 PM         0      1085      5.00      5.00      3.95         0
04:39:52 PM         0      1085      5.00      5.00      3.95         0
04:39:53 PM         0      1085      5.00      5.00      3.95         0
04:39:54 PM         0      1085      5.00      5.00      3.95         0
04:39:55 PM         0      1085      5.00      5.00      3.95         0
04:39:56 PM         0      1085      5.08      5.01      3.96         0
04:39:57 PM         0      1085      5.08      5.01      3.96         0
04:39:58 PM         0      1085      5.08      5.01      3.96         0
04:39:59 PM         0      1079      5.08      5.01      3.96         0
Average:            0      1084      5.03      5.00      3.95         0
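A note on how these numbers fit together (based on how the Linux scheduler computes load, not on anything measured here): the load average is not CPU utilization. Roughly every 5 seconds the kernel counts the tasks that are runnable (R) or in uninterruptible sleep (D) and folds that count into three exponentially damped averages with 1-, 5- and 15-minute time constants. A few tasks stuck permanently in D state therefore hold the load at that number while the CPUs are 100% idle. They are not runnable, so `runq-sz` stays 0, and the sar man page describes `blocked` as tasks "waiting for I/O to complete", so a thread sleeping uninterruptibly on something other than I/O need not show up there either. The sketch below approximates the update in floating point (the kernel uses fixed-point arithmetic); `simulate_loadavg` is a hypothetical helper, not part of sysstat or the kernel.

```shell
#!/usr/bin/env bash
# Sketch: approximate the kernel's load-average update for a constant number
# of R+D tasks, starting from an idle (0.00) load. Floating point only; the
# real kernel uses fixed-point arithmetic. simulate_loadavg is hypothetical.
simulate_loadavg() {
  local tasks=$1 seconds=$2
  awk -v n="$tasks" -v secs="$seconds" 'BEGIN {
    # Decay factor for one 5-second sample with 1/5/15-minute time constants
    e1 = exp(-5 / 60); e5 = exp(-5 / 300); e15 = exp(-5 / 900)
    l1 = 0; l5 = 0; l15 = 0
    for (t = 5; t <= secs; t += 5) {
      # new = old * e + active_tasks * (1 - e)
      l1  = l1  * e1  + n * (1 - e1)
      l5  = l5  * e5  + n * (1 - e5)
      l15 = l15 * e15 + n * (1 - e15)
    }
    printf "%.2f %.2f %.2f\n", l1, l5, l15
  }'
}

# Usage: five tasks stuck in D state for one hour
#   simulate_loadavg 5 3600    # prints 5.00 5.00 4.91
```

With a constant 5 tasks, ldavg-1 and ldavg-5 settle at 5.00 while ldavg-15 approaches 5 much more slowly, which is consistent with the ldavg-15 values above still creeping up from 3.94 to 3.96.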

Check disk usage
sar -d 1 3
Code:
Average:          DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
Average:      nvme2n1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:      nvme1n1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:      nvme0n1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:      nvme3n1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdp     25.67      0.00    442.67      0.00     17.25      0.00      0.16      0.40
Average:          sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdg      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdc      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sde      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdi      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdh      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdf      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdj      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdk      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdl      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdn      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdm      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdd      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdq     26.00      0.00    442.67      0.00     17.03      0.00      0.10      0.40
Average:          zd0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         zd16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         zd32      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         zd48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         zd64      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         zd80      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         zd96      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         fioa      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd112      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd128      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd144      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd160      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd176      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd192      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd208      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd224      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd240      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd256      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd272      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        zd288      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

Check the memory load
sar -r 1 3
Code:
04:42:36 PM kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
04:42:37 PM 259369616 258317960   3438980      1.30      1052    187988   3868164      1.46   1080416     54092        12
04:42:38 PM 259365584 258313928   3443020      1.30      1052    187996   3868164      1.46   1080792     54096        12
04:42:39 PM 259364148 258312508   3444416      1.30      1052    187996   3868164      1.46   1079952     54092         4
Average:    259366449 258314799   3442139      1.30      1052    187993   3868164      1.46   1080387     54093         9

Check the I/O load
sar -b 1 10
Code:
Linux 6.5.11-7-pve (pve01)      01/03/2024      _x86_64_        (48 CPU)

04:43:56 PM       tps      rtps      wtps      dtps   bread/s   bwrtn/s   bdscd/s
04:43:57 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
04:43:58 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
04:43:59 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
04:44:00 PM     12.00      0.00     12.00      0.00      0.00    320.00      0.00
04:44:01 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
04:44:02 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
04:44:03 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
04:44:04 PM    259.00      0.00    259.00      0.00      0.00  10400.00      0.00
04:44:05 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
04:44:06 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        27.10      0.00     27.10      0.00      0.00   1072.00      0.00

Check swapping activity
sar -W 1 3
Code:
04:45:20 PM  pswpin/s pswpout/s
04:45:21 PM      0.00      0.00
04:45:22 PM      0.00      0.00
04:45:23 PM      0.00      0.00
Average:         0.00      0.00

Next, I collected kernel stack traces via eBPF (bpftrace):
bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' > trace.data
Then I generated a flame graph from the trace:
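For anyone who wants to reproduce this: a minimal sketch that bounds the capture time and converts the output with Brendan Gregg's FlameGraph scripts (`stackcollapse-bpftrace.pl` and `flamegraph.pl`). It assumes bpftrace is installed, the function runs as root, and the FlameGraph repository has been cloned locally; `make_flamegraph`, its arguments and the output file names are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: sample kernel stacks for a fixed time and render a flame graph.
# Assumptions: bpftrace is installed, we run as root, and the FlameGraph
# repository is cloned to $2 (default ./FlameGraph). Names are hypothetical.
make_flamegraph() {
  local seconds=${1:-30} fg_dir=${2:-./FlameGraph}
  command -v bpftrace >/dev/null || { echo "bpftrace not found" >&2; return 1; }
  [ -x "$fg_dir/stackcollapse-bpftrace.pl" ] && [ -x "$fg_dir/flamegraph.pl" ] ||
    { echo "FlameGraph scripts not found in $fg_dir" >&2; return 1; }
  # Sample on-CPU kernel stacks at 99 Hz and exit after $seconds seconds
  bpftrace -e "profile:hz:99 { @[kstack] = count(); } interval:s:${seconds} { exit(); }" > trace.data
  # Fold the stacks into one line per unique stack, then render an SVG
  "$fg_dir/stackcollapse-bpftrace.pl" trace.data > trace.folded
  "$fg_dir/flamegraph.pl" trace.folded > trace.svg
}

# Usage: make_flamegraph 30 ./FlameGraph
```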
[Screenshot: kernel flame graph, Server1]
It looks like the kernel is not doing anything either.
So I suspect a bug in the load average calculation.
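Why the flame graph can look idle even with a load of 5: `profile:hz:99` records what each CPU is executing at every sample, so it only captures on-CPU work. Tasks in uninterruptible (D) sleep use no CPU and never appear in such a profile, yet the kernel counts them in the load average. Listing them directly is a more targeted check than a CPU profile. A minimal sketch, assuming procps `ps`; `list_dstate` is a hypothetical helper. The `wchan` field names the kernel function the task is sleeping in, though depending on the kernel it may only show `-`.

```shell
#!/usr/bin/env bash
# Sketch: print tasks in uninterruptible sleep (state D). They count toward
# the load average without using CPU. list_dstate is a hypothetical name.
# Input (stdin): output of  ps -eo pid=,stat=,wchan:32=,comm=
list_dstate() {
  awk '$2 ~ /^D/ {
    pid = $1; st = $2; wchan = $3
    $1 = ""; $2 = ""; $3 = ""; sub(/^ +/, "")   # what remains is the command name
    printf "%s %s %s %s\n", pid, st, wchan, $0
  }'
}

# Usage:
#   ps -eo pid=,stat=,wchan:32=,comm= | list_dstate
```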

Here is another example from Server2 with a different configuration:
[Screenshot: Server2 load graph]
Kernel: Linux pve02 6.5.11-7-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-7 (2023-12-05T09:44Z) x86_64 GNU/Linux
pve-manager/8.1.3/b46aac3b42da5d15 (running kernel: 6.5.11-7-pve)

Debugging it with sysstat:
CPU utilization
sar -u 1 5
Code:
Linux 6.5.11-7-pve (pve02)      01/04/2024      _x86_64_        (16 CPU)


01:29:06 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
01:29:07 PM     all      0.00      0.00      0.00      0.00      0.00    100.00
01:29:08 PM     all      0.31      0.00      0.25      0.06      0.00     99.38
01:29:09 PM     all      0.13      0.00      0.25      0.00      0.00     99.62
01:29:10 PM     all      0.06      0.00      0.06      0.00      0.00     99.88
01:29:11 PM     all      0.00      0.00      0.06      0.00      0.00     99.94
Average:        all      0.10      0.00      0.12      0.01      0.00     99.76

Check queue lengths and CPU load averages
sar -q 1 10

Code:
Linux 6.5.11-7-pve (pve02)      01/04/2024      _x86_64_        (16 CPU)


01:30:43 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
01:30:44 PM         0       470      2.11      2.08      2.05         0
01:30:45 PM         0       470      2.11      2.08      2.05         0
01:30:46 PM         0       470      2.11      2.08      2.05         0
01:30:47 PM         0       470      2.11      2.08      2.05         0
01:30:48 PM         0       470      2.10      2.08      2.05         0
01:30:49 PM         0       470      2.10      2.08      2.05         0
01:30:50 PM         0       470      2.10      2.08      2.05         0
01:30:51 PM         0       470      2.10      2.08      2.05         0
01:30:52 PM         0       470      2.10      2.08      2.05         0
01:30:53 PM         0       470      2.09      2.08      2.05         0
Average:            0       470      2.10      2.08      2.05         0


Check disk usage
sar -d 1 3
Code:
Average:          DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
Average:      nvme1n1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:      nvme0n1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdc      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdd      3.33    256.00     21.33      0.00     83.20      0.01      1.30      0.93
Average:          sde      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdf      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         dm-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         dm-1      4.67      0.00     21.33      0.00      4.57      0.00      0.86      0.40
Average:         dm-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         dm-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         dm-4      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         dm-6      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         dm-7      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         dm-8      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

Check the memory load
sar -r 1 3

Code:
Linux 6.5.11-7-pve (pve02)      01/04/2024      _x86_64_        (16 CPU)


01:34:30 PM kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
01:34:31 PM   9644348  10204864  14622696     44.55     98480    626708   3973248      9.80   1027012    645720       248
01:34:32 PM   9644348  10204864  14622696     44.55     98480    626708   3973248      9.80   1027012    645720       272
01:34:33 PM   9644348  10204864  14622696     44.55     98480    626708   3973248      9.80   1027012    645720       272
Average:      9644348  10204864  14622696     44.55     98480    626708   3973248      9.80   1027012    645720       264

Check the I/O load
sar -b 1 10
Code:
Linux 6.5.11-7-pve (pve02)      01/04/2024      _x86_64_        (16 CPU)


01:35:27 PM       tps      rtps      wtps      dtps   bread/s   bwrtn/s   bdscd/s
01:35:28 PM      3.00      3.00      0.00      0.00    768.00      0.00      0.00
01:35:29 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:35:30 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:35:31 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:35:32 PM      4.00      0.00      4.00      0.00      0.00    120.00      0.00
01:35:33 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:35:34 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:35:35 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:35:36 PM      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:35:37 PM      4.00      0.00      4.00      0.00      0.00     96.00      0.00
Average:         1.10      0.30      0.80      0.00     76.80     21.60      0.00

Check swapping activity
sar -W 1 3
Code:
Linux 6.5.11-7-pve (pve02)      01/04/2024      _x86_64_        (16 CPU)


01:36:27 PM  pswpin/s pswpout/s
01:36:28 PM      0.00      0.00
01:36:29 PM      0.00      0.00
01:36:30 PM      0.00      0.00
Average:         0.00      0.00

Collecting kernel stack traces via eBPF (bpftrace):
bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' > trace.data
[Screenshot: kernel flame graph, Server2]

Hi Fiona,
Thanks a lot for your reply!
Yes, you were right. I checked both servers:
Server1 with constant load 5:
Code:
ps aux | grep " [RD]"
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        1254  0.0  0.0      0     0 ?        D<   11:34   0:00 [vdev_autotrim]
root        3317  0.0  0.0      0     0 ?        D<   11:34   0:00 [vdev_autotrim]
root        3318  0.0  0.0      0     0 ?        D<   11:34   0:00 [vdev_autotrim]
root        3319  0.0  0.0      0     0 ?        D<   11:34   0:00 [vdev_autotrim]
root        3320  0.0  0.0      0     0 ?        D<   11:34   0:00 [vdev_autotrim]
root       12447  0.0  0.0  11628  4608 pts/0    R+   11:45   0:00 ps aux

Server2 with constant load 2:
Code:
ps aux | grep " [RD]"
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        1584  0.0  0.0      0     0 ?        D<    2023   0:01 [vdev_autotrim]
root        1585  0.0  0.0      0     0 ?        D<    2023   0:03 [vdev_autotrim]
root     1292548  0.0  0.0  12636  5504 pts/0    R+   10:58   0:00 ps aux

It looks like each vdev_autotrim thread in D state adds 1 to the load: five threads on Server1 (load 5) and two on Server2 (load 2).
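To confirm that these threads account for the whole load, one can count R- and D-state tasks per command name and compare the total with the first field of /proc/loadavg. Filtering on the STAT column avoids false matches that `grep " [RD]"` can produce when a command line happens to contain " R" or " D". A minimal sketch, assuming procps `ps`; `load_by_command` is a hypothetical helper. The count is a snapshot while the load average is a damped average, so they only agree when the set of tasks is stable (as it is here), and the `ps` process itself usually shows up as one R task.

```shell
#!/usr/bin/env bash
# Sketch: count tasks in state R (runnable) or D (uninterruptible) per
# command name, i.e. the tasks the load average is built from.
# Input (stdin): output of  ps -eo stat=,comm=
load_by_command() {
  awk '$1 ~ /^[RD]/ {
    st = substr($1, 1, 1)             # keep only the main state letter
    $1 = ""; sub(/^ +/, "")           # remaining fields are the command name
    count[st " " $0]++
  }
  END { for (k in count) print count[k], k }' | sort -k1,1nr -k3
}

# Usage: compare the snapshot with the 1-minute load average
#   ps -eo stat=,comm= | load_by_command
#   cut -d' ' -f1 /proc/loadavg
```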

I turned off autotrim on all pools:
zpool set autotrim=off pool_name
This did not stop the running vdev_autotrim threads, but after rebooting the server the issue was gone.
So the root cause of this issue is the ZFS autotrim threads (vdev_autotrim).
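The same change can be applied to every pool at once. A minimal sketch, assuming the tab-separated output of `zpool get -H -o name,value autotrim` (one pool per line); `pools_with_autotrim` is a hypothetical helper. As noted above, the already running vdev_autotrim threads only went away after a reboot. If TRIM is still wanted on SSD-backed pools, running `zpool trim <pool>` periodically is the usual alternative to autotrim.

```shell
#!/usr/bin/env bash
# Sketch: list pools whose autotrim property is "on".
# Input (stdin): output of  zpool get -H -o name,value autotrim
pools_with_autotrim() {
  awk -F '\t' '$2 == "on" { print $1 }'
}

# Usage: disable autotrim on every pool that still has it enabled, then verify
#   zpool get -H -o name,value autotrim | pools_with_autotrim |
#     xargs -r -n1 zpool set autotrim=off
#   zpool get autotrim
```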