Load too high every night at 2:00

david83

Hi, everyone,
For some time now I have the problem that I get a message every night at 2:00 a.m. (I use Zabbix) that the load average (5min) is too high.
I just can't find out why this is and I would like to eliminate and understand this problem.

The situation:

- I have 4 CT (LXC) containers running.
- All containers use a Debian 10 or 11 template.
- Each container and the host itself have a load average (5 min) of 4 to 7 every night at 2:00.
- The normal load average is at most 2.

Hardware:
Dell E7270, i5-6300U 2.4 GHz, 16 GB RAM, SSD
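
For context on those numbers: the i5-6300U exposes 4 logical CPUs (2 cores with Hyper-Threading), and on Linux the load average counts tasks waiting on disk I/O as well as running ones. A minimal sketch for relating a load value to the CPU count follows; `load_per_cpu` is a hypothetical helper name, not part of Proxmox or Zabbix.

```shell
# load_per_cpu is a hypothetical helper: load average divided by CPU count.
# Values well above 1.00 suggest more runnable or I/O-blocked tasks than CPUs.
load_per_cpu() {
    load5="$1"   # 5-minute load average
    cpus="$2"    # number of logical CPUs
    awk -v l="$load5" -v c="$cpus" 'BEGIN { printf "%.2f\n", l / c }'
}

# On a Linux host: the second field of /proc/loadavg is the 5-minute average,
# and nproc reports the number of available processing units.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load_per_cpu "$(cut -d' ' -f2 /proc/loadavg)" "$(nproc)"
fi
```

For example, `load_per_cpu 7 4` corresponds to the nightly peak on this machine, and `load_per_cpu 2 4` to the normal daytime value.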

So far I've looked for a cron job that runs at 2:00 on all containers and on the host.
I also set the VZDump backups to a different time.
Unfortunately, I just can't get any further and would be grateful for a tip.
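
The cron search described above can be sketched roughly as follows. `cron_at_hour` is a hypothetical helper that filters crontab-format lines by their hour field; it only handles plain hours and comma lists (not ranges, `*`, or `@daily`-style entries), so treat it as a starting point rather than a complete check.

```shell
# cron_at_hour is a hypothetical helper: it reads crontab-format lines on stdin
# and prints those whose hour field (2nd column) names the given hour.
cron_at_hour() {
    hour="$1"
    awk -v h="$hour" '
        /^[[:space:]]*#/ || NF < 6 { next }     # skip comments and lines too short to be a job
        {
            n = split($2, parts, ",")           # hour field may be a list like 2,14
            for (i = 1; i <= n; i++)
                if (parts[i] == h || parts[i] == "0" h) { print; break }
        }'
}

# Usage on the host and inside each container (paths are the usual Debian ones):
#   cat /etc/crontab /etc/cron.d/* 2>/dev/null | cron_at_hour 2
#   crontab -l 2>/dev/null | cron_at_hour 2
```

Jobs can also be scheduled by systemd timers instead of cron; `systemctl list-timers --all` lists those with their next run time.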

Kind regards
David


Here are a few screenshots. The first two are from the Proxmox host and the other two are from a CT (LXC).

Screenshots: proxmoxloadaverage.png, pvelaptop iowait.png, iobroker load average.png, iobroker iowait.png
 

Hey,

could you check the logs during that time? Something like journalctl --since "2022-01-<DAY> 01:59:50" --until "2022-01-<DAY> 02:04:00" should work. Also, a bit of a wild guess, but do you have a static IP? If not, your ISP might assign you a new IP at 2 AM, and if your containers rely on internet access they might start polling or something.
 
@Hannes Laimer

Hello, I've checked the logs for the past 2 weeks. Unfortunately, the command doesn't work like this:

@pve: ~ # journalctl --since "2022-01-<DAY> 01:59:50" --until "2022-01-<DAY> 02:04:00"
Failed to parse timestamp: 2022-01-<DAY> 01:59:50

That's why I checked each day individually. But it's always the same error: pve kernel:
EXT4-fs (dm-11): write access unavailable, skipping orphan cleanup

Code:
@pve: ~ # journalctl --since "2022-01-03 01:59:50" --until "2022-01-03 02:04:00"
-- Journal begins at Thu 2021-09-16 08:07:09 CEST, ends at Tue 2022-01-04 10:51:40 CET. --
Jan 03 02:00:00 pve pvescheduler[1238616]: <root@pam> starting task UPID:pve:0012E659:10AD20DE:61D24A90:vzdump:100:root@pam:
Jan 03 02:00:01 pve pvescheduler[1238617]: INFO: starting new backup job: vzdump 100 --storage pbs --mode snapshot --mailto david@xxx.de --node pve>
Jan 03 02:00:01 pve pvescheduler[1238617]: INFO: Starting Backup of VM 100 (lxc)
Jan 03 02:00:01 pve dmeventd[380]: No longer monitoring thin pool pve-data-tpool.
Jan 03 02:00:01 pve lvm[380]: Monitoring thin pool pve-data-tpool.
Jan 03 02:00:01 pve kernel: EXT4-fs (dm-11): write access unavailable, skipping orphan cleanup
Jan 03 02:00:01 pve kernel: EXT4-fs (dm-11): mounted filesystem without journal. Opts: noload. Quota mode: none.
Jan 03 02:01:35 pve pvestatd[846]: status update time (6.826 seconds)
Jan 03 02:01:53 pve pvestatd[846]: status update time (5.799 seconds)
Jan 03 02:02:22 pve pvescheduler[1239834]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jan 03 02:02:22 pve pve-firewall[847]: firewall update time (17.107 seconds)
Jan 03 02:02:23 pve pvestatd[846]: status update time (24.211 seconds)
Jan 03 02:03:00 pve pvestatd[846]: status update time (7.210 seconds)
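
The pvestatd "status update time" entries in this log give a rough timeline of when the host was slow to respond. A minimal sketch for pulling them out of journal output follows; `slow_status_updates` is a hypothetical helper name and the default 5-second threshold is an arbitrary assumption.

```shell
# slow_status_updates is a hypothetical helper: it reads journal lines on stdin and
# prints the timestamp and duration of pvestatd status updates slower than a threshold.
slow_status_updates() {
    limit="${1:-5}"   # threshold in seconds; 5 is an arbitrary assumption
    awk -v limit="$limit" '
        /pvestatd/ && /status update time \(/ {
            secs = $0
            sub(/.*status update time \(/, "", secs)   # drop everything before the number
            sub(/ seconds\).*/, "", secs)               # drop everything after it
            if (secs + 0 > limit + 0) print $1, $2, $3, secs
        }'
}

# Usage on the host, for the night shown above:
#   journalctl --since "2022-01-03 01:59:50" --until "2022-01-03 02:04:00" | slow_status_updates 10
```

Comparing these timestamps with the backup start time shows whether the slowdown begins with the backup job or before it.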
 
Does the spike also happen if no backups are done at 02:00? Other than the backup, nothing happened according to the logs, so it seems reasonable to assume that the backup is the reason. For the EXT4-fs (dm-11): write access unavailable, skipping orphan cleanup message, take a look at [1]; this is normal.
The <DAY> was supposed to be changed to the day it happened... should have probably mentioned that :D

[1] https://forum.proxmox.com/threads/e...ss-unavailable-skipping-orphan-cleanup.46785/
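
To make the <DAY> substitution explicit, here is a small sketch that fills a concrete date into the journalctl call. `journal_window` is a hypothetical helper name; it only prints the command so it can be checked before running.

```shell
# journal_window is a hypothetical helper: it prints (does not run) the journalctl
# call for the minutes around 02:00 on one concrete day.
journal_window() {
    day="$1"   # full date in YYYY-MM-DD form, e.g. 2022-01-03
    printf 'journalctl --since "%s 01:59:50" --until "%s 02:04:00"\n' "$day" "$day"
}

# Print the command for 3 January 2022.
journal_window 2022-01-03
```

journalctl rejects the literal placeholder with "Failed to parse timestamp", so the date has to be a real day before the command will run.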
 
Hi, so originally I had all backups run at 2:00. But because I thought there might be a connection between the spike and the backups, I redistributed the 4 CT backups to other times: one CT at 2:00, one at 3:00, one at 4:00 and one at 5:00.
But the high spike only occurs at 2:00.
 
Does the spike happen if you manually start the backup for the CT that would usually be backed up at 2:00?
 
I just started the backup of the CT manually. Yes, the spike is there too. Thinking about it now, it is possible that the backup itself is not the problem, but rather uploading to the PBS server or encrypting the backup before it is uploaded.

chart.png
 
OK, if I do the backup locally without encryption, the value is significantly lower. So I think the cause has been found. Many thanks!

Backup starts at 12:21

chart (1).png
 