Load too high every night at 2:00

david83

May 24, 2020
Hi, everyone,
For some time now, I have been getting an alert from Zabbix every night at 2:00 a.m. saying that the 5-minute load average is too high.
I just can't figure out why, and I would like to understand and eliminate the problem.

The situation:

- I have 4 CT (LXC) containers running.
- All containers use a Debian 10 or 11 template.
- Every night at 2:00, each container and the host itself show a 5-minute load average of 4 to 7.
- The normal load average is at most 2.
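For reference, the alert condition described above amounts to comparing the second field of /proc/loadavg (the 5-minute average) against a threshold. The sketch below assumes a hypothetical threshold of 4; the actual Zabbix trigger expression used here isn't shown in the thread, and the function names are illustrative.

```python
import os
from typing import Tuple

# Hypothetical alert threshold; the real Zabbix trigger value is not given in the thread.
LOAD5_THRESHOLD = 4.0


def parse_loadavg(line: str) -> Tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from a /proc/loadavg line."""
    fields = line.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def load5_too_high(line: str, threshold: float = LOAD5_THRESHOLD) -> bool:
    """True if the 5-minute load average strictly exceeds the threshold."""
    _, load5, _ = parse_loadavg(line)
    return load5 > threshold


if __name__ == "__main__":
    try:
        # On Linux the kernel exposes the three averages here.
        with open("/proc/loadavg") as f:
            line = f.read()
    except OSError:
        # Fallback for systems without /proc; os.getloadavg() returns the same triple.
        line = " ".join(str(x) for x in os.getloadavg())
    print(parse_loadavg(line), "alert" if load5_too_high(line) else "ok")
```

Keep in mind that on Linux the load average also counts tasks blocked in uninterruptible sleep (typically waiting on disk I/O), so a heavy backup can push it up even when the CPU is not fully busy, which fits the iowait screenshots.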

Hardware:
Dell E7270, i5-6300U 2.4 GHz, 16 GB RAM, SSD

So far I've looked for a cron job that runs at 2:00 on all CTs and on the host.
I also moved the vzdump backups to a different time.
Unfortunately, I'm stuck and would be grateful for any tips.
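A quick way to repeat the cron check above on each CT and on the host is to parse the crontab files and flag entries whose minute and hour fields match 02:00. This is a simplified sketch with hypothetical helper names: it only reads /etc/crontab and /etc/cron.d, ignores the day-of-month, month and weekday fields (so it reports jobs that fire at 02:00 on at least some days), and does not cover per-user crontabs, systemd timers or the Proxmox scheduler, which, per the journal output later in the thread, is what actually started the 2:00 backup.

```python
from pathlib import Path


def field_matches(field: str, value: int, lo: int, hi: int) -> bool:
    """Check one cron field ('*', '0', '1-5', '*/15', '0,30') against value."""
    for part in field.split(","):
        base, _, step_str = part.partition("/")
        step = int(step_str) if step_str else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            a, b = base.split("-", 1)
            start, end = int(a), int(b)
        else:
            start = int(base)
            # Assumption: a bare number with a step ("5/10") means "5-hi/10";
            # not every cron implementation accepts this form.
            end = hi if step_str else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def runs_at(line: str, hour: int = 2, minute: int = 0) -> bool:
    """True if a crontab line's minute and hour fields fire at hour:minute."""
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return False
    if fields[0].startswith("@"):
        # @hourly fires at minute 0 of every hour; the other macros at 00:00.
        if fields[0] == "@hourly":
            return minute == 0
        if fields[0] in ("@daily", "@midnight", "@weekly", "@monthly", "@yearly", "@annually"):
            return hour == 0 and minute == 0
        return False  # e.g. @reboot has no fixed time
    if len(fields) < 6 or "=" in fields[0]:
        return False  # environment assignment (SHELL=...) or malformed line
    return field_matches(fields[0], minute, 0, 59) and field_matches(fields[1], hour, 0, 23)


if __name__ == "__main__":
    cron_d = Path("/etc/cron.d")
    candidates = [Path("/etc/crontab")] + (sorted(cron_d.iterdir()) if cron_d.is_dir() else [])
    for path in candidates:
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # missing or unreadable file
        for line in text.splitlines():
            if runs_at(line):
                print(f"{path}: {line}")
```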

Kind regards
David


Here are a few screenshots: the first two are from the Proxmox host and the other two from one of the CTs (LXC).

proxmoxloadaverage.png
pvelaptop iowait.png
iobroker load average.png
iobroker iowait.png
 

Hey,

Could you check the logs for that time window? Something like journalctl --since "2022-01-<DAY> 01:59:50" --until "2022-01-<DAY> 02:04:00" should work. Also, and this is a bit of a wild guess: do you have a static IP? If not, your ISP might assign you a new IP at 2 AM, and if your containers rely on internet access, they might start polling or something similar.
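The <DAY> in the command is a placeholder for the date of the night in question. As a sketch, the window can be generated per night with Python's datetime module; the function name journal_window and its defaults (10 seconds before and 4 minutes after 02:00) are illustrative choices that reproduce the window suggested above.

```python
import shlex
from datetime import date, datetime, time, timedelta
from typing import List


def journal_window(day: date, at: time = time(2, 0),
                   before_s: int = 10, after_s: int = 240) -> List[str]:
    """Build a journalctl argument list covering a window around `at` on `day`."""
    centre = datetime.combine(day, at)
    since = centre - timedelta(seconds=before_s)
    until = centre + timedelta(seconds=after_s)
    fmt = "%Y-%m-%d %H:%M:%S"  # a timestamp format journalctl accepts
    return ["journalctl", "--since", since.strftime(fmt), "--until", until.strftime(fmt)]


if __name__ == "__main__":
    # Print the command for each of the last 14 nights, shell-quoted.
    for n in range(1, 15):
        print(shlex.join(journal_window(date.today() - timedelta(days=n))))
```

The returned list can also be passed directly to subprocess.run() on the host, which sidesteps shell quoting of the space inside the timestamps.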
 
@Hannes Laimer

Hello, I've checked the logs for the past 2 weeks. Unfortunately, the command doesn't work as written:
@pve:~# journalctl --since "2022-01-<DAY> 01:59:50" --until "2022-01-<DAY> 02:04:00"
Failed to parse timestamp: 2022-01-<DAY> 01:59:50

That's why I checked each day individually. But it's always the same error:
pve kernel: EXT4-fs (dm-11): write access unavailable, skipping orphan cleanup

Code:
@pve:~# journalctl --since "2022-01-03 01:59:50" --until "2022-01-03 02:04:00"
-- Journal begins at Thu 2021-09-16 08:07:09 CEST, ends at Tue 2022-01-04 10:51:40 CET. --
Jan 03 02:00:00 pve pvescheduler[1238616]: <root@pam> starting task UPID:pve:0012E659:10AD20DE:61D24A90:vzdump:100:root@pam:
Jan 03 02:00:01 pve pvescheduler[1238617]: INFO: starting new backup job: vzdump 100 --storage pbs --mode snapshot --mailto david@xxx.de --node pve>
Jan 03 02:00:01 pve pvescheduler[1238617]: INFO: Starting Backup of VM 100 (lxc)
Jan 03 02:00:01 pve dmeventd[380]: No longer monitoring thin pool pve-data-tpool.
Jan 03 02:00:01 pve lvm[380]: Monitoring thin pool pve-data-tpool.
Jan 03 02:00:01 pve kernel: EXT4-fs (dm-11): write access unavailable, skipping orphan cleanup
Jan 03 02:00:01 pve kernel: EXT4-fs (dm-11): mounted filesystem without journal. Opts: noload. Quota mode: none.
Jan 03 02:01:35 pve pvestatd[846]: status update time (6.826 seconds)
Jan 03 02:01:53 pve pvestatd[846]: status update time (5.799 seconds)
Jan 03 02:02:22 pve pvescheduler[1239834]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jan 03 02:02:22 pve pve-firewall[847]: firewall update time (17.107 seconds)
Jan 03 02:02:23 pve pvestatd[846]: status update time (24.211 seconds)
Jan 03 02:03:00 pve pvestatd[846]: status update time (7.210 seconds)
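Besides the backup start, the log shows several pvestatd and pve-firewall "update time" messages of up to 24 seconds right after 02:00. Slow update cycles are consistent with the host being busy during the backup, though on their own they don't prove the cause. A minimal sketch for pulling these lines out of a journal dump, using a hypothetical 10-second cutoff and an illustrative function name:

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches journal lines such as
# "Jan 03 02:02:23 pve pvestatd[846]: status update time (24.211 seconds)"
UPDATE_RE = re.compile(
    r"^(\w{3} +\d{1,2} [\d:]{8}) \S+ ([\w-]+)\[\d+\]: \w+ update time \(([\d.]+) seconds\)"
)


def slow_updates(lines: Iterable[str], threshold_s: float = 10.0) -> Iterator[Tuple[str, str, float]]:
    """Yield (timestamp, daemon, seconds) for update cycles slower than threshold_s."""
    for line in lines:
        m = UPDATE_RE.match(line)
        if m and float(m.group(3)) > threshold_s:
            yield m.group(1), m.group(2), float(m.group(3))


# Lines taken from the journal excerpt above (whitespace normalized).
SAMPLE = """\
Jan 03 02:01:35 pve pvestatd[846]: status update time (6.826 seconds)
Jan 03 02:01:53 pve pvestatd[846]: status update time (5.799 seconds)
Jan 03 02:02:22 pve pve-firewall[847]: firewall update time (17.107 seconds)
Jan 03 02:02:23 pve pvestatd[846]: status update time (24.211 seconds)
Jan 03 02:03:00 pve pvestatd[846]: status update time (7.210 seconds)
"""

if __name__ == "__main__":
    for ts, daemon, secs in slow_updates(SAMPLE.splitlines()):
        print(f"{ts} {daemon}: {secs:.3f} s")
```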
 
Does the spike also happen if no backup runs at 02:00? According to the logs, nothing other than the backup happened, so it seems reasonable to assume that the backup is the cause. As for "EXT4-fs (dm-11): write access unavailable, skipping orphan cleanup", take a look at [1]; this is normal.
The <DAY> was supposed to be replaced with the day it happened... I probably should have mentioned that :D

[1] https://forum.proxmox.com/threads/e...ss-unavailable-skipping-orphan-cleanup.46785/
 
Hi, I originally had all backups scheduled at 2:00. But because I suspected a connection between the spike and the backups, I spread the four CT backups over different times: one CT at 2:00, one at 3:00, one at 4:00, and one at 5:00.
But there is a high spike only at 2:00.
 
I think I'll skip the backup of the 2:00 CT tonight and see if it looks any different.
 
Does the spike happen if you start the backup manually for the CT that would usually be backed up at 2:00?
 
I just started the backup of the CT manually, and yes, the spike is there too. Thinking about it now, it's possible that the backup itself isn't the problem, but rather uploading it to the PBS server or encrypting it before the upload.

chart.png
 
OK, if I run the backup locally without encryption, the load average is significantly lower. I think the cause has been found. Many thanks!

Backup starts at 12:21

chart (1).png
 