Load too high every night at 2:00

david83 · Jan 3, 2022

Hi, everyone,
For some time now I have been getting an alert every night at 2:00 a.m. (I use Zabbix) saying that the load average (5 min) is too high.
I just can't find out why this happens, and I would like to understand and eliminate the problem.

Here is the situation:

- I have 4 CT (LXC) containers running.
- All containers use a Debian 10 or 11 template.
- Each container and the host itself show a load average (5 min) of more than 4, up to 7, every night at 2:00.
- The normal load average is at most 2.

Hardware:
Dell E7270, i5-6300U 2.4 GHz, 16 GB RAM, SSD
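To put these numbers in perspective, load average is usually read against the number of logical CPUs. The i5-6300U has 2 cores and 4 threads, so a load of 4 to 7 means more tasks are runnable (or, on Linux, blocked in uninterruptible I/O) than there are CPUs. A minimal Python sketch of that comparison; the helper name `load_per_cpu` is made up for illustration:

```python
import os


def load_per_cpu(load: float, cpus: int) -> float:
    """Return the load average divided by the number of logical CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load / cpus


if __name__ == "__main__":
    cpus = os.cpu_count() or 1             # logical CPUs visible to this process
    one, five, fifteen = os.getloadavg()   # 1, 5 and 15 minute load averages
    ratio = load_per_cpu(five, cpus)
    print(f"{cpus} logical CPUs, 5-min load {five:.2f}, per CPU {ratio:.2f}")
```

With the values from this thread, a load of 7 on 4 logical CPUs is 1.75 per CPU, while the normal maximum of 2 is 0.5 per CPU.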

So far I've looked for a cron job that runs at 2:00 on all CT containers and on the host.
I also moved the vzdump backups to a different time.
Unfortunately, I just can't get any further and would be grateful for a tip.

Kind regards
David

Here are a few screenshots: the first two are from the Proxmox host and the other two from a CT (LXC).

Hannes Laimer · Jan 4, 2022

Hey,

Could you check the logs for that time? Something like journalctl --since "2022-01-<DAY> 01:59:50" --until "2022-01-<DAY> 02:04:00" should work. Also, a bit of a wild guess, but do you have a static IP? If not, your ISP might assign you a new IP at 2 AM, and if your containers rely on internet access they might start polling or something.
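The <DAY> in that command is a placeholder and has to be replaced with a real date before journalctl can parse it. As a rough sketch, the same time window can be built for any day in Python; the function name `journal_window` and its defaults are assumptions for illustration, while `--since`, `--until` and `--no-pager` are regular journalctl options:

```python
import shlex
from datetime import date, datetime, time
from typing import List


def journal_window(day: date,
                   start: time = time(1, 59, 50),
                   end: time = time(2, 4, 0)) -> List[str]:
    """Build a journalctl command covering the 2:00 window on the given day."""
    fmt = "%Y-%m-%d %H:%M:%S"
    since = datetime.combine(day, start).strftime(fmt)
    until = datetime.combine(day, end).strftime(fmt)
    # --no-pager prints everything at once instead of opening a pager
    return ["journalctl", "--since", since, "--until", until, "--no-pager"]


if __name__ == "__main__":
    # Print a copy-and-paste command for 3 January 2022
    print(shlex.join(journal_window(date(2022, 1, 3))))
```

Looping over the last 14 days with this helper would cover the two weeks of logs without editing the date by hand each time.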

david83 · Jan 4, 2022

@Hannes Laimer

Hello, I've checked the logs for the past 2 weeks. Unfortunately, the command doesn't work as written:

@pve:~# journalctl --since "2022-01-<DAY> 01:59:50" --until "2022-01-<DAY> 02:04:00"
Failed to parse timestamp: 2022-01-<DAY> 01:59:50

That's why I checked every day individually. But it's always the same error:
pve kernel: EXT4-fs (dm-11): write access unavailable, skipping orphan cleanup

Code:

@pve:~# journalctl --since "2022-01-03 01:59:50" --until "2022-01-03 02:04:00"
-- Journal begins at Thu 2021-09-16 08:07:09 CEST, ends at Tue 2022-01-04 10:51:40 CET. --
Jan 03 02:00:00 pve pvescheduler[1238616]: <root@pam> starting task UPID:pve:0012E659:10AD20DE:61D24A90:vzdump:100:root@pam:
Jan 03 02:00:01 pve pvescheduler[1238617]: INFO: starting new backup job: vzdump 100 --storage pbs --mode snapshot --mailto david@xxx.de --node pve>
Jan 03 02:00:01 pve pvescheduler[1238617]: INFO: Starting Backup of VM 100 (lxc)
Jan 03 02:00:01 pve dmeventd[380]: No longer monitoring thin pool pve-data-tpool.
Jan 03 02:00:01 pve lvm[380]: Monitoring thin pool pve-data-tpool.
Jan 03 02:00:01 pve kernel: EXT4-fs (dm-11): write access unavailable, skipping orphan cleanup
Jan 03 02:00:01 pve kernel: EXT4-fs (dm-11): mounted filesystem without journal. Opts: noload. Quota mode: none.
Jan 03 02:01:35 pve pvestatd[846]: status update time (6.826 seconds)
Jan 03 02:01:53 pve pvestatd[846]: status update time (5.799 seconds)
Jan 03 02:02:22 pve pvescheduler[1239834]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jan 03 02:02:22 pve pve-firewall[847]: firewall update time (17.107 seconds)
Jan 03 02:02:23 pve pvestatd[846]: status update time (24.211 seconds)
Jan 03 02:03:00 pve pvestatd[846]: status update time (7.210 seconds)
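The pvestatd "status update time" entries in the journal output above are a handy side signal, since they show the host responding slowly while the backup runs. A small Python sketch for pulling them out of journal text follows; the function name `slow_status_updates` and the 5-second default threshold are assumptions, and the pattern tolerates an optional space before the PID bracket because pasted logs sometimes contain one:

```python
import re
from typing import List, Tuple

# Timestamp, host name, then the pvestatd message with its duration in seconds
STATUS_RE = re.compile(
    r"(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ pvestatd\s*\[\d+\]:\s*"
    r"status update time \((\d+(?:\.\d+)?) seconds\)"
)


def slow_status_updates(journal_text: str,
                        threshold: float = 5.0) -> List[Tuple[str, float]]:
    """Return (timestamp, seconds) for status updates slower than threshold."""
    hits = []
    for match in STATUS_RE.finditer(journal_text):
        seconds = float(match.group(2))
        if seconds > threshold:
            hits.append((match.group(1), seconds))
    return hits


if __name__ == "__main__":
    sample = (
        "Jan 03 02:01:35 pve pvestatd[846]: status update time (6.826 seconds)\n"
        "Jan 03 02:02:23 pve pvestatd[846]: status update time (24.211 seconds)\n"
    )
    for stamp, seconds in slow_status_updates(sample, threshold=10.0):
        print(stamp, seconds)
```

Feeding it the saved output of journalctl for several nights would show whether the slow updates always line up with the start of the vzdump job.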

Hannes Laimer · Jan 4, 2022

Does the spike also happen if no backups are done at 02:00? According to the logs, nothing other than the backup happened, so it seems reasonable to assume that the backup is the reason. As for the "EXT4-fs (dm-11): write access unavailable, skipping orphan cleanup" message, take a look at [1]; this is normal.
The <DAY> was supposed to be replaced with the day it happened... I probably should have mentioned that.

[1] https://forum.proxmox.com/threads/e...ss-unavailable-skipping-orphan-cleanup.46785/

david83 · Jan 4, 2022

Hi, so originally I had all backups run at 2:00. But because I thought there might be a connection between the spike and the backups, I spread the 4 CT backups over different times: one CT at 2:00, one at 3:00, one at 4:00 and one at 5:00.
But the high spike only occurs at 2:00.

david83 · Jan 4, 2022

I think I will skip the 2:00 CT backup tonight and see whether it looks different then.

Hannes Laimer · Jan 4, 2022

Does the spike happen if you manually start the backup for the CT that would usually be backed up at 2:00?
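One way to run that test and capture numbers instead of watching a graph is to sample the load average while the backup command runs. The following Python sketch assumes it is run on the host; the function name `sample_load_while_running` and the 5-second interval are made up for illustration. It records the 1-minute load average, which reacts faster than the 5-minute value used in the alert:

```python
import os
import subprocess
import sys
import time


def sample_load_while_running(cmd, interval=5.0):
    """Run cmd and record (elapsed seconds, 1-min load average) until it exits."""
    samples = []
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    while True:
        samples.append((time.monotonic() - start, os.getloadavg()[0]))
        try:
            proc.wait(timeout=interval)   # returns as soon as the command ends
            break
        except subprocess.TimeoutExpired:
            continue                      # still running, take another sample
    samples.append((time.monotonic() - start, os.getloadavg()[0]))
    return proc.returncode, samples


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example call, using the job from the log in this thread:
    #   python3 sample_load.py vzdump 100 --storage pbs --mode snapshot
    code, samples = sample_load_while_running(sys.argv[1:])
    for elapsed, load in samples:
        print(f"{elapsed:7.1f}s  load1={load:.2f}")
    print("exit code:", code)
```

Running it once against the usual backup target and once against a different storage gives two series that can be compared directly.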

david83 · Jan 4, 2022

I just started the backup of the CT manually. Yes, the spike is there too. Thinking about it now, it is possible that the backup itself is not the problem, but rather uploading to the PBS server or encrypting the backup before it is uploaded.

david83 · Jan 4, 2022

OK, if I do the backup locally without encryption, the load is significantly lower. So I think the cause has been found. Many thanks!

Backup starts at 12:21
