[SOLVED] 31/03/2024 - Proxmox VE server update = 100% CPU usage, fans going berserk, services keep failing to start. Time change = possible culprit?

AndrzejL

Hello all.

I did an upgrade and rebooted my Proxmox server today. Since then the CPU has been pegged at 100%, the fans are howling and some services are failing to start...

1711846032269.png


Code:
System:
  Host: andrzejl Kernel: 6.5.13-3-pve arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
    Console: pty pts/1 Distro: Debian GNU/Linux 12 (bookworm)
Machine:
  Type: Desktop System: Dell product: OptiPlex 990 v: 01 serial: CKTMF2S Chassis: type: 15
    serial: CKTMF2S
  Mobo: Dell model: 0D6H9T v: A03 serial: /CKTMF2S/CN7220027L01RN/ BIOS: Dell v: A24
    date: 07/02/2018
Battery:
  Message: No system battery data found. Is one present?
Memory:
  RAM: total: 31.22 GiB used: 21.41 GiB (68.6%)
  Array-1: capacity: 32 GiB slots: 4 EC: None max-module-size: 8 GiB note: est.
  Device-1: ChannelA-DIMM0 type: DDR3 detail: synchronous size: 8 GiB speed: 1333 MT/s
    volts: N/A width (bits): data: 64 total: 64 manufacturer: TeamGroup part-no: TIMETEC-UD3-1333
    serial: 0000E3C2
  Device-2: ChannelA-DIMM1 type: DDR3 detail: synchronous size: 8 GiB speed: 1333 MT/s
    volts: N/A width (bits): data: 64 total: 64 manufacturer: Hynix/Hyundai
    part-no: HMT41GU6MFR8C-PB serial: 3040456E
  Device-3: ChannelB-DIMM0 type: DDR3 detail: synchronous size: 8 GiB speed: 1333 MT/s
    volts: N/A width (bits): data: 64 total: 64 manufacturer: TeamGroup part-no: TIMETEC-UD3-1333
    serial: 0000E3B1
  Device-4: ChannelB-DIMM1 type: DDR3 detail: synchronous size: 8 GiB speed: 1333 MT/s
    volts: N/A width (bits): data: 64 total: 64 manufacturer: Hynix/Hyundai
    part-no: HMT41GU6MFR8C-PB serial: 00A15177
CPU:
  Info: quad core model: Intel Core i7-2600 bits: 64 type: MT MCP smt: enabled arch: Sandy Bridge
    rev: 7 cache: L1: 256 KiB L2: 1024 KiB L3: 8 MiB
  Speed (MHz): avg: 3088 high: 3746 min/max: 1600/3800 volts: 0.0 V ext-clock: 100 MHz cores:
    1: 2979 2: 3682 3: 3746 4: 2659 5: 2698 6: 2336 7: 2951 8: 3660 bogomips: 54278
  Flags: acpi aes aperfmperf apic arat arch_perfmon avx bts clflush cmov constant_tsc cpuid
    cx16 cx8 de ds_cpl dtes64 dtherm dts epb ept est flexpriority fpu fxsr ht ibpb ibrs ida
    lahf_lm lm mca mce mmx monitor msr mtrr nonstop_tsc nopl nx pae pat pbe pcid pclmulqdq pdcm
    pebs pge pln pni popcnt pse pse36 pti pts rdtscp rep_good sep smx sse sse2 sse4_1 sse4_2
    ssse3 stibp syscall tm tm2 tpr_shadow tsc tsc_deadline_timer vme vmx vnmi vpid x2apic xsave
    xsaveopt xtopology xtpr
Graphics:
  Device-1: Intel 2nd Generation Core Processor Family Integrated Graphics vendor: Dell
    driver: i915 v: kernel arch: Gen-6 ports: active: none empty: DP-1,HDMI-A-1,VGA-1
    bus-ID: 00:02.0 chip-ID: 8086:0102 class-ID: 0300
  Display: server: No display server data found. Headless machine? tty: 120x80
  API: OpenGL Message: GL data unavailable in console for root.
Audio:
  Device-1: Intel 6 Series/C200 Series Family High Definition Audio vendor: Dell 6
    driver: snd_hda_intel v: kernel bus-ID: 00:1b.0 chip-ID: 8086:1c20 class-ID: 0403
  API: ALSA v: k6.5.13-3-pve status: kernel-api

Code:
root@andrzejl:~# pveversion
pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve)

Code:
root@andrzejl:~# lsb_release  -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm
root@andrzejl:~#

Packages upgraded today:

Code:
2024-03-30 21:34:32 status half-installed python3-pyvmomi:all 6.7.1-4.1
2024-03-30 21:34:32 status half-installed libpve-access-control:all 8.1.2
2024-03-30 21:34:32 status half-installed libpve-apiclient-perl:all 3.3.1
2024-03-30 21:34:32 status half-installed proxmox-backup-client:amd64 3.1.4-1
2024-03-30 21:34:32 status half-installed proxmox-backup-file-restore:amd64 3.1.4-1
2024-03-30 21:34:32 status half-installed libpve-http-server-perl:all 5.0.5
2024-03-30 21:34:33 status half-installed libpve-storage-perl:all 8.1.0
2024-03-30 21:34:33 status half-installed proxmox-widget-toolkit:all 4.1.4
2024-03-30 21:34:33 status half-installed pve-docs:all 8.1.4
2024-03-30 21:34:33 status half-installed pve-manager:amd64 8.1.5
2024-03-30 21:34:34 status half-installed libpve-network-perl:all 0.9.5
2024-03-30 21:34:34 status half-installed pve-esxi-import-tools:amd64 0.6.0
2024-03-30 21:34:34 status installed proxmox-backup-file-restore:amd64 3.1.5-1
2024-03-30 21:34:34 status installed proxmox-widget-toolkit:all 4.1.5
2024-03-30 21:34:34 status installed libpve-network-perl:all 0.9.6
2024-03-30 21:34:34 status installed pve-docs:all 8.1.5
2024-03-30 21:34:34 status installed python3-pyvmomi:all 6.7.1-4.1
2024-03-30 21:34:34 status installed libpve-apiclient-perl:all 3.3.2
2024-03-30 21:34:34 status installed proxmox-backup-client:amd64 3.1.5-1
2024-03-30 21:34:34 status installed libpve-access-control:all 8.1.3
2024-03-30 21:34:34 status installed libpve-http-server-perl:all 5.0.6
2024-03-30 21:34:34 status installed pve-esxi-import-tools:amd64 0.6.0
2024-03-30 21:34:34 status installed libpve-storage-perl:all 8.1.4
2024-03-30 21:34:41 status installed pve-manager:amd64 8.1.10
2024-03-30 21:34:47 status installed pve-ha-manager:amd64 4.0.3
2024-03-30 21:34:48 status installed man-db:amd64 2.11.2-2

1711848912182.png

Code:
root@andrzejl:~# systemctl list-units --failed
  UNIT                   LOAD   ACTIVE SUB    DESCRIPTION
● atop-rotate.service    loaded failed failed Restart atop daemon to rotate logs
● atop.service           loaded failed failed Atop advanced performance monitor
● dpkg-db-backup.service loaded failed failed Daily dpkg database backup service
● logrotate.service      loaded failed failed Rotate log files


LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
4 loaded units listed.

Code:
journalctl -xeu SERVICENAME.service

Code:
Mar 31 00:59:30 andrzejl systemd[1]: Failed to start logrotate.service - Rotate log files.
░░ Subject: A start job for unit logrotate.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit logrotate.service has finished with a failure.
░░
░░ The job identifier is 143587463 and the job result is failed.
Mar 31 00:59:30 andrzejl systemd[1]: logrotate.service: Start request repeated too quickly.
Mar 31 00:59:30 andrzejl systemd[1]: logrotate.service: Failed with result 'start-limit-hit'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit logrotate.service has entered the 'failed' state with result 'start-limit-hit'.
Mar 31 00:59:30 andrzejl systemd[1]: Failed to start logrotate.service - Rotate log files.
░░ Subject: A start job for unit logrotate.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit logrotate.service has finished with a failure.
░░
░░ The job identifier is 143587635 and the job result is failed.
lines 4925-5003/5003 (END)

Code:
Mar 31 00:59:59 andrzejl systemd[1]: Failed to start dpkg-db-backup.service - Daily dpkg database backup service.
░░ Subject: A start job for unit dpkg-db-backup.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit dpkg-db-backup.service has finished with a failure.
░░
░░ The job identifier is 145500720 and the job result is failed.
Mar 31 00:59:59 andrzejl systemd[1]: Failed to start dpkg-db-backup.service - Daily dpkg database backup service.
░░ Subject: A start job for unit dpkg-db-backup.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit dpkg-db-backup.service has finished with a failure.
░░
░░ The job identifier is 145500892 and the job result is failed.
Mar 31 02:00:00 andrzejl systemd[1]: Failed to start dpkg-db-backup.service - Daily dpkg database backup service.
░░ Subject: A start job for unit dpkg-db-backup.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit dpkg-db-backup.service has finished with a failure.
░░
░░ The job identifier is 145501236 and the job result is failed.
lines 5400-5478/5478 (END)

Code:
Mar 31 00:23:22 andrzejl systemd[1]: Stopped atop.service - Atop advanced performance monitor.
░░ Subject: A stop job for unit atop.service has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit atop.service has finished.
░░
░░ The job identifier is 3140 and the job result is done.
Mar 31 00:23:22 andrzejl systemd[1]: atop.service: Start request repeated too quickly.
Mar 31 00:23:22 andrzejl systemd[1]: atop.service: Failed with result 'signal'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit atop.service has entered the 'failed' state with result 'signal'.
Mar 31 00:23:22 andrzejl systemd[1]: Failed to start atop.service - Atop advanced performance monitor.
░░ Subject: A start job for unit atop.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit atop.service has finished with a failure.
░░
░░ The job identifier is 3140 and the job result is failed.
lines 159-237/237 (END)

Code:
Mar 31 00:59:59 andrzejl systemd[1]: Failed to start atop-rotate.service - Restart atop daemon to rotate logs.
░░ Subject: A start job for unit atop-rotate.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit atop-rotate.service has finished with a failure.
░░
░░ The job identifier is 145500634 and the job result is failed.
Mar 31 00:59:59 andrzejl systemd[1]: Failed to start atop-rotate.service - Restart atop daemon to rotate logs.
░░ Subject: A start job for unit atop-rotate.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit atop-rotate.service has finished with a failure.
░░
░░ The job identifier is 145501064 and the job result is failed.
Mar 31 02:00:00 andrzejl systemd[1]: Failed to start atop-rotate.service - Restart atop daemon to rotate logs.
░░ Subject: A start job for unit atop-rotate.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit atop-rotate.service has finished with a failure.
░░
░░ The job identifier is 145501150 and the job result is failed.
lines 5400-5478/5478 (END)

Code:
root@andrzejl:~# journalctl -b -l -x --no-pager -p 3 > wc.txt
root@andrzejl:~# grep ' Failed to start' ./wc.txt > wc-failedline.txt
root@andrzejl:~# wc -l ./wc-failedline.txt
947587 ./wc-failedline.txt
root@andrzejl:~# du -h ./wc.txt
370M    ./wc.txt
root@andrzejl:~#
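
With nearly a million "Failed to start" lines, a per-unit tally makes it easier to see which services are actually looping. The following is only a minimal sketch and not part of the original session: count_failed_units is a hypothetical helper name, and it assumes the journal text has the same line layout as the excerpts above.

```shell
#!/bin/sh
# Count "Failed to start" journal lines per unit, busiest unit first.
# Reads journal text on stdin, so it also works on a saved file such as wc.txt.
count_failed_units() {
    # Lines look like:
    #   "... systemd[1]: Failed to start atop.service - Atop advanced ..."
    # grep -o keeps only the matched part; the unit name is its 4th word.
    grep -o 'Failed to start [^ ]*' \
        | awk '{ print $4 }' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Usage on the affected host (not executed here):
#   count_failed_units < wc.txt
#   journalctl -b -p 3 --no-pager | count_failed_units
```

Reading from stdin keeps the helper usable on both the live journal and the 370M dump without running journalctl twice.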

Please don't hesitate to ask for more information.

Thanks in advance for all help provided.

Kindest regards.

AndrzejL
 
Update 01: Question... Could the time change (switch to daylight saving time) cause such behaviour? I am starting to think this was the culprit...

Most European countries will spring forward one hour at 01:00 UTC on March 31, 2024. The local time for the change is different in each time zone.
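
For what it's worth, the journal excerpts above jump straight from 00:59:59 to 02:00:00, which is what a UTC+0 zone does at 01:00 UTC. A rough way to reproduce that skip with GNU date is sketched below. The time zone is an assumption (the thread never states it), written as a POSIX TZ rule for a UK-style zone; ASSUMED_TZ and local_time are made-up names for illustration.

```shell
#!/bin/sh
# ASSUMPTION: the host's time zone is not stated in the thread. This POSIX TZ
# rule describes a UTC+0 zone that moves to UTC+1 on the last Sunday of March
# at 01:00 local time, chosen only because it matches the 00:59:59 -> 02:00:00
# jump seen in the journal timestamps.
ASSUMED_TZ='GMT0BST,M3.5.0/1,M10.5.0'

# Print the local wall-clock time for a Unix epoch (GNU date "-d @EPOCH" syntax).
local_time() {
    TZ="$ASSUMED_TZ" date -d "@$1" '+%Y-%m-%d %H:%M:%S %Z'
}

# 1711846800 is 2024-03-31 01:00:00 UTC, the moment of the change.
local_time 1711846799   # last second before the change
local_time 1711846800   # first second after the change
```

If the two printed times are an hour and one second apart, the local hour between 01:00 and 02:00 never existed on that date, so anything scheduled inside it is worth a closer look.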

After about an hour, CPU usage went back to normal. I then rebooted the server.

After the reboot, the situation seems to have resolved itself, with normal CPU usage:

1711847862495.png

1711848534645.png

1711848569800.png

Exactly 0 units are listed as failed:

Code:
andrzejl@andrzejl:~$ su -
Password:
root@andrzejl:~# systemctl list-units --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.
root@andrzejl:~#

Throughout this ordeal, network usage was relatively normal.

1711849045003.png

I believe it was the time change that caused such madness... This is the first time I've noticed behaviour like this...
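
One way to check that hunch would be to bucket the "Failed to start" lines by minute and see whether they pile up around the clock change. This is just a sketch, assuming the journal text uses the default "Mon DD HH:MM:SS" timestamp prefix shown in the excerpts; failures_per_minute is a hypothetical helper name.

```shell
#!/bin/sh
# Tally "Failed to start" lines per minute, in the order the minutes first
# appear in the input (journal output is already chronological).
failures_per_minute() {
    awk '/Failed to start/ {
        # $1 = month, $2 = day, $3 = HH:MM:SS; keep only HH:MM.
        key = $1 " " $2 " " substr($3, 1, 5)
        if (!(key in count)) order[++n] = key
        count[key]++
    }
    END {
        for (i = 1; i <= n; i++) print count[order[i]], order[i]
    }'
}

# Usage on the affected host (not executed here):
#   journalctl -b -p 3 --no-pager | failures_per_minute
```

A burst confined to the minutes just before the skipped hour would support the time-change theory; failures spread evenly across the night would point elsewhere.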

Has something similar happened to you?

Kindest regards.

AndrzejL
 
Sitrep.

The server suddenly decided to howl at the moon in the middle of the night and then just as suddenly stopped on its own.

Cat only knows what caused it.

Kindest regards.

AndrzejL
 
I think your server is a bit too old, tbh.
A Raspberry Pi 5 might have more power.

Edit:
Don't get me wrong, I'm not saying you should use one (Raspberry Pis are crap), but you should upgrade your server to something a little newer, with DDR4 ECC at least, no matter how cheap.

I simply think that nowadays, no matter what the VMs do, it will lead to 100% CPU consumption.
 
Hi :)

The server is oldish; it's been running for a few years, but this was the very first time it actually utilised 100% of the CPU for any noticeable period of time. I don't like creating e-waste, so until it dies it will do. Now that the whole ordeal is over, CPU usage is back to barely noticeable.

Have a great day!

Andrzej
 