systemd 100% cpu hang?

rechena

OK, I've been at this for the past several hours. I've tried everything, and at this stage I don't know what to do next.
This morning I woke up to my NUC's fans spinning quite high. When I tried to connect over SSH or the shell I had no joy, so I tried a reboot. To my surprise, my Proxmox host has not worked since.

Since then I've tried everything I can think of and I'm at a loss.

I've tried a new M.2 in my NUC, still the same result.
I've tried my old desktop with SSDs, where I was running Proxmox before. Still nothing...

I've tried reinstalling from scratch, since I might be able to restore the VMs from backups.

The symptom is always the same.

The host boots, and after the console takes quite some time to log me in, I get in. Running top shows systemd as the top process at 100% CPU. The version I'm currently running is 7.4, from a fresh install I just did.

This is my pveversion

Code:
-bash-5.1# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
-bash-5.1#


Thanks for the help... Happy to try things and provide more info... I did look through the forums and couldn't find anything...
 
Any systemctl command I try to run also fails...

:/var/log# systemctl status
Failed to read server status: Transport endpoint is not connected
 
I just reinstalled with Proxmox 6.4 and everything seems to be working again... So I'm wondering if this is related to kernel 5.15?
 
Hey there,
I'm getting the same issue after the system booted following a power outage.

I wonder if I can downgrade the kernel?

Very strange!

Bash:
proxmox-ve: 7.2-1 (running kernel: 5.15.30-2-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-helper: 7.2-2
pve-kernel-5.15: 7.2-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-1
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Bash:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1 root      20   0  164244  10404   7556 R 100.0   0.0   5:55.93 systemd
 
This can't be coincidence. I'm having exactly the same issue. Problem started (I think) yesterday evening/night.
 
I would say so. My system last rebooted 3 days ago and was working fine until today, when I found that one CT was offline and I could not bring it back up.
 
I just tried changing the timezone on my system to Etc, and the problem went away.

Code:
ln -sf /usr/share/zoneinfo/Etc /etc/localtime

I had this system operating in the Europe/Dublin timezone. Maybe the issue has something to do with the upcoming daylight saving change, etc.

After changing the timezone, I rebooted the machine. After the reboot, everything works fine.
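
A side note on the command above: on a standard tzdata install, /usr/share/zoneinfo/Etc is a directory (it holds files such as Etc/UTC), so that symlink does not point at a real zone file. Presumably the system then just falls back to UTC, but that is my assumption and not something confirmed in this thread. Below is a minimal sketch of linking to an actual zone file instead. The function name set_localtime is hypothetical, and Etc/UTC is only an example zone.

```shell
#!/bin/sh
# Hypothetical helper: point a localtime symlink at a zoneinfo *file*.
# Arguments: zone name, zoneinfo root (default /usr/share/zoneinfo),
#            link path (default /etc/localtime).
set_localtime() {
    zone=$1
    root=${2:-/usr/share/zoneinfo}
    link=${3:-/etc/localtime}
    # Refuse directories such as "Etc": only a regular zone file is accepted.
    if [ ! -f "$root/$zone" ]; then
        echo "not a zoneinfo file: $root/$zone" >&2
        return 1
    fi
    # -n replaces the link itself even if it currently points at a directory.
    ln -sfn "$root/$zone" "$link"
}

# Example (run as root, then reboot as described above):
#   set_localtime Etc/UTC
#   readlink /etc/localtime
```

The -n flag matters if /etc/localtime already points at a directory: without it, ln would create the new link inside that directory instead of replacing the symlink.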
 
Must be a short distance between us (Wicklow) then lol. It would be a very strange fix. Worth trying...
 
I wonder if it was related to just the Dublin timezone then. Nothing in from our European friends?
 
Oh my god! This actually worked!
Now, Proxmox team, take this and stick it at the top of the support forum for everyone to see :)
Thanks man, you saved the day
Regards from Greystones ;)
Wicklow server farm :D Hahha unbelievable
 
Hello from Dublin/Blanch :)

OMG!!! You saved my life :)

Just one quick question: how did you figure it out?

THANKS!
 
For anyone wondering how I came up with this solution...
I noticed that systemd (PID 1) was running at 100% CPU utilization. That's not normal.
I ran this command:
Bash:
strace -f -p 1
The trace showed the process continuously attempting to access the /etc/localtime file. That's also unusual.

EDIT:
I managed to recover the output I was getting on the console while tracing the systemd process.

Code:
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3522, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3522, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3522, ...}) = 0
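
For anyone who wants to repeat this kind of check, here is a rough sketch of capturing a few seconds of trace to a file and counting the repeated lines, rather than watching them scroll past. The function name summarize_trace and the path /tmp/pid1.trace are placeholders of mine, not something from the steps above.

```shell
#!/bin/sh
# Capture about 5 seconds of PID 1 activity into a file (run as root):
#   timeout 5 strace -f -p 1 -o /tmp/pid1.trace
#
# Hypothetical helper: count identical trace lines, most frequent first.
# Arguments: trace file, number of lines to show (default 5).
summarize_trace() {
    # "strace -f -o" prefixes each line with a PID; strip it so that
    # identical calls are grouped together.
    sed -E 's/^[0-9]+ +//' "$1" | sort | uniq -c | sort -rn | head -n "${2:-5}"
}

# Example:
#   summarize_trace /tmp/pid1.trace
```

If one call, such as the stat on /etc/localtime shown above, dominates the counts, that is a good hint about what the process is spinning on.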
 