systemd 100% cpu hang?

rechena · Mar 25, 2023

OK, I've been at this for the past several hours. I've tried everything, and at this stage I don't know what to do next.
This morning I woke up to my NUC's fans spinning quite high. When I tried to connect via SSH or the shell I had no joy, so I tried a reboot. To my surprise, my Proxmox host never came back up.

Since then I've tried everything I can think of, and I'm at a loss.

I've tried a new M.2 drive in my NUC: still the same result.
I've tried my old desktop, which I was running on before, with SSDs: still nothing...

I've tried reinstalling from scratch, since I might be able to restore the VMs from backups.

The symptom is always the same:

The host boots, and after quite some time trying to log me in on the console, I get in. Running top shows the top process is systemd at 100% CPU. The version I'm running is 7.4, which I just installed fresh.
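If top is hard to drive on a console that barely responds, the same check can be done non-interactively with procps ps. A minimal sketch; the function name pid1_cpu is hypothetical. Note that ps reports CPU averaged over the process lifetime rather than top's instantaneous figure, so it only shows close to 100 when PID 1 has been spinning for most of the uptime.

```shell
#!/bin/sh
# Sketch: print PID, command name and CPU usage of PID 1 without
# interactive top. ps's %cpu is CPU time divided by the process's
# lifetime, not an instantaneous reading like top's.
pid1_cpu() {
    ps -p 1 -o pid=,comm=,%cpu=
}
```

On the hosts described here, `pid1_cpu` would be expected to list systemd.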

This is my pveversion output:

Code:

-bash-5.1# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
-bash-5.1#

Thanks for the help... Happy to try to provide more info. I did look through the forums and couldn't find anything...

rechena · Mar 25, 2023

Any systemctl command I try to run also fails...

:/var/log# systemctl status
Failed to read server status: Transport endpoint is not connected

rechena · Mar 25, 2023

I just reinstalled with Proxmox 6.4 and everything seems to be working again... So I'm wondering if this is related to kernel 5.15?

chuffy · Mar 25, 2023

Hey there,
I'm getting the same issue after the system booted following a power outage.

I wonder if I can downgrade the kernel?

Very strange!

Bash:

proxmox-ve: 7.2-1 (running kernel: 5.15.30-2-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-helper: 7.2-2
pve-kernel-5.15: 7.2-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-1
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Bash:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1 root      20   0  164244  10404   7556 R 100.0   0.0   5:55.93 systemd

gwojcieszczuk · Mar 25, 2023

This can't be coincidence. I'm having exactly the same issue. Problem started (I think) yesterday evening/night.

EnF · Mar 25, 2023

Same here, I have 2 PVE hosts affected by the same issue.

markomo · Mar 25, 2023

gwojcieszczuk said:
This can't be coincidence. I'm having exactly the same issue. Problem started (I think) yesterday evening/night.

I would say so. My system last rebooted 3 days ago and was working fine until today, when I found that one CT was offline and I couldn't bring it back up.

gwojcieszczuk · Mar 25, 2023

I just tried changing the timezone on my system to Etc, and the problem went away.

Code:

ln -sf /usr/share/zoneinfo/Etc /etc/localtime

I had this system running in the Europe/Dublin timezone. Maybe the issue has something to do with the upcoming daylight saving change, etc.

After changing the timezone, I rebooted the machine. After reboot, all works fine.
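One thing worth knowing about the command above: /usr/share/zoneinfo/Etc is a directory, not a zone file, so it leaves /etc/localtime pointing at something that can't be parsed as a timezone. The system most likely just falls back to UTC, which would explain why it works. A minimal sketch of a more conventional equivalent, assuming Debian's standard zoneinfo layout (/usr/share/zoneinfo/Etc/UTC); both function names are hypothetical. On a healthy system `timedatectl set-timezone Etc/UTC` is the usual tool, but it goes through systemd's D-Bus services, which is exactly what is hung here, so the sketch edits the symlink directly.

```shell
#!/bin/sh
# Sketch: check that /etc/localtime resolves to a real zone file and,
# if needed, point it at UTC. Paths assume Debian's zoneinfo layout.

# Hypothetical helper: succeeds only if the link (default /etc/localtime)
# resolves to a regular file. Fails for a directory such as
# /usr/share/zoneinfo/Etc or for a dangling link.
localtime_is_zone_file() {
    target=$(readlink -f "${1:-/etc/localtime}") || return 1
    [ -f "$target" ]
}

# Hypothetical helper: point the link (default /etc/localtime) at the UTC
# zone file. Needs root for the default path; reboot afterwards, as above.
# -n matters if the link currently points at a directory (e.g. after the
# Etc command above): without it, ln would follow the link and create the
# new symlink inside that directory instead of replacing the link.
set_localtime_utc() {
    ln -sfn /usr/share/zoneinfo/Etc/UTC "${1:-/etc/localtime}"
}
```

Usage: `localtime_is_zone_file || echo "/etc/localtime is not a usable zone file"`, then `set_localtime_utc` as root and reboot.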

chuffy · Mar 25, 2023

Ah interesting. I'll give that a go and report back

markomo · Mar 25, 2023

gwojcieszczuk said:
I just tried changing timezone on my system to Etc, and problem went away.

Must be a short distance between us (Wicklow) then, lol. It would be a very strange fix. Worth trying...

EnF · Mar 25, 2023

Changing the timezone works for me

chuffy · Mar 25, 2023

That sorted it. We've got working VMs up in County Mayo too! @gwojcieszczuk You're the hero we deserve.

gwojcieszczuk · Mar 25, 2023

Regards from Greystones

chuffy · Mar 25, 2023

I wonder if it was related to just the Dublin timezone then? Nothing in from our European friends?
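To look at the daylight-saving theory, zdump (shipped in Debian's libc-bin) can list the transitions a zone defines around the change due on 26 March 2023. A sketch only: the zone_transitions name is hypothetical, and comparing Europe/Dublin with Europe/London, which switches at the same instant, is an assumption for illustration rather than something tested in this thread. Differences in the isdst flags between the two might be worth a look, but this output alone doesn't establish the cause.

```shell
#!/bin/sh
# Sketch: print the transitions a zone defines during one calendar year.
# zdump -v lists each transition with its UT time, local time, isdst flag
# and UTC offset; -c LO,HI cuts the verbose output off at the start of
# years LO and HI, i.e. roughly year LO only.
# Arguments: $1 = zone name, $2 = year.
zone_transitions() {
    zdump -v -c "$2,$(($2 + 1))" "$1"
}
```

Usage: `for tz in Europe/Dublin Europe/London; do zone_transitions "$tz" 2023; done`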

markomo · Mar 25, 2023

gwojcieszczuk said:
I just tried changing timezone on my system to Etc, and problem went away.

Oh my god! This actually worked!
Now, Proxmox team, take this and stick it at the top of the support forum for everyone to see.

Thanks man, you saved the day

gwojcieszczuk said:
Regards from Greystones

Wicklow server farm

Haha, unbelievable

gbeckman · Mar 25, 2023

gwojcieszczuk said:
I just tried changing timezone on my system to Etc, and problem went away.

Hello from Dublin/Blanch

OMG!!! You saved my life

Just one quick question: how did you figure it out?

THANKS!

gwojcieszczuk · Mar 25, 2023

For anyone wondering how I came up with this solution...
I noticed systemd (PID 1) was running at 100% CPU utilization. That's not normal.
I ran this command:

Bash:

strace -f -p 1

The process was continuously trying to access the /etc/localtime file. That's also unusual.

EDIT:
I managed to capture the output I was seeing on the console while tracing the systemd process.

Code:

stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3522, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3522, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3522, ...}) = 0
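To see the same pattern without scrolling a live trace, strace can count syscalls for a fixed interval and print a summary. A minimal sketch, assuming strace and coreutils timeout are installed and it runs as root; the function name pid1_syscall_summary is hypothetical. If PID 1 is stuck in the pattern shown above, stat calls should dominate the table.

```shell
#!/bin/sh
# Sketch: attach to PID 1 for a few seconds and print a per-syscall
# summary instead of every call.
#   -f  also attach PID 1's threads and any children it forks meanwhile
#       (as in the command above), so their calls are counted too
#   -c  count calls and print a summary table (to stderr) on exit
# `timeout` sends SIGTERM after the interval (default 5 s); strace then
# detaches from PID 1 and prints the summary.
pid1_syscall_summary() {
    timeout "${1:-5}" strace -f -c -p 1
}
```

Usage: `pid1_syscall_summary 10` to sample for ten seconds.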

rechena · Mar 25, 2023

gbeckman said:
Hello from Dublin/Blanch

Ah... Dublin/Blanch here too.

This would have been handy several hours ago

markomo · Mar 25, 2023

gwojcieszczuk said:
Regards from Greystones

Well, you are close enough. Let's grab a pint of Guinness in Bray.
