systemd 100% CPU and zombie processes

pveuser

New Member
Mar 25, 2023
Hello, we have a number of standalone PVE hosts and also some clusters, all running various versions of PVE 7. Two examples:

pve-manager/7.3-4/d69b70d4

pve-manager/7.2-11/b76d3178

Last night, after pveupdate ran, all the systems became pretty much locked up, with /sbin/init churning through 100% of CPU and, depending on what was happening on the box, anywhere from a few to thousands of zombie processes.

All VMs are working, so we don't want to reboot at the moment.

Anyone got any ideas?


root@dub-cwt-pve5:/etc# ps axo stat,ppid,pid,comm | grep -w defunct


Zs 1 2851130 pveupdate <defunct>
Z 1 2851893 systemctl <defunct>
Z 1 2851894 grep <defunct>
Z 1 2851895 awk <defunct>
Z 1 2851896 grep <defunct>
Z 1 2851898 systemctl <defunct>
Z 1 2851899 grep <defunct>
Z 1 2851900 awk <defunct>
Z 1 2851901 grep <defunct>
Z 1 2851903 systemctl <defunct>
<snip>


PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 168112 11420 7564 R 100.0 0.0 653:37.74 systemd

The problem is that the VMs are running, but I can't use any PVE commands (to do a migration, for example) because they just hang while the OS is borked.
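For anyone triaging the same symptom, here is a rough sketch that summarises the zombies per command name instead of scrolling through thousands of lines. It assumes input in the same `ps axo stat,ppid,pid,comm` layout used above; the function name `count_zombies` is made up for illustration.

```shell
#!/bin/sh
# Summarise zombie processes (STAT starting with Z) per command name.
# Reads `ps axo stat,ppid,pid,comm` style lines on stdin.
# count_zombies is a hypothetical helper name, not a PVE or systemd tool.
count_zombies() {
    # Field 1 is STAT, field 4 is the command name.
    # The ps header line is skipped because "STAT" does not start with Z.
    awk '$1 ~ /^Z/ { n[$4]++ } END { for (c in n) print n[c], c }' |
        sort -k1,1nr -k2,2
}
```

On a live host this would be run as `ps axo stat,ppid,pid,comm | count_zombies`. It only reports; it does not reap anything, since zombies reparented to PID 1 can only be cleaned up by PID 1 itself.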
 
Having the exact same issue right now:

top - 17:40:59 up 7 min, 1 user, load average: 1.07, 0.80, 0.40
Tasks: 796 total, 2 running, 790 sleeping, 0 stopped, 4 zombie
%Cpu(s): 0.6 us, 0.7 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 257830.0 total, 256140.2 free, 1501.6 used, 188.2 buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 255116.7 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 164560 10876 7744 R 100.0 0.0 6:58.59 systemd
2748 root 20 0 10844 4520 3232 R 0.7 0.0 0:00.07 top
64 root rt 0 0 0 0 S 0.3 0.0 0:01.96 migration/8
 
Code:
-bash-5.1# journalctl -f
-- Journal begins at Sat 2023-03-25 15:20:30 GMT. --
Mar 25 17:34:05 atlas kernel: vmbr0: port 1(enp4s0) entered forwarding state
Mar 25 17:34:05 atlas kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vmbr0: link becomes ready
Mar 25 17:34:13 atlas chronyd[2629]: Selected source 85.199.214.99 (2.debian.pool.ntp.org)
Mar 25 17:34:13 atlas chronyd[2629]: System clock TAI offset set to 37 seconds
Mar 25 17:34:14 atlas sshd[2706]: Accepted password for root from 192.168.0.8 port 51422 ssh2
Mar 25 17:34:14 atlas sshd[2706]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 25 17:36:09 atlas systemd-logind[2527]: Failed to start user service 'user@0.service', ignoring: Connection timed out
Mar 25 17:36:34 atlas systemd-logind[2527]: Failed to start session scope session-1.scope: Connection timed out
Mar 25 17:36:34 atlas sshd[2706]: pam_systemd(sshd:session): Failed to create session: Connection timed out
Mar 25 17:37:09 atlas systemd-logind[2527]: Failed to stop user service 'user@0.service', ignoring: Connection timed out
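The "Connection timed out" lines above are consistent with other services waiting on a PID 1 that is not answering requests. As a rough sketch, the following counts those timeouts per reporting process. It assumes the default short journal format shown above (month, day, time, host, then `process[pid]:`); the function name `timeouts_by_process` is made up for illustration.

```shell
#!/bin/sh
# Count "Connection timed out" journal lines per reporting process.
# Reads short-format journal text on stdin.
# timeouts_by_process is a hypothetical helper name.
timeouts_by_process() {
    awk '/Connection timed out/ {
             p = $5                       # e.g. "systemd-logind[2527]:"
             sub(/\[[0-9]+\]:$/, "", p)   # strip the "[pid]:" suffix
             n[p]++
         }
         END { for (p in n) print n[p], p }' |
        sort -k1,1nr -k2,2
}
```

On a live host this would be run as `journalctl -b --no-pager | timeouts_by_process`.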
 
