Hello - we have a number of standalone PVE hosts and also some clusters, all running various versions of PVE 7. Two examples:
pve-manager/7.3-4/d69b70d4
pve-manager/7.2-11/b76d3178
Last night, after pveupdate ran, all the systems became pretty much locked up, with /sbin/init churning through 100% of CPU and, depending on what was happening on the box, anywhere from a few to thousands of zombie processes.
All VMs are working, so we don't want to reboot at the moment.
Has anyone got any ideas?
root@dub-cwt-pve5:/etc# ps axo stat,ppid,pid,comm | grep -w defunct
Zs 1 2851130 pveupdate <defunct>
Z 1 2851893 systemctl <defunct>
Z 1 2851894 grep <defunct>
Z 1 2851895 awk <defunct>
Z 1 2851896 grep <defunct>
Z 1 2851898 systemctl <defunct>
Z 1 2851899 grep <defunct>
Z 1 2851900 awk <defunct>
Z 1 2851901 grep <defunct>
Z 1 2851903 systemctl <defunct>
<snip>
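In case it helps anyone compare boxes, here is a rough sketch for summarising the zombies by parent PID and command name rather than scrolling through thousands of lines. The helper name `summarize_zombies` is made up for illustration; it only assumes the same four `ps` columns used above (stat, ppid, pid, comm).

```shell
# Hypothetical helper: reads "stat ppid pid comm" lines on stdin and prints
# "<count> <ppid> <comm>" for zombie entries, most frequent first.
summarize_zombies() {
  awk '$1 ~ /^Z/ { count[$2 " " $4]++ }
       END { for (k in count) print count[k], k }' | sort -rn
}

# Same ps invocation as above, piped through the helper.
ps axo stat,ppid,pid,comm | summarize_zombies
```

This only reads the process table and changes nothing on the host. If every entry shows parent PID 1, the zombies are all waiting on systemd to reap them.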
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 168112 11420 7564 R 100.0 0.0 653:37.74 systemd
The problem is that the VMs are running, but I can't use any PVE commands (to do a migration, for example) as they just hang because the OS is borked.