Host hard crash when backing up VM with i915 mediated device

robertut (Member)
I have an up-to-date Proxmox 8.3.3 cluster of 4 nodes running i5-8500T CPUs (6 cores); the mediated GPU is detected and properly configured:

[Screenshot: 1739358748546.png — mediated GPU device configuration]

The VM is a Windows 11 guest which detects the GPU correctly and works very well.

I've set up a nightly scheduled backup job to a mounted storage, with ZSTD compression (fast and good) in Snapshot mode. In most cases this results in a complete hang of the VM. The VM appears to be running, but it isn't. Force stopping has no effect.
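
For reference, the scheduled job boils down to roughly the following invocation (a sketch of the snapshot-mode variant; the actual schedule lives in the Datacenter backup job):

Code:
# Roughly what the nightly job runs (snapshot mode, ZSTD compression).
# The job itself is defined under Datacenter -> Backup; this is the CLI equivalent.
vzdump 128 \
    --mode snapshot \
    --compress zstd \
    --storage prox-backup \
    --notes-template '{{guestname}}' \
    --prune-backups 'keep-last=3'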
In a desperate attempt I rebooted the node; it hangs at the console with:
[Screenshot: 1739358290368.png — console output during the hang]
The only way to recover is to pull the plug.

I tried changing the backup mode to Suspend; the same thing happens in most cases.
It's not consistent: sometimes it succeeds, sometimes it doesn't, but mostly it doesn't. It's very annoying because by the time we realise the lockup, we're already away from the site, so pulling the plug is not immediately feasible. I was expecting the watchdog to reboot the node, but it doesn't.
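
As far as I understand, the node only arms the watchdog when HA is actively managing resources on it, which would explain why nothing rebooted. A quick sanity check of that, assuming the standard PVE tooling:

Code:
# Is the watchdog multiplexer running at all?
systemctl status watchdog-mux
# Is anything on this node under HA management?
# Watchdog-based fencing only kicks in for HA-managed nodes.
ha-manager status
ha-manager config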

My other issue is that I can't migrate the VM between nodes unless it's completely shut down. I'm aware that live migration isn't possible (even when using mapped devices), but I was hoping that at least Suspend or Restart mode would be available as a choice in the Migrate dialog.
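
Right now the only option is a full offline cycle, roughly along these lines (a sketch; "tinypd2" is just a placeholder for whichever target node):

Code:
# Offline migration: shut the guest down cleanly, move it, start it on the target.
qm shutdown 128 --timeout 120
qm migrate 128 tinypd2
# After migration the config lives on the target node; start it there (or via the GUI):
qm start 128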
 
It happened again; this time I was able to catch some more info. The backup notification:

Code:
vzdump 128 --notes-template '{{guestname}}' --fleecing 0 --notification-mode notification-system --storage prox-backup --mode suspend --quiet 1 --prune-backups 'keep-last=3' --compress zstd
128: 2025-02-16 07:00:02 INFO: Starting Backup of VM 128 (qemu)
128: 2025-02-16 07:00:02 INFO: status = running
128: 2025-02-16 07:00:02 INFO: backup mode: suspend
128: 2025-02-16 07:00:02 INFO: ionice priority: 7
128: 2025-02-16 07:10:02 ERROR: Backup of VM 128 failed - VM 128 qmp command 'query-backup' failed - unable to connect to VM 128 qmp socket - timeout after 5965 retries

Since then, the CPU has been pegged at a constant 25% (this was the only VM running at the time).
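
Something that can still be checked from the shell while it's in this state (a sketch, assuming the usual /var/run/qemu-server/<vmid>.pid layout):

Code:
# Find the QEMU/KVM process for VMID 128 and check its scheduler state.
# A "D" (uninterruptible sleep) state would explain why even SIGKILL has no effect.
PID=$(cat /var/run/qemu-server/128.pid)
ps -o pid,stat,wchan:32,pcpu,cmd -p "$PID"
# Kernel-side view of what the process is blocked on:
cat /proc/"$PID"/stack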

The task details show the same:

Code:
INFO: starting new backup job: vzdump 128 --notes-template '{{guestname}}' --fleecing 0 --notification-mode notification-system --storage prox-backup --mode suspend --quiet 1 --prune-backups 'keep-last=3' --compress zstd
INFO: Starting Backup of VM 128 (qemu)
INFO: Backup started at 2025-02-16 07:00:02
INFO: status = running
INFO: backup mode: suspend
INFO: ionice priority: 7
ERROR: Backup of VM 128 failed - VM 128 qmp command 'query-backup' failed - unable to connect to VM 128 qmp socket - timeout after 5965 retries
INFO: Failed at 2025-02-16 07:10:02
INFO: Backup job finished with errors
TASK ERROR: job errors
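
To rule out that it's only the backup task that is stuck (rather than the QMP socket itself), something like this can be tried, assuming the standard /var/run/qemu-server/<vmid>.qmp socket path:

Code:
# Both of these go through the QMP socket; if they also hang or time out,
# the QEMU main loop is wedged, not just the backup job.
qm status 128 --verbose
# Raw connect to the monitor socket (a live QEMU prints a QMP greeting immediately):
socat - UNIX-CONNECT:/var/run/qemu-server/128.qmp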

The system log has been spammed continuously ever since:

Code:
Feb 16 16:21:01 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:21:01 tinypd1 pvestatd[960]: status update time (8.172 seconds)
Feb 16 16:21:11 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:21:11 tinypd1 pvestatd[960]: status update time (8.164 seconds)
Feb 16 16:21:21 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:21:21 tinypd1 pvestatd[960]: status update time (8.162 seconds)
Feb 16 16:21:31 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:21:31 tinypd1 pvestatd[960]: status update time (8.163 seconds)
Feb 16 16:21:41 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:21:41 tinypd1 pvestatd[960]: status update time (8.163 seconds)
Feb 16 16:21:51 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:21:52 tinypd1 pvestatd[960]: status update time (8.164 seconds)
Feb 16 16:22:01 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:22:01 tinypd1 pvestatd[960]: status update time (8.165 seconds)
Feb 16 16:22:11 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:22:11 tinypd1 pvestatd[960]: status update time (8.163 seconds)
Feb 16 16:22:21 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:22:21 tinypd1 pvestatd[960]: status update time (8.166 seconds)
Feb 16 16:22:31 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:22:31 tinypd1 pvestatd[960]: status update time (8.165 seconds)
Feb 16 16:22:41 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:22:41 tinypd1 pvestatd[960]: status update time (8.157 seconds)
Feb 16 16:22:51 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:22:52 tinypd1 pvestatd[960]: status update time (8.164 seconds)
Feb 16 16:23:01 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:23:01 tinypd1 pvestatd[960]: status update time (8.156 seconds)
Feb 16 16:23:11 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:23:11 tinypd1 pvestatd[960]: status update time (8.161 seconds)
Feb 16 16:23:21 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:23:21 tinypd1 pvestatd[960]: status update time (8.160 seconds)
Feb 16 16:23:31 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:23:31 tinypd1 pvestatd[960]: status update time (8.171 seconds)
Feb 16 16:23:41 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:23:41 tinypd1 pvestatd[960]: status update time (8.164 seconds)

The VM appears to be running.

This time I tried to kill it:

Code:
Feb 16 16:26:45 tinypd1 pvedaemon[1181149]: VM 128 qmp command failed - VM 128 qmp command 'quit' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:26:45 tinypd1 pvedaemon[1181149]: VM quit/powerdown failed - terminating now with SIGTERM
Feb 16 16:26:51 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:26:52 tinypd1 pvestatd[960]: status update time (8.161 seconds)
Feb 16 16:26:52 tinypd1 pvedaemon[1084493]: VM 128 qmp command failed - VM 128 qmp command 'guest-ping' failed - got timeout
Feb 16 16:26:55 tinypd1 pvedaemon[984177]: VM 128 qmp command failed - VM 128 qmp command 'query-proxmox-support' failed - unable to connect to VM 128 qmp socket - timeout after 51 retries
Feb 16 16:26:55 tinypd1 pvedaemon[1181149]: VM still running - terminating now with SIGKILL
Feb 16 16:26:56 tinypd1 pvestatd[960]: VM 128 qmp command failed - VM 128 not running

But in the task panel the Stop task keeps running forever; this is what I see in a popup window when I double-click the running task:

Code:
()
VM quit/powerdown failed - terminating now with SIGTERM
VM still running - terminating now with SIGKILL
actively clean up mediated device with UUID 00000000-0000-0000-0000-000000000128

Pressing the Stop button here shows a "Please wait" which disappears after a short while. Meanwhile, in the syslog:

Code:
Feb 16 16:29:31 tinypd1 pmxcfs[824]: [dcdb] notice: data verification successful
Feb 16 16:29:55 tinypd1 kernel: intel_vgpu_mdev 00000000-0000-0000-0000-000000000128: Device is currently in use, task "task UPID:tinyp" (1181149) blocked until device is released
Feb 16 16:30:15 tinypd1 pveproxy[1023150]: worker exit
Feb 16 16:30:15 tinypd1 pveproxy[988]: worker 1023150 finished
Feb 16 16:30:15 tinypd1 pveproxy[988]: starting 1 worker(s)
Feb 16 16:30:15 tinypd1 pveproxy[988]: worker 1182155 started
Feb 16 16:31:31 tinypd1 pvedaemon[846588]: worker exit
Feb 16 16:31:31 tinypd1 pvedaemon[976]: worker 846588 finished
Feb 16 16:31:31 tinypd1 pvedaemon[976]: starting 1 worker(s)
Feb 16 16:31:31 tinypd1 pvedaemon[976]: worker 1182400 started

The VM now appears to be stopped, but starting it is not possible:

Code:
()
trying to acquire lock...
TASK ERROR: can't lock file '/var/lock/qemu-server/lock-128.conf' - got timeout
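
If it were only a stale lock, the standard tooling would normally clear it (a sketch; this doesn't release the stuck mdev underneath, of course):

Code:
# Clear the per-VM config lock if it's merely stale:
qm unlock 128
# Last resort: remove the flock file the task is waiting on.
# This only helps if no process is actually holding the lock anymore.
rm -f /var/lock/qemu-server/lock-128.conf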
 
Trying to reboot the host now:
Code:
Feb 16 16:35:41 tinypd1 systemd-logind[623]: The system will reboot now!
Feb 16 16:35:41 tinypd1 systemd-logind[623]: System is rebooting.
Feb 16 16:35:41 tinypd1 systemd[1]: 128.scope: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped 128.scope.
Feb 16 16:35:41 tinypd1 systemd[1]: 128.scope: Consumed 1d 13h 20min 4.058s CPU time.
Feb 16 16:35:41 tinypd1 systemd[1]: Removed slice qemu.slice - Slice /qemu.
Feb 16 16:35:41 tinypd1 systemd[1]: qemu.slice: Consumed 1d 15h 54min 28.359s CPU time.
Feb 16 16:35:41 tinypd1 systemd[1]: Removed slice system-modprobe.slice - Slice /system/modprobe.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped target graphical.target - Graphical Interface.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped target multi-user.target - Multi-User System.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped target getty.target - Login Prompts.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped target rpc_pipefs.target.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped target rpcbind.target - RPC Port Mapper.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped target timers.target - Timer Units.
Feb 16 16:35:41 tinypd1 systemd[1]: apt-daily-upgrade.timer: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped apt-daily-upgrade.timer - Daily apt upgrade and clean activities.
Feb 16 16:35:41 tinypd1 systemd[1]: apt-daily.timer: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped apt-daily.timer - Daily apt download activities.
Feb 16 16:35:41 tinypd1 systemd[1]: dpkg-db-backup.timer: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped dpkg-db-backup.timer - Daily dpkg database backup timer.
Feb 16 16:35:41 tinypd1 systemd[1]: e2scrub_all.timer: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped e2scrub_all.timer - Periodic ext4 Online Metadata Check for All Filesystems.
Feb 16 16:35:41 tinypd1 systemd[1]: fstrim.timer: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped fstrim.timer - Discard unused blocks once a week.
Feb 16 16:35:41 tinypd1 systemd[1]: logrotate.timer: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped logrotate.timer - Daily rotation of log files.
Feb 16 16:35:41 tinypd1 systemd[1]: man-db.timer: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped man-db.timer - Daily man-db regeneration.
Feb 16 16:35:41 tinypd1 systemd[1]: pve-daily-update.timer: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped pve-daily-update.timer - Daily PVE download activities.
Feb 16 16:35:41 tinypd1 systemd[1]: systemd-tmpfiles-clean.timer: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped systemd-tmpfiles-clean.timer - Daily Cleanup of Temporary Directories.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped target zfs.target - ZFS startup target.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped target zfs-import.target - ZFS pool import target.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped target zfs-volumes.target - ZFS volumes are ready.
Feb 16 16:35:41 tinypd1 systemd[1]: lvm2-lvmpolld.socket: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Closed lvm2-lvmpolld.socket - LVM2 poll daemon socket.
Feb 16 16:35:41 tinypd1 systemd[1]: systemd-rfkill.socket: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Closed systemd-rfkill.socket - Load/Save RF Kill Switch Status /dev/rfkill Watch.
Feb 16 16:35:41 tinypd1 systemd[1]: Unmounting run-rpc_pipefs.mount - RPC Pipe File System...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping blk-availability.service - Availability of block devices...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping cron.service - Regular background program processing daemon...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping dbus.service - D-Bus System Message Bus...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping getty@tty1.service - Getty on tty1...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping ksmtuned.service - Kernel Samepage Merging (KSM) Tuning Daemon...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping lxc-monitord.service - LXC Container Monitoring Daemon...
Feb 16 16:35:41 tinypd1 systemd[1]: postfix.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped postfix.service - Postfix Mail Transport Agent.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping postfix@-.service - Postfix Mail Transport Agent (instance -)...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping proxmox-firewall.service - Proxmox nftables firewall...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping pvescheduler.service - Proxmox VE scheduler...
Feb 16 16:35:41 tinypd1 blkdeactivate[1183507]: Deactivating block devices:
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping systemd-logind.service - User Login Management...
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping systemd-random-seed.service - Load/Save Random Seed...
Feb 16 16:35:41 tinypd1 systemd[1]: zfs-share.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped zfs-share.service - ZFS file system shares.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping zfs-zed.service - ZFS Event Daemon (zed)...
Feb 16 16:35:41 tinypd1 systemd[1]: dbus.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped dbus.service - D-Bus System Message Bus.
Feb 16 16:35:41 tinypd1 zed[628]: Exiting
Feb 16 16:35:41 tinypd1 systemd[1]: ksmtuned.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 smartd[614]: smartd received signal 15: Terminated
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped ksmtuned.service - Kernel Samepage Merging (KSM) Tuning Daemon.
Feb 16 16:35:41 tinypd1 systemd[1]: ksmtuned.service: Consumed 5min 57.162s CPU time.
Feb 16 16:35:41 tinypd1 systemd[1]: getty@tty1.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped getty@tty1.service - Getty on tty1.
Feb 16 16:35:41 tinypd1 blkdeactivate[1183507]:   [SKIP]: unmount of pve-swap (dm-0) mounted on [SWAP]
Feb 16 16:35:41 tinypd1 blkdeactivate[1183507]:   [SKIP]: unmount of pve-root (dm-1) mounted on /
Feb 16 16:35:41 tinypd1 smartd[614]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLW256HEHP_000H1-S340NX0K965494.nvme.state
Feb 16 16:35:41 tinypd1 smartd[614]: smartd is exiting (exit status 0)
Feb 16 16:35:41 tinypd1 postfix[1183520]: Postfix is using backwards-compatible default settings
Feb 16 16:35:41 tinypd1 postfix[1183520]: See http://www.postfix.org/COMPATIBILITY_README.html for details
Feb 16 16:35:41 tinypd1 postfix[1183520]: To disable backwards compatibility use "postconf compatibility_level=3.6" and "postfix reload"
Feb 16 16:35:41 tinypd1 systemd[1]: lxc-monitord.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped lxc-monitord.service - LXC Container Monitoring Daemon.
Feb 16 16:35:41 tinypd1 systemd[1]: cron.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped cron.service - Regular background program processing daemon.
Feb 16 16:35:41 tinypd1 systemd[1]: cron.service: Consumed 26.075s CPU time.
Feb 16 16:35:41 tinypd1 systemd[1]: zfs-zed.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped zfs-zed.service - ZFS Event Daemon (zed).
Feb 16 16:35:41 tinypd1 systemd[1]: run-rpc_pipefs.mount: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Unmounted run-rpc_pipefs.mount - RPC Pipe File System.
Feb 16 16:35:41 tinypd1 systemd[1]: systemd-random-seed.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped systemd-random-seed.service - Load/Save Random Seed.
Feb 16 16:35:41 tinypd1 systemd[1]: Removed slice system-getty.slice - Slice /system/getty.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping systemd-user-sessions.service - Permit User Sessions...
Feb 16 16:35:41 tinypd1 systemd[1]: smartmontools.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.
Feb 16 16:35:41 tinypd1 systemd[1]: systemd-logind.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped systemd-logind.service - User Login Management.
Feb 16 16:35:41 tinypd1 systemd[1]: systemd-user-sessions.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped systemd-user-sessions.service - Permit User Sessions.
Feb 16 16:35:41 tinypd1 postfix/postfix-script[1183547]: stopping the Postfix mail system
Feb 16 16:35:41 tinypd1 postfix/master[921]: terminating on signal 15
Feb 16 16:35:41 tinypd1 systemd[1]: postfix@-.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped postfix@-.service - Postfix Mail Transport Agent (instance -).
Feb 16 16:35:41 tinypd1 systemd[1]: postfix@-.service: Consumed 1.862s CPU time.
Feb 16 16:35:41 tinypd1 systemd[1]: Removed slice system-postfix.slice - Slice /system/postfix.
Feb 16 16:35:41 tinypd1 systemd[1]: system-postfix.slice: Consumed 1.862s CPU time.
Feb 16 16:35:41 tinypd1 systemd[1]: blk-availability.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped blk-availability.service - Availability of block devices.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopping rbdmap.service - Map RBD devices...
Feb 16 16:35:41 tinypd1 systemd[1]: rbdmap.service: Deactivated successfully.
Feb 16 16:35:41 tinypd1 systemd[1]: Stopped rbdmap.service - Map RBD devices.
Feb 16 16:35:41 tinypd1 pvescheduler[1151]: received signal TERM
Feb 16 16:35:41 tinypd1 pvescheduler[1151]: got shutdown request, signal running jobs to stop
Feb 16 16:35:41 tinypd1 pvescheduler[1151]: server stopped
Feb 16 16:35:42 tinypd1 systemd[1]: pvescheduler.service: Deactivated successfully.
Feb 16 16:35:42 tinypd1 systemd[1]: Stopped pvescheduler.service - Proxmox VE scheduler.
Feb 16 16:35:42 tinypd1 systemd[1]: pvescheduler.service: Consumed 12min 7.738s CPU time.
Feb 16 16:35:42 tinypd1 systemd[1]: Stopping pve-guests.service - PVE guests...
Feb 16 16:35:44 tinypd1 pve-guests[1183577]: <root@pam> starting task UPID:tinypd1:00120F6A:0206607F:67B205D0:stopall::root@pam:
Feb 16 16:35:44 tinypd1 systemd[1]: proxmox-firewall.service: Deactivated successfully.
Feb 16 16:35:44 tinypd1 systemd[1]: Stopped proxmox-firewall.service - Proxmox nftables firewall.
Feb 16 16:35:44 tinypd1 systemd[1]: proxmox-firewall.service: Consumed 4min 45.086s CPU time.
Feb 16 16:35:44 tinypd1 pve-guests[1183594]: all VMs and CTs stopped
Feb 16 16:35:44 tinypd1 pve-guests[1183577]: <root@pam> end task UPID:tinypd1:00120F6A:0206607F:67B205D0:stopall::root@pam: OK
Feb 16 16:35:44 tinypd1 systemd[1]: pve-guests.service: Deactivated successfully.
Feb 16 16:35:44 tinypd1 systemd[1]: Stopped pve-guests.service - PVE guests.
Feb 16 16:35:44 tinypd1 systemd[1]: pve-guests.service: Consumed 1.697s CPU time.
Feb 16 16:35:44 tinypd1 systemd[1]: Stopping pve-firewall.service - Proxmox VE firewall...
Feb 16 16:35:44 tinypd1 systemd[1]: Stopping pve-ha-lrm.service - PVE Local HA Resource Manager Daemon...
Feb 16 16:35:44 tinypd1 systemd[1]: Stopping pve-lxc-syscalld.service - Proxmox VE LXC Syscall Daemon...
Feb 16 16:35:44 tinypd1 systemd[1]: Stopping pvestatd.service - PVE Status Daemon...
Feb 16 16:35:44 tinypd1 systemd[1]: Stopping spiceproxy.service - PVE SPICE Proxy Server...
Feb 16 16:35:44 tinypd1 systemd[1]: pve-lxc-syscalld.service: Deactivated successfully.
Feb 16 16:35:44 tinypd1 systemd[1]: Stopped pve-lxc-syscalld.service - Proxmox VE LXC Syscall Daemon.
Feb 16 16:35:45 tinypd1 spiceproxy[994]: received signal TERM
Feb 16 16:35:45 tinypd1 spiceproxy[994]: server closing
Feb 16 16:35:45 tinypd1 spiceproxy[984603]: worker exit
Feb 16 16:35:45 tinypd1 spiceproxy[994]: worker 984603 finished
Feb 16 16:35:45 tinypd1 spiceproxy[994]: server stopped
Feb 16 16:35:45 tinypd1 pve-firewall[950]: received signal TERM
Feb 16 16:35:45 tinypd1 pve-firewall[950]: server shutting down
Feb 16 16:35:45 tinypd1 pve-firewall[950]: clear PVE-generated firewall rules
Feb 16 16:35:45 tinypd1 pvestatd[960]: received signal TERM
Feb 16 16:35:45 tinypd1 pvestatd[960]: server closing
Feb 16 16:35:45 tinypd1 pvestatd[960]: server stopped
Feb 16 16:35:45 tinypd1 pve-firewall[950]: server stopped
Feb 16 16:35:45 tinypd1 pve-ha-lrm[996]: received signal TERM
Feb 16 16:35:45 tinypd1 pve-ha-lrm[996]: got shutdown request with shutdown policy 'conditional'
Feb 16 16:35:45 tinypd1 pve-ha-lrm[996]: reboot LRM, stop and freeze all services
Feb 16 16:35:45 tinypd1 pve-ha-lrm[996]: server stopped
Feb 16 16:35:46 tinypd1 systemd[1]: spiceproxy.service: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped spiceproxy.service - PVE SPICE Proxy Server.
Feb 16 16:35:46 tinypd1 systemd[1]: spiceproxy.service: Consumed 12.176s CPU time.
Feb 16 16:35:46 tinypd1 systemd[1]: pve-firewall.service: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped pve-firewall.service - Proxmox VE firewall.
Feb 16 16:35:46 tinypd1 systemd[1]: pve-firewall.service: Consumed 37min 56.944s CPU time.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopping pvefw-logger.service - Proxmox VE firewall logger...
Feb 16 16:35:46 tinypd1 pvefw-logger[984597]: received terminate request (signal)
Feb 16 16:35:46 tinypd1 pvefw-logger[984597]: stopping pvefw logger
Feb 16 16:35:46 tinypd1 systemd[1]: pvestatd.service: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped pvestatd.service - PVE Status Daemon.
Feb 16 16:35:46 tinypd1 systemd[1]: pvestatd.service: Consumed 56min 6.648s CPU time.
Feb 16 16:35:46 tinypd1 systemd[1]: pve-ha-lrm.service: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped pve-ha-lrm.service - PVE Local HA Resource Manager Daemon.
Feb 16 16:35:46 tinypd1 systemd[1]: pve-ha-lrm.service: Consumed 34.565s CPU time.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopping lxc.service - LXC Container Initialization and Autoboot Code...
Feb 16 16:35:46 tinypd1 systemd[1]: Stopping pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon...
Feb 16 16:35:46 tinypd1 systemd[1]: pve-query-machine-capabilities.service: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped pve-query-machine-capabilities.service - PVE Query Machine Capabilities.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopping pveproxy.service - PVE API Proxy Server...
Feb 16 16:35:46 tinypd1 systemd[1]: Stopping qmeventd.service - PVE Qemu Event Daemon...
Feb 16 16:35:46 tinypd1 systemd[1]: qmeventd.service: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped qmeventd.service - PVE Qemu Event Daemon.
Feb 16 16:35:46 tinypd1 systemd[1]: qmeventd.service: Consumed 4.008s CPU time.
Feb 16 16:35:46 tinypd1 systemd[1]: lxc.service: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped lxc.service - LXC Container Initialization and Autoboot Code.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopping lxc-net.service - LXC network bridge setup...
Feb 16 16:35:46 tinypd1 systemd[1]: Stopping lxcfs.service - FUSE filesystem for LXC...
Feb 16 16:35:46 tinypd1 systemd[1]: var-lib-lxcfs.mount: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Unmounted var-lib-lxcfs.mount - /var/lib/lxcfs.
Feb 16 16:35:46 tinypd1 systemd[1]: lxc-net.service: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped lxc-net.service - LXC network bridge setup.
Feb 16 16:35:46 tinypd1 lxcfs[631]: Running destructor lxcfs_exit
Feb 16 16:35:46 tinypd1 systemd[1]: lxcfs.service: Main process exited, code=exited, status=1/FAILURE
Feb 16 16:35:46 tinypd1 fusermount[1183631]: /bin/fusermount: failed to unmount /var/lib/lxcfs: Invalid argument
Feb 16 16:35:46 tinypd1 systemd[1]: lxcfs.service: Failed with result 'exit-code'.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped lxcfs.service - FUSE filesystem for LXC.
Feb 16 16:35:46 tinypd1 systemd[1]: pvefw-logger.service: Deactivated successfully.
Feb 16 16:35:46 tinypd1 systemd[1]: Stopped pvefw-logger.service - Proxmox VE firewall logger.
Feb 16 16:35:46 tinypd1 systemd[1]: pvefw-logger.service: Consumed 2.583s CPU time.
 
Looking at the physical console, only a blinking cursor can be seen at the top left of the screen. There is no reaction to keypresses or Ctrl+Alt+Del.

It's not possible to log in remotely via SSH either, but the machine responds to pings. In the cluster's web UI (viewed through another node) clicking on anything shows "Connection refused", and all the items in the tree switch to a grey question mark first, then to a red X.

After about 5 minutes the messages shown in the screenshot above start to slowly appear on the physical console. The only way to get out of this is to cut the power to the server.
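
One thing worth having in place for next time, to avoid cutting the power: enabling the kernel's magic SysRq handler in advance, so that (assuming the kernel still services keyboard interrupts when it wedges like this) an emergency sync and reboot can be triggered from the physical console:

Code:
# Enable magic SysRq permanently, done in advance while the node is healthy:
echo "kernel.sysrq = 1" > /etc/sysctl.d/99-sysrq.conf
sysctl --system
# When the console is wedged: Alt+SysRq+s (sync), Alt+SysRq+u (remount read-only),
# Alt+SysRq+b (immediate reboot). From a shell, the equivalent trigger would be:
echo b > /proc/sysrq-trigger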