Host reboot does not gracefully shut down VMs

bergamot

Recently, I've noticed that reboots and shutdowns of the host are not gracefully shutting down VMs. They are stopped almost immediately.

There are several errors in the logs, but I do not know how to interpret them. I'd appreciate any help in debugging this issue!

PVE Version: 8.3.5
Kernel: 6.11.11-2-pve

Selected log excerpts:
Code:
Mar 25 17:50:03 pve pve-firewall[1652]: status update error: Connection refused
Mar 25 17:50:03 pve pve-firewall[1652]: firewall update time (10.051 seconds)
Mar 25 17:50:03 pve pve-firewall[1652]: status update error: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:04 pve pveproxy[681696]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:04 pve vzdump[703974]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:04 pve vzdump[703974]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:04 pve vzdump[703974]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:04 pve vzdump[703974]: Unable to load access control list: Connection refused
Mar 25 17:50:05 pve pveproxy[698540]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:05 pve pveproxy[698540]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:05 pve pveproxy[698540]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:05 pve pvesh[703985]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:05 pve pvesh[703985]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:05 pve pvesh[703985]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:05 pve pvesh[703985]: Unable to load access control list: Connection refused
Mar 25 17:50:05 pve pveproxy[681696]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:05 pve pveproxy[681696]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:05 pve pveproxy[681696]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:05 pve systemd[1]: pve-guests.service: Control process exited, code=exited, status=111/n/a
Mar 25 17:50:05 pve systemd[1]: pve-guests.service: Failed with result 'exit-code'
...
Mar 25 17:50:05 pve spiceproxy[1691]: received signal TERM
Mar 25 17:50:05 pve spiceproxy[1691]: server closing
Mar 25 17:50:05 pve spiceproxy[386108]: worker exit
Mar 25 17:50:05 pve spiceproxy[1691]: worker 386108 finished
Mar 25 17:50:05 pve spiceproxy[1691]: server stopped
Mar 25 17:50:05 pve pve-firewall[703987]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:05 pve pve-firewall[703987]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:05 pve pve-firewall[703987]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:05 pve pve-firewall[703987]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:05 pve pve-firewall[703987]: Unable to load access control list: Connection refused
Mar 25 17:50:05 pve pve-firewall[703987]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:05 pve pve-firewall[703987]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:05 pve pvestatd[703988]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:05 pve pvestatd[703988]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:05 pve pvestatd[703988]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:05 pve pvestatd[703988]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:05 pve pvestatd[703988]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:05 pve pvestatd[703988]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:05 pve pvestatd[703988]: Unable to load access control list: Connection refused
Mar 25 17:50:05 pve systemd[1]: pve-firewall.service: Control process exited, code=exited, status=111/n/a
...
Mar 25 17:50:05 pve pve-firewall[1652]: received signal TERM
Mar 25 17:50:05 pve pve-firewall[1652]: server shutting down
Mar 25 17:50:05 pve pve-firewall[1652]: clear PVE-generated firewall rules
Mar 25 17:50:05 pve systemd[1]: pvestatd.service: Control process exited, code=exited, status=111/n/a
...
Mar 25 17:50:05 pve pve-firewall[1652]: server stopped
Mar 25 17:50:05 pve systemd[1]: pve-firewall.service: Failed with result 'exit-code'
...
Mar 25 17:50:06 pve pveproxy[687075]: ipcc_send_rec[1] failed: Connection refused
Mar 25 17:50:06 pve pveproxy[687075]: ipcc_send_rec[2] failed: Connection refused
Mar 25 17:50:06 pve pveproxy[687075]: ipcc_send_rec[3] failed: Connection refused
Mar 25 17:50:06 pve systemd[1]: spiceproxy.service: Deactivated successfully.
...
Mar 25 17:50:07 pve pveproxy[1686]: received signal TERM
Mar 25 17:50:07 pve pveproxy[1686]: server closing
Mar 25 17:50:07 pve pveproxy[698540]: worker exit
Mar 25 17:50:07 pve pveproxy[687075]: worker exit
Mar 25 17:50:07 pve pveproxy[1686]: worker 687075 finished
Mar 25 17:50:07 pve pveproxy[1686]: worker 681696 finished
Mar 25 17:50:07 pve pveproxy[1686]: worker 698540 finished
Mar 25 17:50:07 pve pveproxy[1686]: server stopped
Mar 25 17:50:07 pve pvestatd[1664]: received signal TERM
Mar 25 17:50:07 pve pvestatd[1664]: server closing
Mar 25 17:50:07 pve pvestatd[1664]: server stopped
Mar 25 17:50:07 pve systemd[1]: pvestatd.service: Failed with result 'exit-code'.
...
Mar 25 17:50:09 pve systemd[1]: Shutting down.
Mar 25 17:50:09 pve systemd-shutdown[1]: Syncing filesystems and block devices.
Mar 25 17:50:10 pve systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Mar 25 17:50:10 pve systemd-journald[370]: Received SIGTERM from PID 1 (systemd-shutdow).
Mar 25 17:50:10 pve QEMU[1707]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve QEMU[2932]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve QEMU[2087]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve QEMU[2540]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve QEMU[2726]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve QEMU[2341]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve QEMU[3873]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve QEMU[3635]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve QEMU[3162]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve QEMU[3393]: kvm: terminating on signal 15 from pid 1 (/lib/systemd/systemd-shutdown)
Mar 25 17:50:10 pve systemd-journald[370]: Journal stopped
 
The VMs should get a shutdown signal, either via the guest agent or via ACPI. The default timeout for them to shut down is a few minutes before they get killed.
Is the "Shutdown Timeout" perhaps configured to a short value? See VM → Options → Start/Shutdown Order.
 
After another reboot attempt, I'm receiving these log errors in addition to the ones I previously shared:
Code:
Mar 26 08:41:38 pve systemd[1]: Failed unmounting run-user-0.mount - /run/user/0
...
Mar 26 08:41:55 pve QEMU[3372]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:55 pve QEMU[2771]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:55 pve QEMU[1708]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:55 pve QEMU[2381]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:55 pve QEMU[21857]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:55 pve QEMU[2123]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:55 pve QEMU[21443]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:55 pve QEMU[3592]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:55 pve QEMU[3166]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:55 pve QEMU[2965]: kvm: Unable to connect character device qmp-event: Failed to connect to '/var/run/qmeventd.sock': Connection refused
Mar 26 08:41:57 pve pvestatd[1666]: received signal TERM
Mar 26 08:41:57 pve pvestatd[1666]: server closing
Mar 26 08:41:57 pve pvestatd[1666]: server stopped
Mar 26 08:41:57 pve systemd[1]: pvestatd.service: Failed with result 'exit-code'.
 
I'm still facing this issue, and I'd be interested in whether there's a solution other than reinstalling Proxmox.
 
Hey, sorry. Yeah, those logs look like something is off.

Were any modifications made to the host, such as custom scripts? You could try installing the latest updates.

If that doesn't help, you could verify that the installed package files are still as they should be:
Code:
apt install debsums   # tool that verifies installed files against the packages' checksums
debsums -s            # silent mode: only report files that fail verification
If there is no output, then things are good.
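debsums can also be pointed at specific packages if you want to double-check just the Proxmox VE components. A sketch; adjust the package list as needed:
Code:
# verify only the named packages
debsums -s pve-manager pve-cluster qemu-server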
 
Running `debsums -s` returns no output.
So the packages are likely fine. But there could be other factors outside of the packages interfering here.

Other than that, on the host I am running node-exporter and prometheus-node-exporter.
Can you disable or remove them temporarily?
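Something along these lines should take it out of the picture for a test. A sketch, assuming the stock Debian unit name prometheus-node-exporter; adjust if yours differs:
Code:
# stop the exporter and keep it from starting at boot
systemctl disable --now prometheus-node-exporter
# re-enable it after the test
systemctl enable --now prometheus-node-exporter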

Overall, this is not something I have seen come up as a general problem recently. And this is a modified host, which makes troubleshooting a bit harder, especially when it comes to figuring out whether something is going wrong on the Proxmox VE side or whether something installed afterwards is interfering :-/
 
I tested without the node exporter running, and the issue persists. Attached are the complete logs from boot to shutdown.
Note: the reboot starts at line 3538, and errors start appearing at line 4808.
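For reference, a dump like this can be produced with journalctl. A minimal sketch; the output filename is arbitrary:
Code:
# export the full journal of the previous boot, from startup to shutdown
journalctl -b -1 --no-pager > pve-previous-boot.log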
 
