Host reboot does not gracefully shut down VMs

could you check your journal for messages from systemd about breaking cycles?
 
I think "journalctl -b --grep cycle" should be enough
 
Not much coming up:

code_language.shell:
root@pve:~# journalctl -b --grep cycle
Jun 18 07:17:36 pve kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
Jun 18 07:17:36 pve kernel: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x3990bec8342, max_idle_ns: 881590769617 ns
Jun 18 07:17:36 pve kernel: clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
Jun 18 07:17:36 pve kernel: clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Jun 18 07:17:36 pve kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x3990bec8342, max_idle_ns: 881590769617 ns
 
could you post "systemd-analyze critical-chain <UNIT>" for pve-guests.service, pveproxy.service and pve-cluster.service and attach (warning, long!) the full output of "systemd-analyze dump '*pve*'"?
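for the dump, something like the following would make it easier to attach (the file name is just a suggestion):

code_language.shell:
root@pve:~# systemd-analyze dump '*pve*' > systemd-dump-pve.txt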
 
Here is the output of systemd-analyze critical-chain for each unit:


code_language.shell:
root@pve:~# systemd-analyze critical-chain pve-guests.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

pve-guests.service +6min 17.731s
└─pve-ha-lrm.service @5.702s +403ms
  └─pveproxy.service @4.952s +741ms
    └─pvedaemon.service @4.329s +614ms
      └─pve-cluster.service @3.307s +1.010s
        └─rrdcached.service @3.281s +25ms
          └─time-sync.target @3.280s
            └─chrony.service @3.250s +29ms
              └─network.target @3.241s
                └─networking.service @2.229s +1.011s
                  └─local-fs.target @2.218s
                    └─etc-pve.mount @3.317s
                      └─local-fs-pre.target @169ms
                        └─lvm2-monitor.service @136ms +32ms
                          └─systemd-journald.socket @133ms
                            └─system.slice @117ms
                              └─-.slice @117ms

code_language.shell:
root@pve:~# systemd-analyze critical-chain pveproxy.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

pveproxy.service +741ms
└─pvedaemon.service @4.329s +614ms
  └─pve-cluster.service @3.307s +1.010s
    └─rrdcached.service @3.281s +25ms
      └─time-sync.target @3.280s
        └─chrony.service @3.250s +29ms
          └─network.target @3.241s
            └─networking.service @2.229s +1.011s
              └─local-fs.target @2.218s
                └─etc-pve.mount @3.317s
                  └─local-fs-pre.target @169ms
                    └─lvm2-monitor.service @136ms +32ms
                      └─systemd-journald.socket @133ms
                        └─system.slice @117ms
                          └─-.slice @117ms

code_language.shell:
root@pve:~# systemd-analyze critical-chain pve-cluster.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

pve-cluster.service +1.010s
└─rrdcached.service @3.281s +25ms
  └─time-sync.target @3.280s
    └─chrony.service @3.250s +29ms
      └─network.target @3.241s
        └─networking.service @2.229s +1.011s
          └─local-fs.target @2.218s
            └─etc-pve.mount @3.317s
              └─local-fs-pre.target @169ms
                └─lvm2-monitor.service @136ms +32ms
                  └─systemd-journald.socket @133ms
                    └─system.slice @117ms
                      └─-.slice @117ms
 
Well, I did some debugging with my VM configuration, and I think I found the cause.

I have a VM template with an ISO CD device attached, and the ISO was on NFS storage. That storage was configured as disabled, so the template was not able to find the ISO file. When I remove the CD device, or point it at a path that exists, rebooting gracefully shuts down any running VMs. When I change the template configuration back to point at the non-existent NFS storage, the reboot again fails to gracefully shut down the VMs.

Here is how I can reproduce this:
- Create a VM template with a CD/DVD drive device pointing to an ISO file.
- Disable the ISO storage under the Datacenter > Storage configuration.
- Reboot.
- Re-enable the ISO storage under the Datacenter > Storage configuration.
- Reboot.

In my case, the host shuts down without gracefully shutting down the VMs when the storage is disabled. Re-enabling the storage allows a graceful shutdown.
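For illustration, the dangling reference in the template config looks roughly like the line below (the VMID, storage and ISO names here are made up), and detaching the ISO again is what makes the graceful shutdown work:

code_language.shell:
root@pve:~# grep cdrom /etc/pve/qemu-server/9000.conf
ide2: nfs-iso:iso/debian-12.iso,media=cdrom
root@pve:~# # detach the ISO but keep the drive (roughly the "do not use any media" option in the GUI)
root@pve:~# qm set 9000 --ide2 none,media=cdrom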
 
I think I found the cause
Very odd indeed that a seemingly "dormant" VM template pointing to an unconfigured storage (on its ISO CD alone) should cause this behavior.

Are you sure that the NFS storage is not also being used elsewhere?
 
I'm positive. I tested this with a local directory storage as well. Disabling it causes the same issue.
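Roughly what I did to toggle it, in case it matters (the storage name here is just an example; this is the same as unchecking "Enable" for the storage in the GUI):

code_language.shell:
root@pve:~# pvesm set local-iso --disable 1
root@pve:~# reboot
root@pve:~# # after the host is back up:
root@pve:~# pvesm set local-iso --disable 0
root@pve:~# reboot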
 
is the VM with the dangling iso reference marked as start on boot as well? I can't reproduce this in any case..

anyhow, could you provide the following for a "good" and a "bad" boot:

- storage.cfg
- VM config
- systemctl status pve-guests (before the reboot attempt)
- "journalctl -b-1 | grep -i -e 'start' -e 'stop'" directly after the reboot

please clearly note which information is for the "good" and which is for the "bad" instances..
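something along these lines for each of the two boots would be fine (the VMID and file names are just placeholders):

code_language.shell:
root@pve:~# cp /etc/pve/storage.cfg storage-good.cfg
root@pve:~# qm config 100 > vm-100-good.conf
root@pve:~# systemctl status pve-guests > pve-guests-good.txt
root@pve:~# reboot
root@pve:~# # directly after the host is back up:
root@pve:~# journalctl -b-1 | grep -i -e 'start' -e 'stop' > journal-good.txt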
 
is the VM with the dangling iso reference marked as start on boot as well? I can't reproduce this in any case..
The template is not, but the VMs cloned from it are. I haven't had much time to drill down into a minimal reproducible example, but I also have not been able to reproduce the issue when creating a brand new template, as gfngfn256 suggested. I can still reproduce the problem with the existing template. It's interesting, though: since I re-installed Proxmox, none of the VMs or templates were carried over from the previous install, so something in the configuration I did afterwards must be what makes this reproducible. I should have time in the next couple days to work on this.

anyhow, could you provide the following for a "good" and a "bad" boot:
I can definitely get these to you as well.
 