If reboot is triggered, PVE node goes away too fast before HA migration is finished

With PVE 9.0.5, when configuring a VM as an HA resource with state "started" and then rebooting the underlying hypervisor, I noticed the following error message:

Code:
task started by HA resource agent
2025-08-25 15:18:18 conntrack state migration not supported or disabled, active connections might get dropped
2025-08-25 15:18:18 starting migration of VM 101 to node 'sm01a' (10.27.33.1)
2025-08-25 15:18:18 starting VM 101 on remote node 'sm01a'
2025-08-25 15:18:21 start remote tunnel
2025-08-25 15:18:21 ssh tunnel ver 1
2025-08-25 15:18:21 starting online/live migration on unix:/run/qemu-server/101.migrate
2025-08-25 15:18:21 set migration capabilities
2025-08-25 15:18:21 migration downtime limit: 100 ms
2025-08-25 15:18:21 migration cachesize: 1.0 GiB
2025-08-25 15:18:21 set migration parameters
2025-08-25 15:18:21 start migrate command to unix:/run/qemu-server/101.migrate
2025-08-25 15:18:22 average migration speed: 8.0 GiB/s - downtime 6 ms
2025-08-25 15:18:22 migration completed, transferred 20.7 MiB VM-state
2025-08-25 15:18:22 migration status: completed
2025-08-25 15:18:24 ERROR: Cleanup after stopping VM failed - org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2025-08-25 15:18:25 ERROR: migration finished with problems (duration 00:00:07)
TASK ERROR: migration problems

At first I thought it had to do with the nf_conntrack kernel module, which others have reported, but I am already on qemu-server version 9.0.18.

When I do a manual migration, it works fine. The migration also works as expected and without errors when I put the node into maintenance mode via the CLI command (ha-manager crm-command node-maintenance enable nodename). So I suspect that the node goes away too fast to complete everything that would normally be done during a migration.
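
For reference, a minimal sketch of the setup and workaround described above, using the standard ha-manager CLI (VM 101 is taken from the log above; replace 'nodename' with the node you are about to reboot):

Code:
# register the VM as an HA resource with requested state "started"
ha-manager add vm:101 --state started
# verify that the HA stack picked it up
ha-manager status
# workaround that avoids the error: drain the node before rebooting it
ha-manager crm-command node-maintenance enable nodename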

Cheers,
Timo
 
I got the same error messages when updating from 9.0.5 to 9.0.6.
The migration status is "completed", but I see the same "Cleanup after stopping VM failed" errors as above.
 
I experienced the same issue. I applied patches and ran a reboot from the command line. Usually the VMs migrate off and the system reboots; this time there were all sorts of problems, with VM migrations failing and VMs getting rebooted.

Seems like we'll have to put hosts into maintenance mode before applying patches in the future.
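
A rough sketch of that patching workflow, assuming a node named 'pve1' and the standard Proxmox/Debian CLI tools (not an official procedure, just how it could look):

Code:
# drain the node first so HA migrates the VMs away cleanly
ha-manager crm-command node-maintenance enable pve1
# once no HA services are left on pve1, apply the patches and reboot
apt update && apt full-upgrade
reboot
# after the node is back online, let HA move the VMs back
ha-manager crm-command node-maintenance disable pve1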
 
Same issue on a fresh install of PVE 9.0.10. I initiated a node shutdown and the VM migrated successfully, but with an error.

Code:
task started by HA resource agent
2025-09-27 15:45:30 conntrack state migration not supported or disabled, active connections might get dropped
2025-09-27 15:45:30 use dedicated network address for sending migration traffic (10.0.2.74)
2025-09-27 15:45:30 starting migration of VM 2191 to node 'node04' (10.0.2.74)
2025-09-27 15:45:30 starting VM 2191 on remote node 'node04'
2025-09-27 15:45:31 start remote tunnel
2025-09-27 15:45:31 ssh tunnel ver 1
2025-09-27 15:45:31 starting online/live migration on unix:/run/qemu-server/2191.migrate
2025-09-27 15:45:31 set migration capabilities
2025-09-27 15:45:31 migration downtime limit: 100 ms
2025-09-27 15:45:31 migration cachesize: 1.0 GiB
2025-09-27 15:45:31 set migration parameters
2025-09-27 15:45:31 start migrate command to unix:/run/qemu-server/2191.migrate
2025-09-27 15:45:58 average migration speed: 304.2 MiB/s - downtime 45 ms
2025-09-27 15:45:58 migration completed, transferred 7.4 GiB VM-state
2025-09-27 15:45:58 migration status: completed
2025-09-27 15:46:01 ERROR: Cleanup after stopping VM failed - org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2025-09-27 15:46:01 ERROR: migration finished with problems (duration 00:00:31)
TASK ERROR: migration problems

Detailed logs are attached.
 

Same issue on a fresh 9.0.10 install.
It's pretty annoying - this used to be seamless and non-disruptive - now it's not.

My test case was accidentally touching the hypervisor's power button in the DC and triggering a reboot of tens of VMs ... lol
 
We use Proxmox version 9.0.18 and unfortunately cannot compare it with previous versions.

Here is the task log from rebooting the node, when the VM is moved to another node:

Code:
task started by HA resource agent
2025-11-19 12:37:22 conntrack state migration not supported or disabled, active connections might get dropped
2025-11-19 12:37:22 use dedicated network address for sending migration traffic (10.10.50.224)
2025-11-19 12:37:22 starting migration of VM 125 to node 'PVE224' (10.10.50.224)
2025-11-19 12:37:22 starting VM 125 on remote node 'PVE224'
2025-11-19 12:37:26 start remote tunnel
2025-11-19 12:37:27 ssh tunnel ver 1
2025-11-19 12:37:27 starting online/live migration on unix:/run/qemu-server/125.migrate
2025-11-19 12:37:27 set migration capabilities
2025-11-19 12:37:27 migration downtime limit: 100 ms
2025-11-19 12:37:27 migration cachesize: 1.0 GiB
2025-11-19 12:37:27 set migration parameters
2025-11-19 12:37:27 start migrate command to unix:/run/qemu-server/125.migrate
2025-11-19 12:37:28 migration active, transferred 807.3 MiB of 10.0 GiB VM-state, 4.0 GiB/s
2025-11-19 12:37:29 average migration speed: 5.0 GiB/s - downtime 39 ms
2025-11-19 12:37:29 migration completed, transferred 1.0 GiB VM-state
2025-11-19 12:37:29 migration status: completed
2025-11-19 12:37:31 ERROR: Cleanup after stopping VM failed - org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2025-11-19 12:37:32 ERROR: migration finished with problems (duration 00:00:10)
TASK ERROR: migration problems

Here is the task log when the VM is moved back to the original node after the node reboot:

Code:
task started by HA resource agent
2025-11-19 12:41:37 conntrack state migration not supported or disabled, active connections might get dropped
2025-11-19 12:41:37 use dedicated network address for sending migration traffic (10.10.50.223)
2025-11-19 12:41:37 starting migration of VM 125 to node 'PVE223' (10.10.50.223)
2025-11-19 12:41:37 starting VM 125 on remote node 'PVE223'
2025-11-19 12:41:40 start remote tunnel
2025-11-19 12:41:40 ssh tunnel ver 1
2025-11-19 12:41:40 starting online/live migration on unix:/run/qemu-server/125.migrate
2025-11-19 12:41:40 set migration capabilities
2025-11-19 12:41:40 migration downtime limit: 100 ms
2025-11-19 12:41:40 migration cachesize: 1.0 GiB
2025-11-19 12:41:40 set migration parameters
2025-11-19 12:41:40 start migrate command to unix:/run/qemu-server/125.migrate
2025-11-19 12:41:41 migration active, transferred 757.4 MiB of 10.0 GiB VM-state, 1.9 GiB/s
2025-11-19 12:41:42 migration active, transferred 1.0 GiB of 10.0 GiB VM-state, 8.9 GiB/s
2025-11-19 12:41:43 average migration speed: 3.3 GiB/s - downtime 38 ms
2025-11-19 12:41:43 migration completed, transferred 1.0 GiB VM-state
2025-11-19 12:41:43 migration status: completed
2025-11-19 12:41:47 migration finished successfully (duration 00:00:11)
TASK OK

It appears that during the node shutdown, something had already been terminated that is clearly still available when the VM is migrated back afterwards.
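
One way to see what had already been stopped at that point (just a debugging idea, not something from this thread) is to look at the previous boot's journal around the time of the failed cleanup; the D-Bus NoReply error suggests a service the cleanup step talks to was already gone:

Code:
# shutdown sequence of the previous boot around the failing cleanup (times from the log above)
journalctl -b -1 --since "2025-11-19 12:37:20" --until "2025-11-19 12:37:35"
# focus on the HA local resource manager and D-Bus
journalctl -b -1 -u pve-ha-lrm -u dbus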

Best regards

Bjoern
 
Exact same problem here with the latest release, 9.1.1. By the way, I got the exact same error after launching qm migrate from an old node on version 8.1 to migrate a VM to a new cluster (running 9.1.1)... I initially thought it was due to the EFI disk and TPM disk, but I was wrong; I also got it on a VM with SeaBIOS... I don't understand it, but it's pretty disappointing.
Regards
Andrea
 
I am facing the same issue with PVE version 9.1.2 in an HA cluster whenever I want to shut down or reboot a host for maintenance. Therefore I would like to share a workaround for when the problem happens, and a way to avoid the problem. Both worked fine for me.
  • During a host shutdown with the HA shutdown policy "migrate": in that scenario, the VMs with an HA setup will fail migration, and because those VMs are not migrated to a different host, the host does not shut down or reboot (you will see a lot of "Migration failed" entries in the task log).
    Workaround: in "Datacenter -> HA" change the HA state to "stopped"; the VM will gracefully shut down, followed by the host shutdown/reboot. After the host is up again, change the HA state to "started" and the VM starts (see the CLI sketch below).
  • To avoid the problem, this preparation before the host reboot does the trick: change "Datacenter -> HA -> Affinity rules" to force the migration of the VMs to another host. When the migration has completed, shutdown/restart of the host works fine.
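
The same stop/start workaround can also be done from the shell; a rough sketch (vm:100 is a placeholder HA service ID, adjust to your own):

Code:
# before the reboot: ask HA to shut the VM down gracefully
ha-manager set vm:100 --state stopped
# ... reboot the host ...
# after the host is back up: start the VM again via HA
ha-manager set vm:100 --state started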
I hope this helps :)
 
Another way to migrate all VMs off a node (and have them migrate back to the same server afterwards) is to enable maintenance mode. On any server in the cluster run:

Code:
ha-manager crm-command node-maintenance enable pve1
# wait for the migrations to finish, then reboot the node
ha-manager crm-command node-maintenance disable pve1
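
To see when the node is actually drained, one can watch the HA status until no services are reported on it any more (standard ha-manager output; details may differ between versions):

Code:
# list HA services and their current nodes; repeat until none remain on pve1
ha-manager status
# or refresh automatically every few seconds
watch -n 5 ha-manager status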
 
qemu-server 9.1.3, available on pve-test now, should fix this issue. Affected VMs do need to be stopped and started (or live-migrated *outside* of a node reboot/shutdown!) for the fix to take effect.
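
A minimal sketch of applying the fix, assuming the pve-test repository is already enabled on the node (VM 101 and node 'sm01a' are just the examples from the first log; substitute your own):

Code:
apt update && apt install qemu-server
# then stop/start each affected VM once so it picks up the fixed code ...
qm shutdown 101 && qm start 101
# ... or live-migrate it once, outside of a node reboot/shutdown
qm migrate 101 sm01a --online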
 
I can confirm that with qemu-server 9.1.3 installed, the problem is solved. Tested with PVE 9.1.2 and the updated qemu-server package. Many thanks for the fix :)
 