Auto migration doesn't work when using the "reboot" button in the web GUI

Dennigma

Member
Jun 22, 2023
So today I updated all 3 nodes in my cluster, which is using HA features.

Until today I was able to use the reboot button to just reboot a node, the HA would automatically migrate the VMs to another node.
But after today's update it doesn't work anymore. Yet I can migrate manually, and if I put that node into maintenance mode manually, the migration also works fine.

Is this a bug again?
Below are my versions and the error message:

Code:
task started by HA resource agent
2025-12-16 11:14:27 conntrack state migration not supported or disabled, active connections might get dropped
2025-12-16 11:14:27 ERROR: migration aborted (duration 00:00:00): org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
TASK ERROR: migration aborted
------------------------------------------------------------------------------------
root@pve3:~# pveversion -v
proxmox-ve: 9.1.0 (running kernel: 6.17.4-1-pve)
pve-manager: 9.1.2 (running version: 9.1.2/9d436f37a0ac4172)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.4-1-pve-signed: 6.17.4-1
proxmox-kernel-6.17: 6.17.4-1
proxmox-kernel-6.17.2-2-pve-signed: 6.17.2-2
proxmox-kernel-6.17.2-1-pve-signed: 6.17.2-1
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.4.1-1+pve1
ifupdown2: 3.3.0-1+pmx11
intel-microcode: 3.20250812.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.5
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.1
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.4
libpve-rs-perl: 0.11.4
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.1.0-1
proxmox-backup-file-restore: 4.1.0-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.5
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.1
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.0.8
pve-i18n: 3.6.6
pve-qemu-kvm: 10.1.2-4
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.2
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

Those versions are the same on all 3 nodes
 
Until today I was able to use the reboot button to just reboot a node, the HA would automatically migrate the VMs to another node.

Please check if Datacenter --> Options --> HA Settings contains "Shutdown Policy: migrate"
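For anyone who prefers the CLI over the GUI: the same setting lives in the datacenter config and can be read with pvesh. A rough sketch (output formatting may differ slightly between versions):

Code:
# show the cluster-wide options, including the HA shutdown policy
pvesh get /cluster/options

# the setting is stored in /etc/pve/datacenter.cfg, e.g.:
#   ha: shutdown_policy=migrate
cat /etc/pve/datacenter.cfg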
 
Please check if Datacenter --> Options --> HA Settings contains "Shutdown Policy: migrate"
Yes, it is set to migrate. The default policy would just shut down the VM, and the migration is the behaviour I want to have.


The node is TRYING to migrate the VM, but can't (see the error message in the first post), and then it's stuck in that state: the VM won't migrate and the node won't reboot.
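(For reference, while a node is stuck like this the HA state can still be inspected from any other cluster member; roughly:)

Code:
# show the CRM master, the LRM state per node and the status of each HA resource
ha-manager status

# show the configured HA resources and their requested states
ha-manager config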
 
I'm also experiencing the same.

Maintenance mode works fine, and VMs migrate without issue.
Manual migration also works fine and VMs migrate without issue.
But if I reboot the node so that the HA stack triggers the VM migration, the migrations fail.


My cluster began as an 8.something cluster.
I upgraded to 9.1 upon its release.
Today I did an update of all my cluster nodes to 9.1.2 via an apt update && apt upgrade

After each node completed its package updates I attempted a reboot to use the new kernel.

Every node in the cluster failed with a similar problem on migration. They always failed with some combination of:
Code:
task started by HA resource agent
2025-12-16 11:20:26 conntrack state migration not supported or disabled, active connections might get dropped
2025-12-16 11:20:26 ERROR: migration aborted (duration 00:00:00): org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
TASK ERROR: migration aborted

Sometimes only the conntrack state migration error, sometimes both the conntrack and dbus error.

Watching live, it appears to me as though systemd is shutting down a bunch of supporting services while the machine is still in the process of shutting down and migrating its VMs.

I can see in the system tab that things like the proxmox-firewall and other services go from running to dead well before migration even begins.
This appears almost immediately after I click the reboot button.

Example:

(screenshot: System tab showing several PVE services, e.g. proxmox-firewall, already in the "dead" state)

It's worth noting that these all end up dead before migrations begin.
Once it's in this state I cannot even reboot the system from a root shell using `reboot`, `shutdown` or `poweroff`, since they fail with the error:
Code:
root@node2:~# reboot
Failed to set wall message, ignoring: Transport endpoint is not connected
Call to Reboot failed: Transport endpoint is not connected

It's almost like the system is shutting down crucial services too early.
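If anyone wants to confirm that while a node is wedged like this, something along these lines should show whether D-Bus/logind and the PVE units are already gone (just a diagnostic sketch, run from a root shell on the affected node):

Code:
# the D-Bus error above suggests dbus / systemd-logind are already stopped
systemctl --no-pager status dbus systemd-logind

# list the state of all PVE/Proxmox units (look for "dead" / "inactive")
systemctl --no-pager list-units --all 'pve*' 'proxmox*'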

I can work around this issue by forcing the system into maintenance mode BEFORE rebooting, but that is a pain because it requires a root shell rather than just using my cluster admin user.
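For completeness, this is roughly what that workaround looks like (the node name is a placeholder; as noted, it needs a root shell or equivalent privileges):

Code:
# put the node into maintenance first, so HA migrates the guests away
ha-manager crm-command node-maintenance enable <node>

# reboot once the guests are gone, then take the node out of maintenance again
ha-manager crm-command node-maintenance disable <node>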
 
Yeah, I was able to reproduce that. It definitely seems to be a bug in one of the packages updated today.
 
+1 also having an issue today with auto-migration after the update

Nodes had a reboot scheduled (probably an auto-upgrades patch); they tried to migrate for about 30 minutes, repeatedly failing with:

Code:
task started by HA resource agent
2025-12-16 17:24:35 conntrack state migration not supported or disabled, active connections might get dropped
2025-12-16 17:24:35 ERROR: migration aborted (duration 00:00:00): org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
TASK ERROR: migration aborted

Eventually I even saw a node get fenced off.

Then the reboot happens and everything comes back online.

On another node I caught it in the middle of this process:

Code:
# reboot
Failed to set wall message, ignoring: Transport endpoint is not connected
Call to Reboot failed: Transport endpoint is not connected
root@cream:~# systemctl --force --force reboot
Rebooting.
  • I can migrate a VM normally between nodes just fine.
  • I have not tried doing a full node migration yet.
  • Shutdown policy is set to migrate.
Dunno if any of that helps for finding the bug.
 
qemu-server 9.1.3, available on pve-test now, should fix this issue. Affected VMs do need to be stopped and started (or live-migrated, *outside* of a node reboot/shutdown!) for the fix to take effect.
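For anyone applying this, roughly what that looks like once the updated package is installed from pve-test (VM IDs and the target node are placeholders):

Code:
# confirm the fixed version is installed
pveversion -v | grep qemu-server    # should show 9.1.3 or newer

# then, per affected VM, either restart it ...
qm stop <vmid> && qm start <vmid>

# ... or live-migrate it once, outside of a node reboot/shutdown
qm migrate <vmid> <target-node> --online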