Migration failures I don't understand

LordRatner · Jan 8, 2023

Hi. I'm guessing this is user error, but I still don't understand how to fix it.

I have HA set up and it functions as expected. However, when a node comes back online and PVE tries to migrate back to the higher-priority node, the migration fails for my VMs. Actually, I think all of my VMs fail to migrate if they are running. Here are a couple of examples:

Nextcloud VM:

Code:

task started by HA resource agent
2023-01-07 17:41:16 starting migration of VM 112 to node 'node2' (192.168.10.21)
2023-01-07 17:41:16 found local, replicated disk 'local-zfs:vm-112-disk-0' (in current VM config)
2023-01-07 17:41:16 found local, replicated disk 'local-zfs:vm-112-state-pre_install' (referenced by snapshot(s))
2023-01-07 17:41:16 can't migrate local disk 'local-zfs:vm-112-disk-0': online storage migration not possible if snapshot exists
2023-01-07 17:41:16 ERROR: Problem found while scanning volumes - can't migrate VM - check log
2023-01-07 17:41:16 aborting phase 1 - cleanup resources
2023-01-07 17:41:16 ERROR: migration aborted (duration 00:00:00): Problem found while scanning volumes - can't migrate VM - check log
TASK ERROR: migration aborted

And LTSP VM:

Code:

task started by HA resource agent
2023-01-07 17:43:46 starting migration of VM 120 to node 'node2' (192.168.10.21)
2023-01-07 17:43:46 found local disk 'local-zfs:vm-120-disk-0' (in current VM config)
2023-01-07 17:43:46 found local disk 'local-zfs:vm-120-state-Amd' (referenced by snapshot(s))
2023-01-07 17:43:46 found local disk 'local-zfs:vm-120-state-vdi_trouble' (referenced by snapshot(s))
2023-01-07 17:43:46 copying local disk images
Use of uninitialized value $target_storeid in string eq at /usr/share/perl5/PVE/Storage.pm line 778.
Use of uninitialized value $targetsid in concatenation (.) or string at /usr/share/perl5/PVE/QemuMigrate.pm line 678.
2023-01-07 17:43:46 ERROR: storage migration for 'local-zfs:vm-120-disk-0' to storage '' failed - no storage ID specified
2023-01-07 17:43:46 aborting phase 1 - cleanup resources
2023-01-07 17:43:46 ERROR: migration aborted (duration 00:00:00): storage migration for 'local-zfs:vm-120-disk-0' to storage '' failed - no storage ID specified
TASK ERROR: migration aborted

Now, with the first VM (Nextcloud), using the GUI shutdown button does nothing (a problem I have with several VMs). However, if I go into the console and run "shutdown now," the VM shuts down and the migration then succeeds immediately.

The LTSP VM is different. When HA tries to start the VM on the secondary node (after the primary is shut down), it fails to start. Then, when the primary node comes back online, it fails to migrate back. In fact, I can't do anything with it unless I destroy it and restore from a backup.

Ideas?
Thanks,
Seth

Neobin · Jan 8, 2023

For your first case, there is already a feature request open: [1].
I have no idea about your second case. Maybe try it without snapshots as well, and if the error message changes, post it here.

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=2792
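If it helps when comparing runs with and without snapshots, here is a rough Python sketch that pulls the relevant facts out of a pasted migration task log: which volumes exist only because of snapshots, whether every disk was reported as replicated, and the final abort reason. The patterns are based only on the two logs quoted in this thread, and the names (`summarize`, `SAMPLE_LOG`) are made up for illustration, so treat it as a starting point rather than anything PVE ships.

```python
import re

# Patterns derived only from the log lines quoted in this thread;
# other PVE versions may word these messages differently.
DISK_RE = re.compile(
    r"found local(?P<replicated>, replicated)? disk '(?P<volume>[^']+)' "
    r"\((?P<origin>in current VM config|referenced by snapshot\(s\))\)"
)
ABORT_RE = re.compile(
    r"ERROR: migration aborted \(duration [^)]*\): (?P<reason>.*)$"
)

# Excerpt of the VM 112 task log from the first post.
SAMPLE_LOG = """\
2023-01-07 17:41:16 starting migration of VM 112 to node 'node2' (192.168.10.21)
2023-01-07 17:41:16 found local, replicated disk 'local-zfs:vm-112-disk-0' (in current VM config)
2023-01-07 17:41:16 found local, replicated disk 'local-zfs:vm-112-state-pre_install' (referenced by snapshot(s))
2023-01-07 17:41:16 can't migrate local disk 'local-zfs:vm-112-disk-0': online storage migration not possible if snapshot exists
2023-01-07 17:41:16 ERROR: migration aborted (duration 00:00:00): Problem found while scanning volumes - can't migrate VM - check log
"""


def summarize(log_text):
    """Collect disk and abort information from a migration task log."""
    disks = []
    abort_reason = None
    for line in log_text.splitlines():
        disk = DISK_RE.search(line)
        if disk:
            disks.append({
                "volume": disk.group("volume"),
                "replicated": disk.group("replicated") is not None,
                "from_snapshot": disk.group("origin").startswith("referenced"),
            })
            continue
        abort = ABORT_RE.search(line)
        if abort:
            abort_reason = abort.group("reason")
    return {
        "snapshot_volumes": [d["volume"] for d in disks if d["from_snapshot"]],
        "all_replicated": bool(disks) and all(d["replicated"] for d in disks),
        "abort_reason": abort_reason,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Any volume listed under `snapshot_volumes` belongs to a snapshot that would have to go before retrying a live migration; `qm listsnapshot <vmid>` shows the snapshots a VM currently has. For the VM 120 log, `all_replicated` would come out false, which is one visible difference between the two cases.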
