VM Migration Failure

redflag_420

New Member
Sep 17, 2024
Hello, I upgraded from Proxmox 7.4 to 8.2.4. Since the upgrade, when I migrate powered-on VMs between servers, the VM does not resume after the migration, although the migration itself completes. I can't find anything in any log I'm aware of that helps narrow down the issue. The VM powers on perfectly fine after the migration is complete, but I have to start it manually. If I power off the VM first and then migrate, there are no issues at all. This is what the log shows:

2024-09-17 03:17:49 migration status: completed
all 'mirror' jobs are ready
drive-efidisk0: Completing block job...
drive-efidisk0: Completed successfully.
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-efidisk0: mirror-job finished
drive-scsi0: mirror-job finished
2024-09-17 03:17:51 stopping NBD storage migration server on target.
2024-09-17 03:17:53 ERROR: tunnel replied 'ERR: resume failed - VM 119 not running' to command 'resume 119'
Logical volume "vm-119-disk-0" successfully removed.
Logical volume "vm-119-disk-1" successfully removed.
2024-09-17 03:18:23 ERROR: migration finished with problems (duration 06:36:13)
TASK ERROR: migration problems

Any idea what it could be? Any help is greatly appreciated.
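Since "resume failed - VM 119 not running" means the QEMU process on the target node died (or never started) right after the memory transfer, the real error usually lands on the *target* node rather than in the migration task log. A hedged sketch of where to look, using the VM ID and timestamps from the log above; exact log contents vary by setup:

```shell
# On the TARGET node: look for QEMU/KVM messages around the time
# the migration completed (VM 119, timestamps from the log above)
journalctl --since "2024-09-17 03:15" --until "2024-09-17 03:20" | grep -iE "119|kvm|qemu"

# Proxmox also keeps per-task logs on disk; the incoming-migration
# task on the target may have captured QEMU's stderr
grep -r "vm-119" /var/log/pve/tasks/ | tail
```

These commands only make sense on the Proxmox nodes themselves; the Task History panel in the GUI (node view) shows the same task logs.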
 
I have the exact same problem, but unlike the OP I did not recently upgrade; I have this problem every time I try to live-migrate a VM to another host.

If this is relevant: this VM has two volumes, one the boot volume stored on local-zfs, and another stored on an NFS share mounted in Proxmox and then mapped as a second drive into the VM.

2024-09-21 22:06:25 starting migration of VM 119 to node 'Luxtra' (root@ADDRESS)
2024-09-21 22:06:25 found local, replicated disk 'local-zfs:vm-119-disk-0' (attached)
2024-09-21 22:06:25 scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
2024-09-21 22:06:25 replicating disk images
2024-09-21 22:06:25 start replication job
2024-09-21 22:06:25 guest => VM 119, running => 679885
2024-09-21 22:06:25 volumes => local-zfs:vm-119-disk-0
2024-09-21 22:06:26 freeze guest filesystem
2024-09-21 22:06:26 create snapshot '__replicate_119-0_1726949185__' on local-zfs:vm-119-disk-0
2024-09-21 22:06:26 thaw guest filesystem
2024-09-21 22:06:26 using secure transmission, rate limit: none
2024-09-21 22:06:26 incremental sync 'local-zfs:vm-119-disk-0' (__replicate_119-0_1726948811__ => __replicate_119-0_1726949185__)
2024-09-21 22:06:27 send from @__replicate_119-0_1726948811__ to rpool/data/vm-119-disk-0@__replicate_119-0_1726949185__ estimated size is 42.0M
2024-09-21 22:06:27 total estimated size is 42.0M
2024-09-21 22:06:27 TIME SENT SNAPSHOT rpool/data/vm-119-disk-0@__replicate_119-0_1726949185__
2024-09-21 22:06:27 successfully imported 'local-zfs:vm-119-disk-0'
2024-09-21 22:06:27 delete previous replication snapshot '__replicate_119-0_1726948811__' on local-zfs:vm-119-disk-0
2024-09-21 22:06:28 (remote_finalize_local_job) delete stale replication snapshot '__replicate_119-0_1726948811__' on local-zfs:vm-119-disk-0
2024-09-21 22:06:28 end replication job
2024-09-21 22:06:28 starting VM 119 on remote node 'Luxtra'
2024-09-21 22:06:29 volume 'local-zfs:vm-119-disk-0' is 'local-zfs:vm-119-disk-0' on the target
2024-09-21 22:06:30 start remote tunnel
2024-09-21 22:06:30 ssh tunnel ver 1
2024-09-21 22:06:30 starting storage migration
2024-09-21 22:06:30 scsi0: start migration to nbd:unix:/run/qemu-server/119_nbd.migrate:exportname=drive-scsi0
drive mirror re-using dirty bitmap 'repl_scsi0'
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 384.0 KiB of 1.8 MiB (20.69%) in 0s
drive-scsi0: transferred 1.8 MiB of 1.8 MiB (100.00%) in 1s, ready
all 'mirror' jobs are ready
2024-09-21 22:06:31 switching mirror jobs to actively synced mode
drive-scsi0: switching to actively synced mode
drive-scsi0: successfully switched to actively synced mode
2024-09-21 22:06:32 starting online/live migration on unix:/run/qemu-server/119.migrate
2024-09-21 22:06:32 set migration capabilities
2024-09-21 22:06:32 migration downtime limit: 100 ms
2024-09-21 22:06:32 migration cachesize: 1.0 GiB
2024-09-21 22:06:32 set migration parameters
2024-09-21 22:06:32 start migrate command to unix:/run/qemu-server/119.migrate
2024-09-21 22:06:33 migration active, transferred 244.8 MiB of 8.0 GiB VM-state, 11.1 GiB/s
2024-09-21 22:06:34 migration active, transferred 524.0 MiB of 8.0 GiB VM-state, 376.3 MiB/s
2024-09-21 22:06:35 migration active, transferred 799.3 MiB of 8.0 GiB VM-state, 288.5 MiB/s
2024-09-21 22:06:36 migration active, transferred 1.1 GiB of 8.0 GiB VM-state, 292.5 MiB/s
2024-09-21 22:06:37 migration active, transferred 1.3 GiB of 8.0 GiB VM-state, 283.4 MiB/s
2024-09-21 22:06:38 migration active, transferred 1.6 GiB of 8.0 GiB VM-state, 300.1 MiB/s
2024-09-21 22:06:39 migration active, transferred 1.9 GiB of 8.0 GiB VM-state, 295.7 MiB/s
2024-09-21 22:06:40 migration active, transferred 2.1 GiB of 8.0 GiB VM-state, 290.5 MiB/s
2024-09-21 22:06:41 migration active, transferred 2.4 GiB of 8.0 GiB VM-state, 297.2 MiB/s
2024-09-21 22:06:42 migration active, transferred 2.7 GiB of 8.0 GiB VM-state, 295.3 MiB/s
2024-09-21 22:06:43 migration active, transferred 3.0 GiB of 8.0 GiB VM-state, 283.4 MiB/s
2024-09-21 22:06:44 migration active, transferred 3.2 GiB of 8.0 GiB VM-state, 288.2 MiB/s
2024-09-21 22:06:45 migration active, transferred 3.5 GiB of 8.0 GiB VM-state, 285.1 MiB/s
2024-09-21 22:06:46 migration active, transferred 3.8 GiB of 8.0 GiB VM-state, 290.5 MiB/s
2024-09-21 22:06:47 migration active, transferred 4.0 GiB of 8.0 GiB VM-state, 297.2 MiB/s
2024-09-21 22:06:48 migration active, transferred 4.3 GiB of 8.0 GiB VM-state, 280.7 MiB/s
2024-09-21 22:06:49 migration active, transferred 4.6 GiB of 8.0 GiB VM-state, 278.6 MiB/s
2024-09-21 22:06:50 migration active, transferred 4.9 GiB of 8.0 GiB VM-state, 276.3 MiB/s
2024-09-21 22:06:51 average migration speed: 432.0 MiB/s - downtime 119 ms
2024-09-21 22:06:51 migration status: completed
all 'mirror' jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0: mirror-job finished
2024-09-21 22:06:53 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=Luxtra' -o 'UserKnownHostsFile=/etc/pve/nodes/Luxtra/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@ADDRESS pvesr set-state 119 \''{"local/xerneas":{"fail_count":0,"duration":2.806288,"last_node":"xerneas","last_iteration":1726949185,"last_try":1726949185,"last_sync":1726949185,"storeid_list":["local-zfs"]}}'\'
2024-09-21 22:06:54 stopping NBD storage migration server on target.
2024-09-21 22:06:54 ERROR: tunnel replied 'ERR: resume failed - VM 119 not running' to command 'resume 119'
2024-09-21 22:06:57 ERROR: migration finished with problems (duration 00:00:32)
TASK ERROR: migration problems
 
Okay, I think I found the issue: both VMs were set to CPU type "host". Which tbh makes total sense to fail, since hot-swapping CPUs is not really a thing xD
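For anyone hitting the same thing: with `cpu: host` the guest sees the source node's exact physical CPU, so live migration only works reliably between nodes with identical CPUs. A hedged sketch of checking and switching VM 119 (the VM ID from this thread) to a named vCPU model; `x86-64-v2-AES` is one common generic choice on Proxmox 8, but pick whatever model both nodes actually support:

```shell
# Check the current CPU setting for VM 119
qm config 119 | grep '^cpu'
# e.g.  cpu: host

# Switch to a generic model both nodes can present identically
# (takes effect on the next VM start, not on a running guest)
qm set 119 --cpu x86-64-v2-AES
```

The trade-off is that a named model hides some CPU features from the guest; the alternative is keeping `cpu: host` and only live-migrating between nodes with matching hardware.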
 