VM blocked: cannot back up or migrate

Reartu24

New Member
Jul 9, 2024
Good afternoon everyone. For a few days now I've been having a problem with a VM that I can't migrate or back up because it gives me this error:

2025-10-15 15:43:30 starting migration of VM 3103 to node 'AH-NODO01' (10.3.1.161)
2025-10-15 15:43:30 starting VM 3103 on remote node 'AH-NODO01'
2025-10-15 15:43:32 start remote tunnel
2025-10-15 15:43:32 ssh tunnel ver 1
2025-10-15 15:43:32 starting online/live migration on unix:/run/qemu-server/3103.migrate
2025-10-15 15:43:32 set migration capabilities
VM 3103 qmp command 'migrate-set-capabilities' failed - There's a migration process in progress
2025-10-15 15:43:33 migration downtime limit: 100 ms
2025-10-15 15:43:33 migration cachesize: 4.0 GiB
2025-10-15 15:43:33 set migration parameters
2025-10-15 15:43:33 start migrate command to unix:/run/qemu-server/3103.migrate
2025-10-15 15:43:33 migrate uri => unix:/run/qemu-server/3103.migrate failed: VM 3103 qmp command 'migrate' failed - There's a migration process in progress
2025-10-15 15:43:34 ERROR: online migrate failure - VM 3103 qmp command 'migrate' failed - There's a migration process in progress
2025-10-15 15:43:34 aborting phase 2 - cleanup resources
2025-10-15 15:43:34 migrate_cancel
2025-10-15 15:43:35 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

Obviously there are no other migrations running on this node (it is the only VM left on it).

I noticed that in the Ceph pool there's a disk called VM-3103-state-test, but there are no snapshots or anything else in the VM. If I try to delete that image, I get this error:

Cannot remove image, a guest with VMID '3103' exists!
You can delete the image from the guest's hardware panel

but this disk does not appear as attached in the Hardware panel.
I also checked the conf file and there is no reference to this disk inside it:
#Server Database CDP CDR
agent: 1
bios: ovmf
boot: order=virtio0;ide2;net0
cores: 8
efidisk0: HDD-VM:vm-3103-disk-4,efitype=4m,format=raw,pre-enrolled-keys=1,size=528K
hotplug: disk,network,usb
ide2: none,media=cdrom
machine: pc-i440fx-9.2+pve1
memory: 40960
name: CDP-CDR-DB
net0: virtio=76:74:6D:33:4F:4B,bridge=vmbr0,tag=3
numa: 0
onboot: 1
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=75ee83c9-09a7-4332-9d86-7537743a58af
sockets: 2
tags: 10.3.3.220;win2019
vga: vmware
virtio0: HDD-VM:vm-3103-disk-0,format=raw,size=120G
virtio1: HDD-VM:vm-3103-disk-1,format=raw,size=750G
virtio2: HDD-VM:vm-3103-disk-2,format=raw,size=450G
virtio3: HDD-VM:vm-3103-disk-3,format=raw,size=200G
vmgenid: 59aec684-ac24-4f64-a996-b93db275257f

I also updated Proxmox from 8.4.10 to 8.4.14, but unfortunately nothing changed.
PS: I have a cluster with 5 nodes and Ceph storage.


How can I solve the problem?
Many thanks to anyone who can help me.
 
I noticed that in the Ceph pool there's a disk called VM-3103-state-test, but there are no snapshots or anything else in the VM. If I try to delete that image, I get this error
You should be able to run: qm disk rescan --vmid 3103
This should bring the disk into the VM's Hardware panel, where you can delete it.
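For reference, the rough workflow would look something like this (assuming the rescan registers the orphaned state volume as unused0; the index may differ on your system):

qm disk rescan --vmid 3103          # register any orphaned volumes found on the storages
qm config 3103 | grep unused        # the volume should now show up as unused0 (or similar)
qm set 3103 --delete unused0        # remove it, or delete it from the Hardware panel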

For the migration issue - it sounds like the QEMU process that previously attempted the migration failed and left state behind that prevents new migrations.
You have a few choices:
- shutdown the VM to clear the state
- troubleshoot on your own using the QEMU monitor and enhanced QEMU debugging steps (a rough example follows this list)
- procure a subscription and open a case with Support
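
If you go the troubleshooting route, a minimal sketch using the built-in monitor would be something like the following (run it on the node currently hosting the VM; the reported state may differ):

qm monitor 3103      # open the HMP monitor for VM 3103
info migrate         # show whatever migration state QEMU believes is active
migrate_cancel       # ask QEMU to cancel the stale migration
quit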


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi bbgeek17, thanks for your reply.
Before opening this post I had already tried rescanning the disks, but the volume does not appear in the Hardware tab of the machine (I tried with the VM both started and stopped).
As for troubleshooting on my own with the QEMU monitor, I'm looking for a guide to understand which commands to use for debugging.
I have requested the subscription and am waiting for the purchase to be approved. In the meantime I was hoping that someone on the forum had already hit this problem and could help me solve it as soon as possible. Unfortunately it is a very delicate and precious VM: it contains all the data of a hospital (historical and current), and if it were to become corrupted it would be a serious problem.
 
As for troubleshooting on my own with the QEMU monitor, I'm looking for a guide to understand which commands to use for debugging.
For the QEMU monitor, have a look at the available functions and objects in the documentation. Cancelling a job is described here.

Connect to the node running the VM and attach to the QEMU monitor (QMP socket) via: socat -,raw,echo=0 UNIX-CONNECT:/run/qemu-server/3103.qmp

Enter {"execute": "qmp_capabilities"}
and then try your luck with
{ "execute": "migrate_cancel" }
 
Unfortunately it is a very delicate and precious VM: it contains all the data of a hospital (historical and current), and if it were to become corrupted it would be a serious problem.
Hi @Reartu24,

If the data is that valuable, I strongly recommend proceeding with extreme caution.

Here’s an example of what can happen when a QEMU block is not properly released: Data lost for time window

My recommendation is to address this issue as soon as possible. An agent-based or application-level backup is highly advisable. It should work since your OS and applications are still operational.
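
Purely as a hypothetical illustration: if the database engine happens to be Microsoft SQL Server, an application-level backup from inside the guest could be a single sqlcmd call (the database name and target path below are placeholders, not taken from your setup):

sqlcmd -S localhost -Q "BACKUP DATABASE [YourDatabase] TO DISK = N'D:\Backups\YourDatabase.bak' WITH COMPRESSION, CHECKSUM"

Whatever engine the VM actually runs, the point is the same: get a consistent dump out through the application itself while the guest is still healthy.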

You could reach out to a Proxmox partner, but given the value of the data, I would personally want Proxmox personnel directly involved.

Best of luck,


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Luckily, I managed to unblock the locked VM over the weekend.

Thanks to Croit's advice, the migration was unblocked and I was able to clone the VM and restore operation.
What was absurd to me, and I still don't understand, is that not even Veeam or Nakivo were able to back up this VM because of this "lock".
I definitely need to set up replication on another host so I don't have to sweat like this anymore :-)

Thanks a lot everyone.
 
Congratulations @Reartu24 .
What was absurd to me, and I still don't understand, is that not even Veeam or Nakivo were able to back up this VM because of this "lock".
Veeam, and perhaps Nakivo, use the underlying QEMU commands to manipulate block devices during the backup. If the device is already being used by another QEMU command from the same family, it's not surprising that backups would fail.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 