What's this error while migrating a VM?

We use the latest and greatest Proxmox version 8.x in a 3-node cluster. All VMs use ZFS volumes for their disks, and the volumes are replicated to every node of the cluster.
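For reference, a minimal sketch of how such replication jobs can be inspected from the CLI, assuming the standard pvesr tool (the job ID 119-0 below is just a placeholder):

# List all configured storage replication jobs in the cluster
pvesr list

# Show their current state (last sync, duration, failures)
pvesr status

# Show the configuration (schedule, rate limit) of one specific job
pvesr read 119-0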

While migrating a VM from node 2 to node 3 of the cluster, the following shows up in the syslog of node 2:

2024-11-05T11:32:53.132057+01:00 devmox2 pve-ha-crm[1622]: got crm command: migrate vm:119 devmox3
2024-11-05T11:32:53.132248+01:00 devmox2 pve-ha-crm[1622]: migrate service 'vm:119' to node 'devmox3'
2024-11-05T11:32:53.132402+01:00 devmox2 pve-ha-crm[1622]: service 'vm:119': state changed from 'started' to 'migrate' (node = devmox2, target = devmox3)
2024-11-05T11:33:00.785824+01:00 devmox2 pve-ha-lrm[74157]: <root@pam> starting task UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam:
2024-11-05T11:33:04.982222+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T11:33:05.650625+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T11:33:05.791871+01:00 devmox2 pve-ha-lrm[74157]: Task 'UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam:' still active, waiting
2024-11-05T11:33:08.648264+01:00 devmox2 QEMU[10140]: kvm: Bitmap 'repl_efidisk0' is currently in use by another operation and cannot be used
2024-11-05T11:33:08.648430+01:00 devmox2 QEMU[10140]: kvm: Bitmap 'repl_scsi0' is currently in use by another operation and cannot be used
2024-11-05T11:33:10.793501+01:00 devmox2 pve-ha-lrm[74157]: Task 'UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam:' still active, waiting
2024-11-05T11:33:11.882165+01:00 devmox2 kernel: [ 4772.598367] zd512: p1 p14 p15
2024-11-05T11:33:15.795538+01:00 devmox2 pve-ha-lrm[74157]: Task 'UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam:' still active, waiting
2024-11-05T11:33:16.872309+01:00 devmox2 kernel: [ 4777.588533] tap119i0: left allmulticast mode
2024-11-05T11:33:16.872320+01:00 devmox2 kernel: [ 4777.588553] vmbr0: port 4(tap119i0) entered disabled state
2024-11-05T11:33:16.901160+01:00 devmox2 qmeventd[1106]: read: Connection reset by peer
2024-11-05T11:33:16.942072+01:00 devmox2 systemd[1]: 119.scope: Deactivated successfully.
2024-11-05T11:33:16.942218+01:00 devmox2 systemd[1]: 119.scope: Consumed 1min 23.161s CPU time.
2024-11-05T11:33:17.628059+01:00 devmox2 pve-ha-lrm[74159]: migration problems
2024-11-05T11:33:17.637204+01:00 devmox2 pve-ha-lrm[74157]: <root@pam> end task UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam: migration problems
2024-11-05T11:33:23.159431+01:00 devmox2 pve-ha-crm[1622]: service 'vm:119': state changed from 'migrate' to 'started' (node = devmox3)
2024-11-05T11:33:26.972986+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T11:33:27.610960+01:00 devmox2 pmxcfs[1483]: [status] notice: received log

The VM does end up on node 3, but during the migration it is stopped and restarted on node 3.

Thanks for any hint on how to solve the issue.
 
Hi,

2024-11-05T11:33:08.648264+01:00 devmox2 QEMU[10140]: kvm: Bitmap 'repl_efidisk0' is currently in use by another operation and cannot be used
2024-11-05T11:33:08.648430+01:00 devmox2 QEMU[10140]: kvm: Bitmap 'repl_scsi0' is currently in use by another operation and cannot be used
During the migration the VM hit a temporary lock on the replication bitmaps, as you can see in the two lines above. This can occur if another replication or backup task is holding these resources during the migration. So the question is: was a replication or backup process running during the migration?
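As a rough way to check on the source node whether something like that is active at migration time, assuming the standard pvesr and pvenode tools:

# Replication jobs and when they last ran / are due next
pvesr status

# Recently started and still active tasks on this node (look for replication or vzdump entries)
pvenode task list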
 

Thanks for this explanation. Because we replicate all disks of all VMs in the cluster across all nodes, I think it is another replication job running at the same time. Unfortunately, these jobs take very long because the storage of the nodes is not the fastest :-(.

I will take another look...
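If overlapping replication runs turn out to be the cause, one option could be to throttle or reschedule the jobs so they do not collide with migrations. A rough sketch, assuming the standard pvesr tool (the job ID 119-0 and the values are placeholders):

# Limit a replication job to 50 MB/s so it finishes without hogging the storage
pvesr update 119-0 --rate 50

# Or move it to a different schedule (here: every 30 minutes)
pvesr update 119-0 --schedule '*/30'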
 
So now I get another error:

2024-11-05T13:47:24.519763+01:00 devmox2 pve-ha-crm[1622]: got crm command: migrate vm:121 devmox3
2024-11-05T13:47:24.520308+01:00 devmox2 pve-ha-crm[1622]: migrate service 'vm:121' to node 'devmox3'
2024-11-05T13:47:24.520374+01:00 devmox2 pve-ha-crm[1622]: service 'vm:121': state changed from 'started' to 'migrate' (node = devmox2, target = devmox3)
2024-11-05T13:47:32.596745+01:00 devmox2 pve-ha-lrm[178436]: <root@pam> starting task UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:
2024-11-05T13:47:36.892197+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T13:47:37.580557+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T13:47:37.602457+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:42.603922+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:47.605209+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:52.606474+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:57.072163+01:00 devmox2 kernel: [12858.020805] zd352: p1 p14 p15
2024-11-05T13:47:57.608240+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:58.363992+01:00 devmox2 pve-ha-lrm[178437]: VM 121 qmp command failed - VM 121 qmp command 'block-job-cancel' failed - Block job 'drive-efidisk0' not found
2024-11-05T13:47:58.365380+01:00 devmox2 pve-ha-lrm[178437]: VM 121 qmp command failed - VM 121 qmp command 'block-job-cancel' failed - Block job 'drive-scsi0' not found
2024-11-05T13:47:59.162339+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T13:47:59.167979+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T13:48:00.441262+01:00 devmox2 pve-ha-lrm[178437]: migration problems
2024-11-05T13:48:00.447894+01:00 devmox2 pve-ha-lrm[178436]: <root@pam> end task UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam: migration problems
2024-11-05T13:48:00.448285+01:00 devmox2 pve-ha-lrm[178436]: service vm:121 not moved (migration error)
2024-11-05T13:48:04.551459+01:00 devmox2 pve-ha-crm[1622]: service 'vm:121' - migration failed (exit code 1)
2024-11-05T13:48:04.551584+01:00 devmox2 pve-ha-crm[1622]: service 'vm:121': state changed from 'migrate' to 'started' (node = devmox2)

A zfs list shows me the disks of the VM, and in the VM's configuration everything regarding the disks also looks OK.
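A couple of commands that might help narrow this down on the source node (VM ID 121 taken from the log above; the --guest filter is how I would limit the output, if it is available in your version):

# ZFS volumes belonging to this VM
zfs list | grep vm-121

# Replication state for just this guest
pvesr status --guest 121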

Does anyone have an idea what could cause this problem?
 
Hi,
please share the full task log of such a failed migration (select your VM and then Task History in the UI and double click the migration task) as well as the output of qm config 121 --current.
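If the CLI is more convenient, the same information can be collected there as well; a small sketch (the UPID is the one from the failed migration in the log above):

# VM configuration including pending changes
qm config 121 --current

# List recent tasks on this node, then print the log of the failed qmigrate task by its UPID
pvenode task list
pvenode task log 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:'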
 
