What's this error while migrating a VM?

We use the latest and greatest Proxmox version 8.x in a 3-node cluster. All VMs use ZFS volumes for their disks, and the volumes are replicated to every node of the cluster.
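For reference, a minimal sketch of how such replication jobs can be inspected from the CLI, assuming the standard pvesr tool (the job ID 119-0 below is just a placeholder):

# List all configured storage replication jobs in the cluster
pvesr list

# Show their current state (last sync, duration, failures)
pvesr status

# Show the configuration (schedule, rate limit) of one specific job
pvesr read 119-0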

While migrating a VM from node 2 to node 3 of the cluster, the following shows up in the syslog of node 2:

2024-11-05T11:32:53.132057+01:00 devmox2 pve-ha-crm[1622]: got crm command: migrate vm:119 devmox3
2024-11-05T11:32:53.132248+01:00 devmox2 pve-ha-crm[1622]: migrate service 'vm:119' to node 'devmox3'
2024-11-05T11:32:53.132402+01:00 devmox2 pve-ha-crm[1622]: service 'vm:119': state changed from 'started' to 'migrate' (node = devmox2, target = devmox3)
2024-11-05T11:33:00.785824+01:00 devmox2 pve-ha-lrm[74157]: <root@pam> starting task UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam:
2024-11-05T11:33:04.982222+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T11:33:05.650625+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T11:33:05.791871+01:00 devmox2 pve-ha-lrm[74157]: Task 'UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam:' still active, waiting
2024-11-05T11:33:08.648264+01:00 devmox2 QEMU[10140]: kvm: Bitmap 'repl_efidisk0' is currently in use by another operation and cannot be used
2024-11-05T11:33:08.648430+01:00 devmox2 QEMU[10140]: kvm: Bitmap 'repl_scsi0' is currently in use by another operation and cannot be used
2024-11-05T11:33:10.793501+01:00 devmox2 pve-ha-lrm[74157]: Task 'UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam:' still active, waiting
2024-11-05T11:33:11.882165+01:00 devmox2 kernel: [ 4772.598367] zd512: p1 p14 p15
2024-11-05T11:33:15.795538+01:00 devmox2 pve-ha-lrm[74157]: Task 'UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam:' still active, waiting
2024-11-05T11:33:16.872309+01:00 devmox2 kernel: [ 4777.588533] tap119i0: left allmulticast mode
2024-11-05T11:33:16.872320+01:00 devmox2 kernel: [ 4777.588553] vmbr0: port 4(tap119i0) entered disabled state
2024-11-05T11:33:16.901160+01:00 devmox2 qmeventd[1106]: read: Connection reset by peer
2024-11-05T11:33:16.942072+01:00 devmox2 systemd[1]: 119.scope: Deactivated successfully.
2024-11-05T11:33:16.942218+01:00 devmox2 systemd[1]: 119.scope: Consumed 1min 23.161s CPU time.
2024-11-05T11:33:17.628059+01:00 devmox2 pve-ha-lrm[74159]: migration problems
2024-11-05T11:33:17.637204+01:00 devmox2 pve-ha-lrm[74157]: <root@pam> end task UPID:devmox2:000121AF:00074429:6729F45C:qmigrate:119:root@pam: migration problems
2024-11-05T11:33:23.159431+01:00 devmox2 pve-ha-crm[1622]: service 'vm:119': state changed from 'migrate' to 'started' (node = devmox3)
2024-11-05T11:33:26.972986+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T11:33:27.610960+01:00 devmox2 pmxcfs[1483]: [status] notice: received log

The VM does end up on node 3, but during the migration it is stopped and restarted on node 3.

Thanks for any hint on how to solve the issue.
 
Hi,

2024-11-05T11:33:08.648264+01:00 devmox2 QEMU[10140]: kvm: Bitmap 'repl_efidisk0' is currently in use by another operation and cannot be used
2024-11-05T11:33:08.648430+01:00 devmox2 QEMU[10140]: kvm: Bitmap 'repl_scsi0' is currently in use by another operation and cannot be used
During the migration the VM hit a temporary lock on the replication bitmaps, as you can see in the two lines above. This can occur if another replication or backup task is holding these resources during the migration. So the question is: was a replication or backup process running during the migration?
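As a rough way to check on the source node whether something like that is active at migration time, assuming the standard pvesr and pvenode tools:

# Replication jobs and when they last ran / are due next
pvesr status

# Recently started and still active tasks on this node (look for replication or vzdump entries)
pvenode task list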
 

Thanks for this explanation. Because we replicate all disks of all VMs in the cluster across all nodes, I think it is another replication job running at the same time. Unfortunately, these jobs take very long because the storage of the nodes is not the fastest :-(.

I will take another look...
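If overlapping replication runs turn out to be the cause, one option could be to throttle or reschedule the jobs so they do not collide with migrations. A rough sketch, assuming the standard pvesr tool (the job ID 119-0 and the values are placeholders):

# Limit a replication job to 50 MB/s so it finishes without hogging the storage
pvesr update 119-0 --rate 50

# Or move it to a different schedule (here: every 30 minutes)
pvesr update 119-0 --schedule '*/30'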
 
So now I get another error:

2024-11-05T13:47:24.519763+01:00 devmox2 pve-ha-crm[1622]: got crm command: migrate vm:121 devmox3
2024-11-05T13:47:24.520308+01:00 devmox2 pve-ha-crm[1622]: migrate service 'vm:121' to node 'devmox3'
2024-11-05T13:47:24.520374+01:00 devmox2 pve-ha-crm[1622]: service 'vm:121': state changed from 'started' to 'migrate' (node = devmox2, target = devmox3)
2024-11-05T13:47:32.596745+01:00 devmox2 pve-ha-lrm[178436]: <root@pam> starting task UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:
2024-11-05T13:47:36.892197+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T13:47:37.580557+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T13:47:37.602457+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:42.603922+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:47.605209+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:52.606474+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:57.072163+01:00 devmox2 kernel: [12858.020805] zd352: p1 p14 p15
2024-11-05T13:47:57.608240+01:00 devmox2 pve-ha-lrm[178436]: Task 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:' still active, waiting
2024-11-05T13:47:58.363992+01:00 devmox2 pve-ha-lrm[178437]: VM 121 qmp command failed - VM 121 qmp command 'block-job-cancel' failed - Block job 'drive-efidisk0' not found
2024-11-05T13:47:58.365380+01:00 devmox2 pve-ha-lrm[178437]: VM 121 qmp command failed - VM 121 qmp command 'block-job-cancel' failed - Block job 'drive-scsi0' not found
2024-11-05T13:47:59.162339+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T13:47:59.167979+01:00 devmox2 pmxcfs[1483]: [status] notice: received log
2024-11-05T13:48:00.441262+01:00 devmox2 pve-ha-lrm[178437]: migration problems
2024-11-05T13:48:00.447894+01:00 devmox2 pve-ha-lrm[178436]: <root@pam> end task UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam: migration problems
2024-11-05T13:48:00.448285+01:00 devmox2 pve-ha-lrm[178436]: service vm:121 not moved (migration error)
2024-11-05T13:48:04.551459+01:00 devmox2 pve-ha-crm[1622]: service 'vm:121' - migration failed (exit code 1)
2024-11-05T13:48:04.551584+01:00 devmox2 pve-ha-crm[1622]: service 'vm:121': state changed from 'migrate' to 'started' (node = devmox2)

A zfs list shows me the disks of the VM, and in the VM's configuration everything regarding the disks also looks OK.
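A couple of commands that might help narrow this down on the source node (VM ID 121 taken from the log above; the --guest filter is how I would limit the output, if it is available in your version):

# ZFS volumes belonging to this VM
zfs list | grep vm-121

# Replication state for just this guest
pvesr status --guest 121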

Does anyone have an idea what could cause this problem?
 
Hi,
please share the full task log of such a failed migration (select your VM and then Task History in the UI and double click the migration task) as well as the output of qm config 121 --current.
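If the CLI is more convenient, the same information can be collected there as well; a small sketch (the UPID is the one from the failed migration in the log above):

# VM configuration including pending changes
qm config 121 --current

# List recent tasks on this node, then print the log of the failed qmigrate task by its UPID
pvenode task list
pvenode task log 'UPID:devmox2:0002B905:00139536:672A13E4:qmigrate:121:root@pam:'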
 
