Livemigration error with localdisks and pve 4.4-96

Hi,
I reinstalled a cluster node today, and live migration with local storage (ZFS + an ext4 directory) fails with the following error message (the block number differs between attempts):
Code:
Oct 02 14:18:29 ERROR: VM 412 qmp command 'cont' failed - Conflicts with use by a block device as 'root', which uses 'write' on #block163
After that I can resume the VM from the GUI.

Looks similar to this: https://forum.proxmox.com/threads/live-migration-with-local-storage-failing.34203/
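As a workaround, the same resume can also be done from the CLI on the target node (a minimal sketch; qm resume is the same command the migration itself tries to run via SSH, as seen in the log below):
Code:
# on the target node (pve99): check that the VM is paused, then resume it
qm status 412
qm resume 412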

Nothing in syslog; in messages there is only this:
Code:
Oct  2 14:14:22 pve99 qm[21977]: <root@pam> starting task UPID:pve99:000055DB:00079D5C:59D22D9E:qmstart:412:root@pam:
Oct  2 14:14:22 pve99 kernel: [ 4991.052108] device tap412i0 entered promiscuous mode
Oct  2 14:14:23 pve99 qm[21977]: <root@pam> end task UPID:pve99:000055DB:00079D5C:59D22D9E:qmstart:412:root@pam: OK
Oct  2 14:18:29 pve99 qm[22867]: <root@pam> starting task UPID:pve99:00005954:0007FDDC:59D22E95:qmresume:412:root@pam:
Oct  2 14:18:43 pve99 pvedaemon[4090]: <udo@com> starting task UPID:pve99:00005977:0008036E:59D22EA3:qmresume:412:udo@com:
Oct  2 14:18:43 pve99 pvedaemon[4090]: <udo@com> end task UPID:pve99:00005977:0008036E:59D22EA3:qmresume:412:udo@com: OK
The migration:
Code:
root@pve04:~# qm migrate 412 pve99 --online --with-local-disks
Oct 02 14:14:21 starting migration of VM 412 to node 'pve99' (10.1.2.29)
Oct 02 14:14:21 found local disk 'local_vm_storage:412/vm-412-disk-1.qcow2' (in current VM config)
Oct 02 14:14:21 copying disk images
Oct 02 14:14:21 starting VM 412 on remote node 'pve99'
Oct 02 14:14:23 starting storage migration
Oct 02 14:14:23 scsi0: start migration to to nbd:10.1.2.29:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 0 bytes remaining: 26214400000 bytes total: 26214400000 bytes progression: 0.00 % busy: true ready: false
drive-scsi0: transferred: 117440512 bytes remaining: 26096959488 bytes total: 26214400000 bytes progression: 0.45 % busy: true ready: false
drive-scsi0: transferred: 234881024 bytes remaining: 25979518976 bytes total: 26214400000 bytes progression: 0.90 % busy: true ready: false
drive-scsi0: transferred: 352321536 bytes remaining: 25862078464 bytes total: 26214400000 bytes progression: 1.34 % busy: true ready: false
drive-scsi0: transferred: 470810624 bytes remaining: 25743589376 bytes total: 26214400000 bytes progression: 1.80 % busy: true ready: false
drive-scsi0: transferred: 588251136 bytes remaining: 25626148864 bytes total: 26214400000 bytes progression: 2.24 % busy: true ready: false
drive-scsi0: transferred: 705691648 bytes remaining: 25508708352 bytes total: 26214400000 bytes progression: 2.69 % busy: true ready: false
drive-scsi0: transferred: 823132160 bytes remaining: 25391267840 bytes total: 26214400000 bytes progression: 3.14 % busy: true ready: false
...
drive-scsi0: transferred: 25870467072 bytes remaining: 347275264 bytes total: 26217742336 bytes progression: 98.68 % busy: true ready: false
drive-scsi0: transferred: 25987907584 bytes remaining: 229834752 bytes total: 26217742336 bytes progression: 99.12 % busy: true ready: false
drive-scsi0: transferred: 26105348096 bytes remaining: 112394240 bytes total: 26217742336 bytes progression: 99.57 % busy: true ready: false
drive-scsi0: transferred: 26217742336 bytes remaining: 0 bytes total: 26217742336 bytes progression: 100.00 % busy: false ready: true
all mirroring jobs are ready
Oct 02 14:18:08 starting online/live migration on tcp:10.1.2.29:60000
Oct 02 14:18:08 migrate_set_speed: 8589934592
Oct 02 14:18:08 migrate_set_downtime: 0.1
Oct 02 14:18:08 set migration_caps
Oct 02 14:18:08 set cachesize: 107374182
Oct 02 14:18:08 start migrate command to tcp:10.1.2.29:60000
Oct 02 14:18:10 migration status: active (transferred 232502293, remaining 838762496), total 1082990592)
Oct 02 14:18:10 migration xbzrle cachesize: 67108864 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 02 14:18:12 migration status: active (transferred 467800065, remaining 598904832), total 1082990592)
Oct 02 14:18:12 migration xbzrle cachesize: 67108864 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 02 14:18:14 migration status: active (transferred 702496620, remaining 362430464), total 1082990592)
Oct 02 14:18:14 migration xbzrle cachesize: 67108864 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 02 14:18:16 migration status: active (transferred 937436644, remaining 119513088), total 1082990592)
Oct 02 14:18:16 migration xbzrle cachesize: 67108864 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 02 14:18:16 migration status: active (transferred 972816980, remaining 80605184), total 1082990592)
Oct 02 14:18:16 migration xbzrle cachesize: 67108864 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 02 14:18:17 migration status: active (transferred 1007974017, remaining 44548096), total 1082990592)
Oct 02 14:18:17 migration xbzrle cachesize: 67108864 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 02 14:18:17 migration status: active (transferred 1044225300, remaining 38567936), total 1082990592)
Oct 02 14:18:17 migration xbzrle cachesize: 67108864 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 02 14:18:17 migration speed: 4.38 MB/s - downtime 66 ms
Oct 02 14:18:17 migration status: completed
drive-scsi0: transferred: 26218135552 bytes remaining: 0 bytes total: 26218135552 bytes progression: 100.00 % busy: false ready: true
all mirroring jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.


drive-scsi0 : finished
Oct 02 14:18:29 ERROR: VM 412 qmp command 'cont' failed - Conflicts with use by a block device as 'root', which uses 'write' on #block163
Oct 02 14:18:29 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.1.2.29 qm resume 412 --skiplock --nocheck' failed: exit code 255
Oct 02 14:18:31 ERROR: migration finished with problems (duration 00:04:10)
migration problems
root@pve04:~#
The package versions on the new node:
Code:
pveversion -v
proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-112
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
openvswitch-switch: 2.6.0-2
ceph: 10.2.9-1~bpo80+1
The VM config is simple:
Code:
boot: c
bootdisk: scsi0
cores: 1
keyboard: de
memory: 1024
name: dns
net0: virtio=2E:8A:19:C1:4D:06,bridge=vmbr0,tag=1234
numa: 1
onboot: 1
ostype: l26
scsi0: local_vm_storage:412/vm-412-disk-1.qcow2,format=qcow2,size=25000M
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=d77283f9-2c9c-41cc-8644-dd01c4762bef
sockets: 1
Udo
 
we missed some cherry-picks for 2.9; an updated package is in the works.
 
Hi,
yes, qemu-server 4.0-112 is installed on the source node as well - but the VMs were perhaps started with an older version (one of the problem VMs has an uptime of 35 days).

Does the git commit mean it's not a bug, but a feature?
For some machines it's OK to resume manually after the migration has finished - for others it's a no-go, because the VM stays paused and the clock loses the time during which the VM was paused.
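If a VM does stay paused for a while, the guest clock can be stepped back into sync after the resume; a minimal sketch, assuming a Linux guest with network access and ntpdate or chrony installed (both are assumptions, not something stated in this thread):
Code:
# inside the guest, after the resume: force a one-time clock step
ntpdate -b pool.ntp.org
# or, if chrony is used:
chronyc -a makestep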

Udo
 
no, that commit was for a different resume bug ;) VMs should resume automatically after live migration, but for live migration with local disks we missed some commits from PVE 5.
 
qemu-server 4.0-113 with cherry-picks is available on pvetest.
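For anyone wanting to test it: installing from pvetest on a PVE 4.x node looks roughly like this (a sketch; the repository line assumes the Debian Jessie base of PVE 4.x, and pvetest should be disabled again afterwards):
Code:
# enable the pvetest repository and pull the updated qemu-server
echo "deb http://download.proxmox.com/debian jessie pvetest" > /etc/apt/sources.list.d/pvetest.list
apt-get update
apt-get install qemu-server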
 
Hi Fabian,
I installed this version on the destination node (without a reboot) and ran a migration test again, but it fails just like before:
Code:
Oct 04 12:23:13 ERROR: VM 300 qmp command 'cont' failed - Conflicts with use by a block device as 'root', which uses 'write' on #block130
Oct 04 12:23:13 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.1.2.29 qm resume 300 --skiplock --nocheck' failed: exit code 255
Oct 04 12:23:15 ERROR: migration finished with problems (duration 00:04:16)
migration problems

pveversion -v on the destination node:
Code:
root@pve99:~# pveversion -v
proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-113
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
openvswitch-switch: 2.6.0-2
ceph: 10.2.9-1~bpo80+1
Udo
 
the change is on the source side, so you'd need to upgrade the source node (no need to restart the guest, it's just a change in PVE's logic). sorry for not mentioning this.
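Put together, the retry from the source node would look roughly like this (a sketch, reusing the command from the first post and assuming the pvetest repository is enabled as shown above):
Code:
# on the source node (pve04): upgrade qemu-server, then retry the migration
apt-get update && apt-get install qemu-server
qm migrate 412 pve99 --online --with-local-disks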
 
