[SOLVED] PVE 7, live migration errors, but everything works, should I be worried?

Just migrated 5 VMs like this, and every single one of them claims the migration failed, yet all of them work perfectly fine on the new host and nothing is missing. Both nodes run the latest, fully updated PVE 7.
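For reference, the CLI equivalent of one of these migrations would be something along these lines (a sketch only; VMID 666, target node pve2 and the 'local' target storage are taken from the log below, and the migrations may just as well have been started from the GUI):
Code:
qm migrate 666 pve2 --online --with-local-disks --targetstorage local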

Log from one such "failed" migration:
Code:
2021-08-04 13:59:02 starting migration of VM 666 to node 'pve2' (192.168.1.165)
2021-08-04 13:59:02 found local disk 'vm-storage:666/vm-666-disk-0.raw' (in current VM config)
2021-08-04 13:59:02 found local disk 'vm-storage:666/vm-666-disk-1.raw' (in current VM config)
2021-08-04 13:59:02 starting VM 666 on remote node 'pve2'
2021-08-04 13:59:04 volume 'vm-storage:666/vm-666-disk-1.raw' is 'local:666/vm-666-disk-0.raw' on the target
2021-08-04 13:59:04 volume 'vm-storage:666/vm-666-disk-0.raw' is 'local:666/vm-666-disk-1.raw' on the target
2021-08-04 13:59:04 start remote tunnel
2021-08-04 13:59:04 ssh tunnel ver 1
2021-08-04 13:59:04 starting storage migration
2021-08-04 13:59:04 scsi0: start migration to nbd:unix:/run/qemu-server/666_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 721.9 MiB of 100.0 GiB (0.71%) in 1s
drive-scsi0: transferred 835.0 MiB of 100.0 GiB (0.82%) in 2s
drive-scsi0: transferred 952.1 MiB of 100.0 GiB (0.93%) in 3s
drive-scsi0: transferred 1.0 GiB of 100.0 GiB (1.04%) in 4s
drive-scsi0: transferred 1.2 GiB of 100.0 GiB (1.16%) in 5s
drive-scsi0: transferred 1.4 GiB of 100.0 GiB (1.35%) in 6s
drive-scsi0: transferred 1.5 GiB of 100.0 GiB (1.55%) in 7s
drive-scsi0: transferred 1.7 GiB of 100.0 GiB (1.75%) in 8s
drive-scsi0: transferred 1.9 GiB of 100.0 GiB (1.94%) in 9s
drive-scsi0: transferred 2.1 GiB of 100.0 GiB (2.13%) in 10s
drive-scsi0: transferred 2.3 GiB of 100.0 GiB (2.32%) in 11s
drive-scsi0: transferred 2.7 GiB of 100.0 GiB (2.67%) in 12s
drive-scsi0: transferred 2.8 GiB of 100.0 GiB (2.83%) in 13s
drive-scsi0: transferred 3.4 GiB of 100.0 GiB (3.42%) in 14s
drive-scsi0: transferred 3.6 GiB of 100.0 GiB (3.55%) in 15s
drive-scsi0: transferred 3.7 GiB of 100.0 GiB (3.67%) in 16s
drive-scsi0: transferred 3.8 GiB of 100.0 GiB (3.79%) in 17s
drive-scsi0: transferred 3.9 GiB of 100.0 GiB (3.87%) in 18s
drive-scsi0: transferred 4.0 GiB of 100.0 GiB (3.99%) in 19s
drive-scsi0: transferred 4.1 GiB of 100.0 GiB (4.10%) in 20s
drive-scsi0: transferred 4.2 GiB of 100.0 GiB (4.21%) in 21s
drive-scsi0: transferred 4.3 GiB of 100.0 GiB (4.34%) in 22s
drive-scsi0: transferred 4.7 GiB of 100.0 GiB (4.69%) in 23s
drive-scsi0: transferred 4.8 GiB of 100.0 GiB (4.81%) in 24s
drive-scsi0: transferred 5.0 GiB of 100.0 GiB (5.03%) in 25s
drive-scsi0: transferred 5.2 GiB of 100.0 GiB (5.24%) in 26s
drive-scsi0: transferred 5.9 GiB of 100.0 GiB (5.92%) in 27s
drive-scsi0: transferred 6.7 GiB of 100.0 GiB (6.72%) in 28s
drive-scsi0: transferred 7.3 GiB of 100.0 GiB (7.25%) in 29s
drive-scsi0: transferred 7.5 GiB of 100.0 GiB (7.51%) in 30s
drive-scsi0: transferred 7.8 GiB of 100.0 GiB (7.79%) in 31s
drive-scsi0: transferred 8.1 GiB of 100.0 GiB (8.09%) in 32s
drive-scsi0: transferred 8.3 GiB of 100.0 GiB (8.29%) in 33s
drive-scsi0: transferred 8.4 GiB of 100.0 GiB (8.43%) in 34s
drive-scsi0: transferred 8.8 GiB of 100.0 GiB (8.76%) in 35s
drive-scsi0: transferred 10.2 GiB of 100.0 GiB (10.23%) in 36s
drive-scsi0: transferred 10.5 GiB of 100.0 GiB (10.48%) in 37s
drive-scsi0: transferred 10.8 GiB of 100.0 GiB (10.81%) in 38s
drive-scsi0: transferred 10.9 GiB of 100.0 GiB (10.93%) in 39s
drive-scsi0: transferred 11.1 GiB of 100.0 GiB (11.10%) in 40s
drive-scsi0: transferred 13.1 GiB of 100.0 GiB (13.15%) in 41s
drive-scsi0: transferred 13.3 GiB of 100.0 GiB (13.29%) in 42s
drive-scsi0: transferred 14.4 GiB of 100.0 GiB (14.36%) in 43s
drive-scsi0: transferred 14.7 GiB of 100.0 GiB (14.69%) in 44s
drive-scsi0: transferred 15.3 GiB of 100.0 GiB (15.33%) in 45s
drive-scsi0: transferred 15.6 GiB of 100.0 GiB (15.60%) in 46s
drive-scsi0: transferred 16.3 GiB of 100.0 GiB (16.35%) in 47s
drive-scsi0: transferred 16.5 GiB of 100.0 GiB (16.46%) in 48s
drive-scsi0: transferred 18.3 GiB of 100.0 GiB (18.32%) in 49s
drive-scsi0: transferred 18.7 GiB of 100.0 GiB (18.67%) in 50s
drive-scsi0: transferred 19.2 GiB of 100.0 GiB (19.17%) in 51s
drive-scsi0: transferred 20.2 GiB of 100.0 GiB (20.23%) in 52s
drive-scsi0: transferred 20.3 GiB of 100.0 GiB (20.35%) in 53s
drive-scsi0: transferred 20.7 GiB of 100.0 GiB (20.69%) in 54s
drive-scsi0: transferred 21.0 GiB of 100.0 GiB (21.01%) in 55s
drive-scsi0: transferred 21.1 GiB of 100.0 GiB (21.14%) in 56s
drive-scsi0: transferred 21.2 GiB of 100.0 GiB (21.25%) in 57s
drive-scsi0: transferred 21.5 GiB of 100.0 GiB (21.47%) in 58s
drive-scsi0: transferred 21.6 GiB of 100.0 GiB (21.64%) in 59s
drive-scsi0: transferred 21.8 GiB of 100.0 GiB (21.79%) in 1m
drive-scsi0: transferred 21.9 GiB of 100.0 GiB (21.92%) in 1m 1s
drive-scsi0: transferred 22.1 GiB of 100.0 GiB (22.06%) in 1m 2s
drive-scsi0: transferred 22.6 GiB of 100.0 GiB (22.65%) in 1m 3s
drive-scsi0: transferred 22.9 GiB of 100.0 GiB (22.89%) in 1m 4s
drive-scsi0: transferred 23.0 GiB of 100.0 GiB (23.04%) in 1m 5s
drive-scsi0: transferred 23.2 GiB of 100.0 GiB (23.18%) in 1m 6s
drive-scsi0: transferred 23.6 GiB of 100.0 GiB (23.60%) in 1m 7s
drive-scsi0: transferred 23.7 GiB of 100.0 GiB (23.72%) in 1m 8s
drive-scsi0: transferred 24.6 GiB of 100.0 GiB (24.63%) in 1m 9s
drive-scsi0: transferred 24.7 GiB of 100.0 GiB (24.74%) in 1m 10s
drive-scsi0: transferred 25.0 GiB of 100.0 GiB (24.97%) in 1m 11s
drive-scsi0: transferred 25.1 GiB of 100.0 GiB (25.09%) in 1m 12s
drive-scsi0: transferred 28.5 GiB of 100.0 GiB (28.54%) in 1m 13s
drive-scsi0: transferred 34.5 GiB of 100.0 GiB (34.54%) in 1m 14s
drive-scsi0: transferred 44.5 GiB of 100.0 GiB (44.50%) in 1m 15s
drive-scsi0: transferred 48.7 GiB of 100.0 GiB (48.71%) in 1m 16s
drive-scsi0: transferred 48.8 GiB of 100.0 GiB (48.83%) in 1m 17s
drive-scsi0: transferred 48.9 GiB of 100.0 GiB (48.93%) in 1m 18s
drive-scsi0: transferred 49.0 GiB of 100.0 GiB (49.04%) in 1m 19s
drive-scsi0: transferred 49.2 GiB of 100.0 GiB (49.19%) in 1m 20s
drive-scsi0: transferred 49.3 GiB of 100.0 GiB (49.30%) in 1m 21s
drive-scsi0: transferred 50.8 GiB of 100.0 GiB (50.77%) in 1m 22s
drive-scsi0: transferred 51.3 GiB of 100.0 GiB (51.33%) in 1m 23s
drive-scsi0: transferred 51.5 GiB of 100.0 GiB (51.49%) in 1m 24s
drive-scsi0: transferred 51.6 GiB of 100.0 GiB (51.62%) in 1m 25s
drive-scsi0: transferred 51.7 GiB of 100.0 GiB (51.74%) in 1m 26s
drive-scsi0: transferred 51.9 GiB of 100.0 GiB (51.86%) in 1m 27s
drive-scsi0: transferred 52.0 GiB of 100.0 GiB (51.98%) in 1m 28s
drive-scsi0: transferred 52.1 GiB of 100.0 GiB (52.10%) in 1m 29s
drive-scsi0: transferred 52.2 GiB of 100.0 GiB (52.22%) in 1m 30s
drive-scsi0: transferred 52.3 GiB of 100.0 GiB (52.34%) in 1m 31s
drive-scsi0: transferred 52.5 GiB of 100.0 GiB (52.46%) in 1m 32s
drive-scsi0: transferred 52.7 GiB of 100.0 GiB (52.68%) in 1m 33s
drive-scsi0: transferred 52.8 GiB of 100.0 GiB (52.78%) in 1m 34s
drive-scsi0: transferred 52.9 GiB of 100.0 GiB (52.89%) in 1m 35s
drive-scsi0: transferred 53.0 GiB of 100.0 GiB (53.01%) in 1m 36s
drive-scsi0: transferred 53.1 GiB of 100.0 GiB (53.12%) in 1m 37s
drive-scsi0: transferred 53.3 GiB of 100.0 GiB (53.26%) in 1m 38s
drive-scsi0: transferred 80.7 GiB of 100.0 GiB (80.70%) in 1m 39s
drive-scsi0: transferred 80.8 GiB of 100.0 GiB (80.81%) in 1m 40s
drive-scsi0: transferred 81.0 GiB of 100.0 GiB (80.96%) in 1m 41s
drive-scsi0: transferred 88.7 GiB of 100.0 GiB (88.72%) in 1m 43s
drive-scsi0: transferred 88.8 GiB of 100.0 GiB (88.82%) in 1m 44s
drive-scsi0: transferred 89.4 GiB of 100.0 GiB (89.45%) in 1m 45s
drive-scsi0: transferred 89.5 GiB of 100.0 GiB (89.54%) in 1m 46s
drive-scsi0: transferred 89.6 GiB of 100.0 GiB (89.64%) in 1m 47s
drive-scsi0: transferred 89.8 GiB of 100.0 GiB (89.78%) in 1m 48s
drive-scsi0: transferred 89.9 GiB of 100.0 GiB (89.92%) in 1m 49s
drive-scsi0: transferred 90.1 GiB of 100.0 GiB (90.14%) in 1m 50s
drive-scsi0: transferred 90.7 GiB of 100.0 GiB (90.69%) in 1m 51s
drive-scsi0: transferred 91.5 GiB of 100.0 GiB (91.49%) in 1m 52s
drive-scsi0: transferred 94.6 GiB of 100.0 GiB (94.63%) in 1m 53s
drive-scsi0: transferred 96.6 GiB of 100.0 GiB (96.63%) in 1m 54s
drive-scsi0: transferred 98.6 GiB of 100.0 GiB (98.64%) in 1m 55s
drive-scsi0: transferred 100.0 GiB of 100.0 GiB (100.00%) in 1m 56s
drive-scsi0: transferred 100.0 GiB of 100.0 GiB (100.00%) in 1m 57s
drive-scsi0: transferred 100.0 GiB of 100.0 GiB (100.00%) in 1m 58s, ready
all 'mirror' jobs are ready
2021-08-04 14:01:02 efidisk0: start migration to nbd:unix:/run/qemu-server/666_nbd.migrate:exportname=drive-efidisk0
drive mirror is starting for drive-efidisk0
drive-efidisk0: transferred 128.0 KiB of 128.0 KiB (100.00%) in 0s
drive-efidisk0: transferred 128.0 KiB of 128.0 KiB (100.00%) in 1s, ready
all 'mirror' jobs are ready
2021-08-04 14:01:03 starting online/live migration on unix:/run/qemu-server/666.migrate
2021-08-04 14:01:03 set migration capabilities
2021-08-04 14:01:03 migration downtime limit: 100 ms
2021-08-04 14:01:03 migration cachesize: 256.0 MiB
2021-08-04 14:01:03 set migration parameters
2021-08-04 14:01:03 start migrate command to unix:/run/qemu-server/666.migrate
2021-08-04 14:01:04 migration active, transferred 109.4 MiB of 2.0 GiB VM-state, 113.8 MiB/s
2021-08-04 14:01:05 migration active, transferred 220.9 MiB of 2.0 GiB VM-state, 126.6 MiB/s
2021-08-04 14:01:06 migration active, transferred 333.8 MiB of 2.0 GiB VM-state, 114.3 MiB/s
2021-08-04 14:01:07 migration active, transferred 445.7 MiB of 2.0 GiB VM-state, 114.4 MiB/s
2021-08-04 14:01:08 migration active, transferred 556.4 MiB of 2.0 GiB VM-state, 107.2 MiB/s
2021-08-04 14:01:09 migration active, transferred 668.9 MiB of 2.0 GiB VM-state, 115.9 MiB/s
2021-08-04 14:01:10 migration active, transferred 781.2 MiB of 2.0 GiB VM-state, 115.0 MiB/s
2021-08-04 14:01:11 migration active, transferred 893.1 MiB of 2.0 GiB VM-state, 113.6 MiB/s
2021-08-04 14:01:12 migration active, transferred 1004.6 MiB of 2.0 GiB VM-state, 122.6 MiB/s
2021-08-04 14:01:13 migration active, transferred 1.1 GiB of 2.0 GiB VM-state, 111.9 MiB/s
2021-08-04 14:01:14 migration active, transferred 1.2 GiB of 2.0 GiB VM-state, 115.4 MiB/s
2021-08-04 14:01:15 migration active, transferred 1.3 GiB of 2.0 GiB VM-state, 115.8 MiB/s
2021-08-04 14:01:16 migration active, transferred 1.4 GiB of 2.0 GiB VM-state, 126.5 MiB/s
2021-08-04 14:01:17 migration active, transferred 1.5 GiB of 2.0 GiB VM-state, 117.1 MiB/s
2021-08-04 14:01:18 migration active, transferred 1.6 GiB of 2.0 GiB VM-state, 112.7 MiB/s
2021-08-04 14:01:19 migration active, transferred 1.7 GiB of 2.0 GiB VM-state, 141.3 MiB/s
2021-08-04 14:01:20 migration active, transferred 1.9 GiB of 2.0 GiB VM-state, 111.7 MiB/s
2021-08-04 14:01:21 average migration speed: 114.4 MiB/s - downtime 244 ms
2021-08-04 14:01:21 migration status: completed
all 'mirror' jobs are ready
drive-efidisk0: Completing block job_id...
drive-efidisk0: Completed successfully.
drive-scsi0: Completing block job_id...
drive-scsi0: Completed successfully.
channel 5: open failed: connect failed: open failed

channel 6: open failed: connect failed: open failed

channel 3: open failed: connect failed: open failed

channel 4: open failed: connect failed: open failed

drive-efidisk0: mirror-job finished
drive-scsi0: mirror-job finished
2021-08-04 14:01:22 stopping NBD storage migration server on target.
2021-08-04 14:01:22 ERROR: tunnel replied 'ERR: resume failed - VM 666 not running' to command 'resume 666'
2021-08-04 14:01:26 ERROR: migration finished with problems (duration 00:02:24)
TASK ERROR: migration problems
 
Are you running the latest version?

Please post:

> pveversion -v
 
source node:
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-10 (running version: 7.0-10/d2f465d3)
pve-kernel-5.11: 7.0-6
pve-kernel-helper: 7.0-6
pve-kernel-5.4: 6.4-5
pve-kernel-5.11.22-3-pve: 5.11.22-6
pve-kernel-5.11.22-2-pve: 5.11.22-4
pve-kernel-5.4.128-1-pve: 5.4.128-1
ceph-fuse: 14.2.21-1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.2.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-5
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-9
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.7-1
proxmox-backup-file-restore: 2.0.7-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-8
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-12
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
target node:
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-10 (running version: 7.0-10/d2f465d3)
pve-kernel-5.11: 7.0-6
pve-kernel-helper: 7.0-6
pve-kernel-5.11.22-3-pve: 5.11.22-6
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.2.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-5
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-9
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.7-1
proxmox-backup-file-restore: 2.0.7-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-8
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-12
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
The only difference between the nodes I can think of is that the source one was upgraded in place from 6.4 to 7, while the target node is a fresh PVE 7 install.
 
Is there anything in the system log on either end around the time of the "failed" migration? (You can check with journalctl using --since and --until.)
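For example, to get a window of roughly two minutes around the migration from the OP (timestamps taken from that task log; adjust them to the run you are checking):
Code:
journalctl --since "2021-08-04 13:57:00" --until "2021-08-04 14:03:00"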
 
It is worth noting that I no longer physically own node1, as I was selling it (which is why I did the migrations in the first place). After each migration that claimed errors, the VM and all its files had already disappeared from node1. Afterwards I ran the relevant filesystem check tools, and neither the Linux nor the Win10 VMs reported any errors. Win10's sfc /scannow reported all system file integrity OK, so I decided it has to be harmless (if it were to cause problems down the road, I still have the backups I made on node1 before each migration anyway).

This is the log on the target node from ~2 minutes before and after the migration I posted in the OP. 192.168.1.144 was node1:
Edit: the log got cut off; here it is in full: https://hastebin.com/udoluhalal.yaml

The only thing that jumps out to me is:
Code:
Aug 04 14:01:21 pve2 QEMU[29566]: kvm: warning: TSC frequency mismatch between VM (2999970 kHz) and host (2711998 kHz), and TSC scaling unavailable
Aug 04 14:01:21 pve2 QEMU[29566]: kvm: error: failed to set MSR 0x38f to 0x7000000ff
Aug 04 14:01:21 pve2 QEMU[29566]: kvm: ../target/i386/kvm/kvm.c:2753: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aug 04 14:01:21 pve2 kernel: fwbr666i0: port 2(tap666i0) entered disabled state
Aug 04 14:01:21 pve2 kernel: fwbr666i0: port 2(tap666i0) entered disabled state
Aug 04 14:01:21 pve2 sshd[29608]: error: connect to /run/qemu-server/666_nbd.migrate port -2 failed: Connection refused
Aug 04 14:01:21 pve2 sshd[29608]: error: connect to /run/qemu-server/666_nbd.migrate port -2 failed: Connection refused
Aug 04 14:01:21 pve2 sshd[29608]: error: connect to /run/qemu-server/666_nbd.migrate port -2 failed: Connection refused
Aug 04 14:01:21 pve2 sshd[29608]: error: connect to /run/qemu-server/666_nbd.migrate port -2 failed: Connection refused
Though I've no idea what it actually means, apart from the mismatching CPU warning.
 
So the migration failed right at the end, when resuming the target VM, after all the disks had been migrated. Was this by chance a VM configured with a host-specific CPU type, with the old and new servers having different CPUs?
 
Indeed, that was the case. The VMs were configured with cpu=host. node1 had a 4-core Xeon E3-1220 v5, while the new node has a Xeon E-2176M with 12 threads (6 cores with HT). If that is the only issue, perhaps the warning could be worded better. Admittedly, I'm not a native English speaker, so I might have just overestimated the severity of the warning.
 
Well... the migration DID fail (you attempted a live migration, and the VM crashed at the end and didn't run on either end until manually started) ;)
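For anyone hitting the same thing on hosts with different CPUs, a minimal sketch of the usual workaround is to switch the affected VM from cpu=host to a CPU type both nodes support before migrating (VMID 666 from the log above is used for illustration, and kvm64 is just one generic example; check which CPU flags your guests actually need):
Code:
qm config 666 | grep ^cpu
qm set 666 --cpu kvm64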
 
