Live migration fails to resume.

If I switch off krbd, does that take effect immediately, or only after a migration? Is there a way to tell whether the connection is using librbd vs. krbd?
It takes effect after a migration or a shutdown+start of the guest (or the Reboot button in the UI). You can check the QEMU command line to see what is currently used:
Code:
grep '=rbd:' /proc/$(cat /var/run/qemu-server/125.pid)/cmdline
should only match when librbd is used for VM 125.
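Conversely, a rough way to check for krbd (just a sketch, assuming the usual /dev/rbd-pve mapping path for krbd devices) would be:
Code:
# should only match when krbd is used, i.e. the disk is passed as a mapped block device
grep -a '/dev/rbd' /proc/$(cat /var/run/qemu-server/125.pid)/cmdline
# the kernel RBD mappings on the node can also be listed with
rbd showmapped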
 
Update:

Changing the line in QemuServer.pm seemed to "work around" the problem. With that change, live migration does resume correctly, but there is about 6-10 seconds of downtime. That's better than what was happening, but I'm still working through the other suggestions, as I'd like to find the root cause. I'll keep this thread posted as I know more.
 
I unchecked krbd on the Ceph storage and everything seems to be migrating correctly now. Guest virtual machines resume correctly, and the cutover is less than a second. I'm going to consider this resolved for the time being. However, I would have expected krbd to work better, considering it talks directly to the raw block devices.

I appreciate the help with this issue.
 
Glad you found a workaround. Yes, I would've expected krbd to have less overhead too. The block devices are already mapped when the target instance is started; I'm still wondering what exactly blocks QEMU when finishing the migration.

If you'd like to further debug the issue, you could set the following options (for a shut down VM):
Code:
qm set 123 --args '-trace vmstate_downtime*,file=/tmp/migration.log -trace bdrv*,file=/tmp/migration.log -trace blk*,file=/tmp/migration.log -msg timestamp=on'
(of course, replace 123 with the actual ID). Then turn on krbd again, start the VM, migrate, and share the resulting log file, i.e. /tmp/migration.log on the migration target node. Use qm set 123 --delete args to get rid of the logging options again (but the VM needs to be shut down and restarted, e.g. via Reboot in the UI, before it stops logging).
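In other words, the rough debugging cycle would be (only a sketch; replace 123 and TARGETNODE with the actual VM ID and migration target):
Code:
qm set 123 --args '...'        # the -trace/-msg options from above
qm start 123                   # fresh start so the new args are picked up
qm migrate 123 TARGETNODE --online
# afterwards, on the target node:
cat /tmp/migration.log
qm set 123 --delete args       # remove again (needs another shutdown+start to take effect)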
 
Absolutely, let's dig into it. I was able to reproduce the issue after switching back to krbd. Let me go through these steps and I'll report back with the findings.
 
This is the log from the target node after the migration completed with errors and the VM was left in a "paused" state.

Code:
cat /tmp/migration.log
454476@1721329425.629429:bdrv_open_common bs 0x56357970e250 filename "/dev/rbd-pve/803339ff-cb6f-4918-8386-d821424540c3/okc-r47-1.imgs/vm-106-disk-0" flags 0x6a822 format_name "host_device"
454476@1721329425.629986:blk_root_attach child 0x563579702ee0 blk 0x563579713820 bs 0x56357970e250
454476@1721329425.630007:bdrv_open_common bs 0x563579706bc0 filename "/dev/rbd-pve/803339ff-cb6f-4918-8386-d821424540c3/okc-r47-1.imgs/vm-106-disk-0" flags 0x62822 format_name "raw"
454476@1721329425.630027:blk_root_detach child 0x563579702ee0 blk 0x563579713820 bs 0x56357970e250
454476@1721329425.630048:blk_root_attach child 0x563579702ee0 blk 0x563579713d70 bs 0x563579706bc0
454476@1721329427.121011:blk_co_preadv blk 0x563579713d70 bs 0x563579706bc0 offset 0 bytes 512 flags 0x0
454476@1721329427.121036:bdrv_co_preadv_part bs 0x563579706bc0 offset 0 bytes 512 flags 0x0
454476@1721329427.121057:bdrv_co_preadv_part bs 0x56357970e250 offset 0 bytes 512 flags 0x0
454476@1721329445.738079:vmstate_downtime_load type=iterable idstr=ram instance_id=0 downtime=464
454476@1721329445.739834:vmstate_downtime_load type=non-iterable idstr=0000:00:1e.0/pci_bridge instance_id=0 downtime=399
454476@1721329445.740125:vmstate_downtime_load type=non-iterable idstr=0000:00:1f.0/pci_bridge instance_id=0 downtime=265
454476@1721329445.740589:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0/pci_bridge instance_id=0 downtime=452
454476@1721329445.740601:vmstate_downtime_load type=non-iterable idstr=timer instance_id=0 downtime=2
454476@1721329445.740626:vmstate_downtime_load type=non-iterable idstr=cpu_common instance_id=0 downtime=21
454476@1721329445.740663:vmstate_downtime_load type=non-iterable idstr=cpu instance_id=0 downtime=24
454476@1721329445.741176:vmstate_downtime_load type=non-iterable idstr=kvm-tpr-opt instance_id=0 downtime=509
454476@1721329445.741224:vmstate_downtime_load type=non-iterable idstr=apic instance_id=0 downtime=38
454476@1721329445.741233:vmstate_downtime_load type=non-iterable idstr=cpu_common instance_id=1 downtime=4
454476@1721329445.741259:vmstate_downtime_load type=non-iterable idstr=cpu instance_id=1 downtime=16
454476@1721329445.741292:vmstate_downtime_load type=non-iterable idstr=apic instance_id=1 downtime=28
454476@1721329445.741299:vmstate_downtime_load type=non-iterable idstr=kvmclock instance_id=0 downtime=3
454476@1721329445.741824:vmstate_downtime_load type=non-iterable idstr=0000:00:00.0/I440FX instance_id=0 downtime=521
454476@1721329445.741846:vmstate_downtime_load type=non-iterable idstr=PCIHost instance_id=0 downtime=0
454476@1721329445.741859:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=0 downtime=1
454476@1721329445.741871:vmstate_downtime_load type=non-iterable idstr=fw_cfg instance_id=0 downtime=7
454476@1721329445.741879:vmstate_downtime_load type=non-iterable idstr=dma instance_id=0 downtime=3
454476@1721329445.741884:vmstate_downtime_load type=non-iterable idstr=dma instance_id=1 downtime=1
454476@1721329445.741892:vmstate_downtime_load type=non-iterable idstr=mc146818rtc instance_id=0 downtime=3
454476@1721329445.742234:vmstate_downtime_load type=non-iterable idstr=0000:00:01.1/ide instance_id=0 downtime=334
454476@1721329445.742241:vmstate_downtime_load type=non-iterable idstr=i2c_bus instance_id=0 downtime=0
454476@1721329445.742850:vmstate_downtime_load type=non-iterable idstr=0000:00:01.3/piix4_pm instance_id=0 downtime=604
454476@1721329445.742865:vmstate_downtime_load type=non-iterable idstr=0000:00:01.0/PIIX3 instance_id=0 downtime=3
454476@1721329445.742875:vmstate_downtime_load type=non-iterable idstr=i8259 instance_id=0 downtime=4
454476@1721329445.742887:vmstate_downtime_load type=non-iterable idstr=i8259 instance_id=1 downtime=2
454476@1721329445.742895:vmstate_downtime_load type=non-iterable idstr=ioapic instance_id=0 downtime=4
454476@1721329445.742903:vmstate_downtime_load type=non-iterable idstr=hpet instance_id=0 downtime=4
454476@1721329445.742910:vmstate_downtime_load type=non-iterable idstr=i8254 instance_id=0 downtime=2
454476@1721329445.742915:vmstate_downtime_load type=non-iterable idstr=pcspk instance_id=0 downtime=2
454476@1721329445.742928:vmstate_downtime_load type=non-iterable idstr=fdc instance_id=0 downtime=7
454476@1721329445.742934:vmstate_downtime_load type=non-iterable idstr=ps2kbd instance_id=0 downtime=2
454476@1721329445.742940:vmstate_downtime_load type=non-iterable idstr=ps2mouse instance_id=0 downtime=1
454476@1721329445.742945:vmstate_downtime_load type=non-iterable idstr=pckbd instance_id=0 downtime=2
454476@1721329445.742964:vmstate_downtime_load type=non-iterable idstr=vmmouse instance_id=0 downtime=15
454476@1721329445.742971:vmstate_downtime_load type=non-iterable idstr=port92 instance_id=0 downtime=0
454476@1721329445.742977:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=1 downtime=1
454476@1721329445.742981:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=2 downtime=0
454476@1721329445.742985:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=3 downtime=1
454476@1721329445.743222:vmstate_downtime_load type=non-iterable idstr=vmgenid instance_id=0 downtime=234
454476@1721329445.743519:vmstate_downtime_load type=non-iterable idstr=0000:00:01.2/uhci instance_id=0 downtime=289
454476@1721329445.743528:vmstate_downtime_load type=non-iterable idstr=0000:00:01.2/1/usb-ptr instance_id=0 downtime=4
454476@1721329445.744302:vmstate_downtime_load type=non-iterable idstr=0000:00:02.0/vga instance_id=0 downtime=770
454476@1721329445.745379:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0:01.0/virtio-scsi instance_id=0 downtime=1060
454476@1721329445.745404:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0:01.0/0:0:0/scsi-disk instance_id=0 downtime=3
454476@1721329445.746657:vmstate_downtime_load type=non-iterable idstr=0000:00:12.0/virtio-net instance_id=0 downtime=1247
454476@1721329445.747919:vmstate_downtime_load type=non-iterable idstr=0000:00:13.0/virtio-net instance_id=0 downtime=1248
454476@1721329445.747934:vmstate_downtime_load type=non-iterable idstr=acpi_build instance_id=0 downtime=1
454476@1721329445.747948:vmstate_downtime_load type=non-iterable idstr=globalstate instance_id=0 downtime=2
454476@1721329445.748100:vmstate_downtime_checkpoint dst-precopy-loadvm-completed
454476@1721329445.748112:vmstate_downtime_checkpoint dst-precopy-bh-enter
454476@1721329456.222389:vmstate_downtime_checkpoint dst-precopy-bh-announced
454476@1721329456.222448:vmstate_downtime_checkpoint dst-precopy-bh-vm-started
454476@1721329504.927647:blk_root_detach child 0x563579702ee0 blk 0x563579713d70 bs 0x563579706bc0
 
This is the log from the target node after the migration completed with errors and the VM was left in a "paused" state.

Code:
454476@1721329445.748100:vmstate_downtime_checkpoint dst-precopy-loadvm-completed
454476@1721329445.748112:vmstate_downtime_checkpoint dst-precopy-bh-enter
454476@1721329456.222389:vmstate_downtime_checkpoint dst-precopy-bh-announced
454476@1721329456.222448:vmstate_downtime_checkpoint dst-precopy-bh-vm-started
Thanks! There are about 10.5 seconds between the dst-precopy-bh-enter and dst-precopy-bh-announced trace lines, and those are here in the code:
Code:
    trace_vmstate_downtime_checkpoint("dst-precopy-bh-enter");

    /* If capability late_block_activate is set:
     * Only fire up the block code now if we're going to restart the
     * VM, else 'cont' will do it.
     * This causes file locking to happen; so we don't want it to happen
     * unless we really are starting the VM.
     */
    if (!migrate_late_block_activate() ||
         (autostart && (!global_state_received() ||
            runstate_is_live(global_state_get_runstate())))) {
        /* Make sure all file formats throw away their mutable metadata.
         * If we get an error here, just don't restart the VM yet. */
        bdrv_activate_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
            local_err = NULL;
            autostart = false;
        }
    }

    /*
     * This must happen after all error conditions are dealt with and
     * we're sure the VM is going to be running on this host.
     */
    qemu_announce_self(&mis->announce_timer, migrate_announce_params());

    trace_vmstate_downtime_checkpoint("dst-precopy-bh-announced");

So either it's the block device activation or the qemu_announce_self() call that takes so long.
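For reference, the gap can be computed directly from the two checkpoint timestamps quoted above:
Code:
echo '1721329456.222389 - 1721329445.748112' | bc
# 10.474277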
 
Could you try again with the following?
Code:
qm set 123 --args '-trace vmstate_downtime*,file=/tmp/migration.log -trace thread_pool*,file=/tmp/migration.log -trace luring*,file=/tmp/migration.log -trace qemu_announce*,file=/tmp/migration.log -msg timestamp=on'

And also share at what time the resume command ran into the timeout. This should narrow it down further.
 
Code:
1764680@1721405248.232737:bdrv_open_common bs 0x5e38c5b94250 filename "/dev/rbd-pve/803339ff-cb6f-4918-8386-d821424540c3/okc-r47-1.imgs/vm-106-disk-0" flags 0x6a822 format_name "host_device"
1764680@1721405248.233272:blk_root_attach child 0x5e38c5b88ee0 blk 0x5e38c5b99850 bs 0x5e38c5b94250
1764680@1721405248.233294:bdrv_open_common bs 0x5e38c5b8cbc0 filename "/dev/rbd-pve/803339ff-cb6f-4918-8386-d821424540c3/okc-r47-1.imgs/vm-106-disk-0" flags 0x62822 format_name "raw"
1764680@1721405248.233314:blk_root_detach child 0x5e38c5b88ee0 blk 0x5e38c5b99850 bs 0x5e38c5b94250
1764680@1721405248.233331:blk_root_attach child 0x5e38c5b88ee0 blk 0x5e38c5b99d60 bs 0x5e38c5b8cbc0
1764680@1721405249.584763:blk_co_preadv blk 0x5e38c5b99d60 bs 0x5e38c5b8cbc0 offset 0 bytes 512 flags 0x0
1764680@1721405249.584792:bdrv_co_preadv_part bs 0x5e38c5b8cbc0 offset 0 bytes 512 flags 0x0
1764680@1721405249.584797:bdrv_co_preadv_part bs 0x5e38c5b94250 offset 0 bytes 512 flags 0x0
1764680@1721405258.559521:vmstate_downtime_load type=iterable idstr=ram instance_id=0 downtime=14
1764680@1721405258.561400:vmstate_downtime_load type=non-iterable idstr=0000:00:1e.0/pci_bridge instance_id=0 downtime=379
1764680@1721405258.561688:vmstate_downtime_load type=non-iterable idstr=0000:00:1f.0/pci_bridge instance_id=0 downtime=276
1764680@1721405258.562134:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0/pci_bridge instance_id=0 downtime=437
1764680@1721405258.562143:vmstate_downtime_load type=non-iterable idstr=timer instance_id=0 downtime=2
1764680@1721405258.562169:vmstate_downtime_load type=non-iterable idstr=cpu_common instance_id=0 downtime=13
1764680@1721405258.562205:vmstate_downtime_load type=non-iterable idstr=cpu instance_id=0 downtime=20
1764680@1721405258.562829:vmstate_downtime_load type=non-iterable idstr=kvm-tpr-opt instance_id=0 downtime=618
1764680@1721405258.562913:vmstate_downtime_load type=non-iterable idstr=apic instance_id=0 downtime=38
1764680@1721405258.562928:vmstate_downtime_load type=non-iterable idstr=cpu_common instance_id=1 downtime=5
1764680@1721405258.562979:vmstate_downtime_load type=non-iterable idstr=cpu instance_id=1 downtime=17
1764680@1721405258.563013:vmstate_downtime_load type=non-iterable idstr=apic instance_id=1 downtime=27
1764680@1721405258.563022:vmstate_downtime_load type=non-iterable idstr=kvmclock instance_id=0 downtime=3
1764680@1721405258.563515:vmstate_downtime_load type=non-iterable idstr=0000:00:00.0/I440FX instance_id=0 downtime=488
1764680@1721405258.563527:vmstate_downtime_load type=non-iterable idstr=PCIHost instance_id=0 downtime=0
1764680@1721405258.563545:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=0 downtime=1
1764680@1721405258.563566:vmstate_downtime_load type=non-iterable idstr=fw_cfg instance_id=0 downtime=6
1764680@1721405258.563575:vmstate_downtime_load type=non-iterable idstr=dma instance_id=0 downtime=3
1764680@1721405258.563590:vmstate_downtime_load type=non-iterable idstr=dma instance_id=1 downtime=1
1764680@1721405258.563599:vmstate_downtime_load type=non-iterable idstr=mc146818rtc instance_id=0 downtime=3
1764680@1721405258.563940:vmstate_downtime_load type=non-iterable idstr=0000:00:01.1/ide instance_id=0 downtime=323
1764680@1721405258.563952:vmstate_downtime_load type=non-iterable idstr=i2c_bus instance_id=0 downtime=1
1764680@1721405258.564554:vmstate_downtime_load type=non-iterable idstr=0000:00:01.3/piix4_pm instance_id=0 downtime=597
1764680@1721405258.564569:vmstate_downtime_load type=non-iterable idstr=0000:00:01.0/PIIX3 instance_id=0 downtime=3
1764680@1721405258.564578:vmstate_downtime_load type=non-iterable idstr=i8259 instance_id=0 downtime=5
1764680@1721405258.564608:vmstate_downtime_load type=non-iterable idstr=i8259 instance_id=1 downtime=2
1764680@1721405258.564625:vmstate_downtime_load type=non-iterable idstr=ioapic instance_id=0 downtime=4
1764680@1721405258.564633:vmstate_downtime_load type=non-iterable idstr=hpet instance_id=0 downtime=4
1764680@1721405258.564645:vmstate_downtime_load type=non-iterable idstr=i8254 instance_id=0 downtime=2
1764680@1721405258.564650:vmstate_downtime_load type=non-iterable idstr=pcspk instance_id=0 downtime=1
1764680@1721405258.564666:vmstate_downtime_load type=non-iterable idstr=fdc instance_id=0 downtime=7
1764680@1721405258.564673:vmstate_downtime_load type=non-iterable idstr=ps2kbd instance_id=0 downtime=2
1764680@1721405258.564678:vmstate_downtime_load type=non-iterable idstr=ps2mouse instance_id=0 downtime=1
1764680@1721405258.564683:vmstate_downtime_load type=non-iterable idstr=pckbd instance_id=0 downtime=2
1764680@1721405258.564707:vmstate_downtime_load type=non-iterable idstr=vmmouse instance_id=0 downtime=16
1764680@1721405258.564733:vmstate_downtime_load type=non-iterable idstr=port92 instance_id=0 downtime=0
1764680@1721405258.564741:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=1 downtime=1
1764680@1721405258.564755:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=2 downtime=0
1764680@1721405258.564760:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=3 downtime=0
1764680@1721405258.564964:vmstate_downtime_load type=non-iterable idstr=vmgenid instance_id=0 downtime=183
1764680@1721405258.565229:vmstate_downtime_load type=non-iterable idstr=0000:00:01.2/uhci instance_id=0 downtime=256
1764680@1721405258.565241:vmstate_downtime_load type=non-iterable idstr=0000:00:01.2/1/usb-ptr instance_id=0 downtime=4
1764680@1721405258.565972:vmstate_downtime_load type=non-iterable idstr=0000:00:02.0/vga instance_id=0 downtime=716
1764680@1721405258.567120:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0:01.0/virtio-scsi instance_id=0 downtime=1118
1764680@1721405258.567151:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0:01.0/0:0:0/scsi-disk instance_id=0 downtime=2
1764680@1721405258.568382:vmstate_downtime_load type=non-iterable idstr=0000:00:12.0/virtio-net instance_id=0 downtime=1225
1764680@1721405268.575985:vmstate_downtime_load type=non-iterable idstr=0000:00:13.0/virtio-net instance_id=0 downtime=10007584
1764680@1721405268.576048:vmstate_downtime_load type=non-iterable idstr=acpi_build instance_id=0 downtime=1
1764680@1721405268.576076:vmstate_downtime_load type=non-iterable idstr=globalstate instance_id=0 downtime=3
1764680@1721405268.576311:vmstate_downtime_checkpoint dst-precopy-loadvm-completed
1764680@1721405268.576333:vmstate_downtime_checkpoint dst-precopy-bh-enter
1764680@1721405279.068607:vmstate_downtime_checkpoint dst-precopy-bh-announced
1764680@1721405279.068646:vmstate_downtime_checkpoint dst-precopy-bh-vm-started

2024-07-19 11:07:48 migration status: completed
2024-07-19 11:07:53 ERROR: tunnel replied 'ERR: resume failed - VM 106 qmp command 'query-status' failed - got timeout' to command 'resume 106'
 
Unfortunately, it seems like you're still using the old tracepoints, i.e. blk_* and bdrv_* are still present in the output, but qemu_announce* is not. The VM needs to be shut down and started (e.g. via the Reboot button in the web UI; a reboot within the guest is not enough) to pick up the new options.
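To verify that the new options were picked up after the restart (just a quick sanity check, same idea as the grep earlier in the thread), the following should match once the new args are active:
Code:
grep -a 'qemu_announce' /proc/$(cat /var/run/qemu-server/106.pid)/cmdline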
 
Code:
3435640@1721653322.227140:thread_pool_submit pool 0x5b693d9c3d00 req 0x5b693e7f3e20 opaque 0x763e6931c7f0
3435640@1721653322.227800:thread_pool_complete pool 0x5b693d9c3d00 req 0x5b693e7f3e20 opaque 0x763e6931c780 ret 0
3435640@1721653322.228483:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
3435640@1721653322.229182:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
3435640@1721653341.297229:vmstate_downtime_load type=iterable idstr=ram instance_id=0 downtime=3922
3435640@1721653341.297738:vmstate_downtime_load type=non-iterable idstr=0000:00:1e.0/pci_bridge instance_id=0 downtime=448
3435640@1721653341.298050:vmstate_downtime_load type=non-iterable idstr=0000:00:1f.0/pci_bridge instance_id=0 downtime=283
3435640@1721653341.298467:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0/pci_bridge instance_id=0 downtime=410
3435640@1721653341.298476:vmstate_downtime_load type=non-iterable idstr=timer instance_id=0 downtime=2
3435640@1721653341.298499:vmstate_downtime_load type=non-iterable idstr=cpu_common instance_id=0 downtime=19
3435640@1721653341.298544:vmstate_downtime_load type=non-iterable idstr=cpu instance_id=0 downtime=33
3435640@1721653341.299236:vmstate_downtime_load type=non-iterable idstr=kvm-tpr-opt instance_id=0 downtime=686
3435640@1721653341.299309:vmstate_downtime_load type=non-iterable idstr=apic instance_id=0 downtime=32
3435640@1721653341.299318:vmstate_downtime_load type=non-iterable idstr=cpu_common instance_id=1 downtime=4
3435640@1721653341.299338:vmstate_downtime_load type=non-iterable idstr=cpu instance_id=1 downtime=17
3435640@1721653341.299379:vmstate_downtime_load type=non-iterable idstr=apic instance_id=1 downtime=24
3435640@1721653341.299385:vmstate_downtime_load type=non-iterable idstr=kvmclock instance_id=0 downtime=2
3435640@1721653341.299899:vmstate_downtime_load type=non-iterable idstr=0000:00:00.0/I440FX instance_id=0 downtime=510
3435640@1721653341.299926:vmstate_downtime_load type=non-iterable idstr=PCIHost instance_id=0 downtime=0
3435640@1721653341.299931:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=0 downtime=1
3435640@1721653341.299949:vmstate_downtime_load type=non-iterable idstr=fw_cfg instance_id=0 downtime=6
3435640@1721653341.299954:vmstate_downtime_load type=non-iterable idstr=dma instance_id=0 downtime=2
3435640@1721653341.299959:vmstate_downtime_load type=non-iterable idstr=dma instance_id=1 downtime=1
3435640@1721653341.299966:vmstate_downtime_load type=non-iterable idstr=mc146818rtc instance_id=0 downtime=3
3435640@1721653341.300293:vmstate_downtime_load type=non-iterable idstr=0000:00:01.1/ide instance_id=0 downtime=324
3435640@1721653341.300304:vmstate_downtime_load type=non-iterable idstr=i2c_bus instance_id=0 downtime=0
3435640@1721653341.300885:vmstate_downtime_load type=non-iterable idstr=0000:00:01.3/piix4_pm instance_id=0 downtime=578
3435640@1721653341.300895:vmstate_downtime_load type=non-iterable idstr=0000:00:01.0/PIIX3 instance_id=0 downtime=2
3435640@1721653341.300904:vmstate_downtime_load type=non-iterable idstr=i8259 instance_id=0 downtime=4
3435640@1721653341.300914:vmstate_downtime_load type=non-iterable idstr=i8259 instance_id=1 downtime=2
3435640@1721653341.300921:vmstate_downtime_load type=non-iterable idstr=ioapic instance_id=0 downtime=4
3435640@1721653341.300928:vmstate_downtime_load type=non-iterable idstr=hpet instance_id=0 downtime=3
3435640@1721653341.300934:vmstate_downtime_load type=non-iterable idstr=i8254 instance_id=0 downtime=2
3435640@1721653341.300939:vmstate_downtime_load type=non-iterable idstr=pcspk instance_id=0 downtime=1
3435640@1721653341.300952:vmstate_downtime_load type=non-iterable idstr=fdc instance_id=0 downtime=5
3435640@1721653341.300957:vmstate_downtime_load type=non-iterable idstr=ps2kbd instance_id=0 downtime=2
3435640@1721653341.300962:vmstate_downtime_load type=non-iterable idstr=ps2mouse instance_id=0 downtime=1
3435640@1721653341.300966:vmstate_downtime_load type=non-iterable idstr=pckbd instance_id=0 downtime=1
3435640@1721653341.300987:vmstate_downtime_load type=non-iterable idstr=vmmouse instance_id=0 downtime=17
3435640@1721653341.300994:vmstate_downtime_load type=non-iterable idstr=port92 instance_id=0 downtime=0
3435640@1721653341.300999:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=1 downtime=0
3435640@1721653341.301003:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=2 downtime=0
3435640@1721653341.301007:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=3 downtime=1
3435640@1721653341.301186:vmstate_downtime_load type=non-iterable idstr=vmgenid instance_id=0 downtime=176
3435640@1721653341.301449:vmstate_downtime_load type=non-iterable idstr=0000:00:01.2/uhci instance_id=0 downtime=256
3435640@1721653341.301462:vmstate_downtime_load type=non-iterable idstr=0000:00:01.2/1/usb-ptr instance_id=0 downtime=9
3435640@1721653341.302165:vmstate_downtime_load type=non-iterable idstr=0000:00:02.0/vga instance_id=0 downtime=699
3435640@1721653341.303235:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0:01.0/virtio-scsi instance_id=0 downtime=1057
3435640@1721653341.303258:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0:01.0/0:0:0/scsi-disk instance_id=0 downtime=2
3435640@1721653341.304032:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
3435640@1721653341.304434:vmstate_downtime_load type=non-iterable idstr=0000:00:12.0/virtio-net instance_id=0 downtime=1171
3435640@1721653341.305208:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
3435640@1721653341.305589:vmstate_downtime_load type=non-iterable idstr=0000:00:13.0/virtio-net instance_id=0 downtime=1146
3435640@1721653341.305605:vmstate_downtime_load type=non-iterable idstr=acpi_build instance_id=0 downtime=0
3435640@1721653341.305611:vmstate_downtime_load type=non-iterable idstr=globalstate instance_id=0 downtime=2
3435640@1721653341.305749:vmstate_downtime_checkpoint dst-precopy-loadvm-completed
3435640@1721653341.305762:vmstate_downtime_checkpoint dst-precopy-bh-enter
3435640@1721653351.774279:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
3435640@1721653351.774325:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
3435640@1721653351.774332:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
3435640@1721653351.774336:vmstate_downtime_checkpoint dst-precopy-bh-announced
3435640@1721653351.774341:vmstate_downtime_checkpoint dst-precopy-bh-vm-started
3435640@1721653351.823466:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
3435640@1721653351.823487:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
3435640@1721653351.973479:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
3435640@1721653351.973508:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
3435640@1721653352.223495:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
3435640@1721653352.223558:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
3435640@1721653352.573831:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
3435640@1721653352.573866:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
3435640@1721653352.573907:qemu_announce_timer_del free named: 1 free timer: 0 id: (null)
3435642@1721653358.767180:luring_init_state s 0x763e5c0036f0 size 264
3435642@1721653358.767281:luring_co_submit bs 0x5b693dbdf250 s 0x763e5c0036f0 luringcb 0x763e684d2590 fd 38 offset 66708627456 nbytes 12288 type 2
3435642@1721653358.767286:luring_do_submit LuringState 0x763e5c0036f0 blocked 0 queued 1 inflight 0
3435642@1721653358.767291:luring_unplug_fn LuringState 0x763e5c0036f0 blocked 0 queued 1 inflight 0
3435642@1721653358.767310:luring_io_uring_submit LuringState 0x763e5c0036f0 ret 1
3435642@1721653358.767323:luring_unplug_fn LuringState 0x763e5c0036f0 blocked 0 queued 0 inflight 1
3435642@1721653358.768018:luring_process_completion LuringState 0x763e5c0036f0 luringcb 0x763e684d2590 ret 12288
3435642@1721653358.768098:luring_co_submit bs 0x5b693dbdf250 s 0x763e5c0036f0 luringcb 0x763e684d2d60 fd 38 offset 0 nbytes 0 type 8
3435642@1721653358.768102:luring_do_submit LuringState 0x763e5c0036f0 blocked 0 queued 1 inflight 0
3435642@1721653358.768112:luring_unplug_fn LuringState 0x763e5c0036f0 blocked 0 queued 1 inflight 0
3435642@1721653358.768142:luring_io_uring_submit LuringState 0x763e5c0036f0 ret 1
3435642@1721653358.768146:luring_unplug_fn LuringState 0x763e5c0036f0 blocked 0 queued 0 inflight 1
3435642@1721653358.768160:luring_process_completion LuringState 0x763e5c0036f0 luringcb 0x763e684d2d60 ret 0
3435642@1721653358.768189:luring_co_submit bs 0x5b693dbdf250 s 0x763e5c0036f0 luringcb 0x763e684d2590 fd 38 offset 66708639744 nbytes 4096 type 2
3435642@1721653358.768197:luring_do_submit LuringState 0x763e5c0036f0 blocked 0 queued 1 inflight 0
3435642@1721653358.768200:luring_unplug_fn LuringState 0x763e5c0036f0 blocked 0 queued 1 inflight 0
3435642@1721653358.768208:luring_io_uring_submit LuringState 0x763e5c0036f0 ret 1
3435642@1721653358.768211:luring_unplug_fn LuringState 0x763e5c0036f0 blocked 0 queued 0 inflight 1
3435642@1721653358.768710:luring_process_completion LuringState 0x763e5c0036f0 luringcb 0x763e684d2590 ret 4096
3435642@1721653358.768751:luring_co_submit bs 0x5b693dbdf250 s 0x763e5c0036f0 luringcb 0x763e684d2d60 fd 38 offset 0 nbytes 0 type 8
3435642@1721653358.768756:luring_do_submit LuringState 0x763e5c0036f0 blocked 0 queued 1 inflight 0
3435642@1721653358.768759:luring_unplug_fn LuringState 0x763e5c0036f0 blocked 0 queued 1 inflight 0
3435642@1721653358.768765:luring_io_uring_submit LuringState 0x763e5c0036f0 ret 1
3435642@1721653358.768768:luring_unplug_fn LuringState 0x763e5c0036f0 blocked 0 queued 0 inflight 1
3435642@1721653358.768776:luring_process_completion LuringState 0x763e5c0036f0 luringcb 0x763e684d2d60 ret 0

2024-07-22 08:02:26 ERROR: tunnel replied 'ERR: resume failed - VM 106 qmp command 'query-status' failed - got timeout' to command 'resume 106'
 
Code:
3435640@1721653341.305749:vmstate_downtime_checkpoint dst-precopy-loadvm-completed
3435640@1721653341.305762:vmstate_downtime_checkpoint dst-precopy-bh-enter
3435640@1721653351.774279:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
3435640@1721653351.774325:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
3435640@1721653351.774332:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
3435640@1721653351.774336:vmstate_downtime_checkpoint dst-precopy-bh-announced
3435640@1721653351.774341:vmstate_downtime_checkpoint dst-precopy-bh-vm-started
Since there are about 10 seconds before the qemu_announce trace is logged, we can be pretty sure it is actually the block device activation that takes so long. Unfortunately, there are no other trace points in the relevant code; if a flush happened there, we would see either a thread_pool_submit or a luring_co_submit, and neither is present in the relevant part of the trace.
 
I guess the fix on the Proxmox VE side is to just increase the allowed timeout in the context of the resume call.

If you'd still like to debug further, you can try starting the migration and then on the target, run
Code:
strace -t -T -e '!ppoll' -p $(cat /var/run/qemu-server/123.pid) 2>&1 >/dev/null | awk '{gsub(/[<>]/,"",$NF)}$NF+.0>0.1'
while the migration is ongoing (replacing 123 with the actual ID). This should show all syscalls from the target process taking more than 0.1 seconds. Please also provide the migration log so the timings can be interpreted.
 
Code:
strace -t -T -e '!ppoll' -p $(cat /var/run/qemu-server/106.pid) 2>&1 >/dev/null | awk '{gsub(/[<>]/,"",$NF)}$NF+.0>0.1'
cat: /var/run/qemu-server/106.pid: No such file or directory
!
file=/var/run/qemu-server/106.pid
while [ ! -f "$file" ]; do sleep 1; done; strace -t -T -e '!ppoll' -p $(cat /var/run/qemu-server/106.pid) 2>&1 >/dev/null | awk '{gsub(/[<>]/,"",$NF)}$NF+.0>0.1'
07:53:49 wait4(490842, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 490842 0.196843
07:53:49 wait4(491010, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 491010 0.685284
07:54:10 pread64(38, "\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096 10.494424
!

I had to create a loop so that tracing would start as soon as the PID file was available.
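For readability, here is the same wait-and-attach one-liner written out (VM ID 106, as above):
Code:
pidfile=/var/run/qemu-server/106.pid
while [ ! -f "$pidfile" ]; do sleep 1; done
strace -t -T -e '!ppoll' -p "$(cat "$pidfile")" 2>&1 >/dev/null \
    | awk '{gsub(/[<>]/,"",$NF)} $NF+0 > 0.1'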

Code:
490836@1721825629.911959:thread_pool_submit pool 0x560781812d00 req 0x560782642e20 opaque 0x7f369d5ff7f0
490836@1721825629.912555:thread_pool_complete pool 0x560781812d00 req 0x560782642e20 opaque 0x7f369d5ff780 ret 0
490836@1721825629.912968:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
490836@1721825629.914204:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
490836@1721825639.985829:vmstate_downtime_load type=iterable idstr=ram instance_id=0 downtime=146
490836@1721825639.987508:vmstate_downtime_load type=non-iterable idstr=0000:00:1e.0/pci_bridge instance_id=0 downtime=400
490836@1721825639.987848:vmstate_downtime_load type=non-iterable idstr=0000:00:1f.0/pci_bridge instance_id=0 downtime=261
490836@1721825639.988322:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0/pci_bridge instance_id=0 downtime=427
490836@1721825639.988371:vmstate_downtime_load type=non-iterable idstr=timer instance_id=0 downtime=3
490836@1721825639.988533:vmstate_downtime_load type=non-iterable idstr=cpu_common instance_id=0 downtime=121
490836@1721825639.988608:vmstate_downtime_load type=non-iterable idstr=cpu instance_id=0 downtime=23
490836@1721825639.989459:vmstate_downtime_load type=non-iterable idstr=kvm-tpr-opt instance_id=0 downtime=809
490836@1721825639.989661:vmstate_downtime_load type=non-iterable idstr=apic instance_id=0 downtime=156
490836@1721825639.989802:vmstate_downtime_load type=non-iterable idstr=cpu_common instance_id=1 downtime=98
490836@1721825639.989858:vmstate_downtime_load type=non-iterable idstr=cpu instance_id=1 downtime=14
490836@1721825639.989959:vmstate_downtime_load type=non-iterable idstr=apic instance_id=1 downtime=54
490836@1721825639.990010:vmstate_downtime_load type=non-iterable idstr=kvmclock instance_id=0 downtime=2
490836@1721825639.991161:vmstate_downtime_load type=non-iterable idstr=0000:00:00.0/I440FX instance_id=0 downtime=1109
490836@1721825639.991258:vmstate_downtime_load type=non-iterable idstr=PCIHost instance_id=0 downtime=1
490836@1721825639.991321:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=0 downtime=1
490836@1721825639.991405:vmstate_downtime_load type=non-iterable idstr=fw_cfg instance_id=0 downtime=8
490836@1721825639.991472:vmstate_downtime_load type=non-iterable idstr=dma instance_id=0 downtime=3
490836@1721825639.991535:vmstate_downtime_load type=non-iterable idstr=dma instance_id=1 downtime=2
490836@1721825639.991606:vmstate_downtime_load type=non-iterable idstr=mc146818rtc instance_id=0 downtime=3
490836@1721825639.992019:vmstate_downtime_load type=non-iterable idstr=0000:00:01.1/ide instance_id=0 downtime=352
490836@1721825639.992095:vmstate_downtime_load type=non-iterable idstr=i2c_bus instance_id=0 downtime=1
490836@1721825639.992733:vmstate_downtime_load type=non-iterable idstr=0000:00:01.3/piix4_pm instance_id=0 downtime=575
490836@1721825639.992822:vmstate_downtime_load type=non-iterable idstr=0000:00:01.0/PIIX3 instance_id=0 downtime=2
490836@1721825639.992917:vmstate_downtime_load type=non-iterable idstr=i8259 instance_id=0 downtime=31
490836@1721825639.993015:vmstate_downtime_load type=non-iterable idstr=i8259 instance_id=1 downtime=29
490836@1721825639.993108:vmstate_downtime_load type=non-iterable idstr=ioapic instance_id=0 downtime=34
490836@1721825639.993172:vmstate_downtime_load type=non-iterable idstr=hpet instance_id=0 downtime=3
490836@1721825639.993236:vmstate_downtime_load type=non-iterable idstr=i8254 instance_id=0 downtime=2
490836@1721825639.993301:vmstate_downtime_load type=non-iterable idstr=pcspk instance_id=0 downtime=1
490836@1721825639.993372:vmstate_downtime_load type=non-iterable idstr=fdc instance_id=0 downtime=7
490836@1721825639.993437:vmstate_downtime_load type=non-iterable idstr=ps2kbd instance_id=0 downtime=2
490836@1721825639.993500:vmstate_downtime_load type=non-iterable idstr=ps2mouse instance_id=0 downtime=1
490836@1721825639.993566:vmstate_downtime_load type=non-iterable idstr=pckbd instance_id=0 downtime=2
490836@1721825639.993642:vmstate_downtime_load type=non-iterable idstr=vmmouse instance_id=0 downtime=14
490836@1721825639.993709:vmstate_downtime_load type=non-iterable idstr=port92 instance_id=0 downtime=0
490836@1721825639.993769:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=1 downtime=0
490836@1721825639.993830:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=2 downtime=1
490836@1721825639.993889:vmstate_downtime_load type=non-iterable idstr=PCIBUS instance_id=3 downtime=0
490836@1721825639.994252:vmstate_downtime_load type=non-iterable idstr=vmgenid instance_id=0 downtime=303
490836@1721825639.994718:vmstate_downtime_load type=non-iterable idstr=0000:00:01.2/uhci instance_id=0 downtime=387
490836@1721825639.994803:vmstate_downtime_load type=non-iterable idstr=0000:00:01.2/1/usb-ptr instance_id=0 downtime=7
490836@1721825639.996002:vmstate_downtime_load type=non-iterable idstr=0000:00:02.0/vga instance_id=0 downtime=1132
490836@1721825639.997270:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0:01.0/virtio-scsi instance_id=0 downtime=1191
490836@1721825639.997342:vmstate_downtime_load type=non-iterable idstr=0000:00:05.0:01.0/0:0:0/scsi-disk instance_id=0 downtime=2
490836@1721825639.998352:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
490836@1721825639.999699:vmstate_downtime_load type=non-iterable idstr=0000:00:12.0/virtio-net instance_id=0 downtime=2286
490836@1721825640.000402:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
490836@1721825650.014255:vmstate_downtime_load type=non-iterable idstr=0000:00:13.0/virtio-net instance_id=0 downtime=10014486
490836@1721825650.014355:vmstate_downtime_load type=non-iterable idstr=acpi_build instance_id=0 downtime=0
490836@1721825650.014397:vmstate_downtime_load type=non-iterable idstr=globalstate instance_id=0 downtime=2
490836@1721825650.014975:vmstate_downtime_checkpoint dst-precopy-loadvm-completed
490836@1721825650.015098:vmstate_downtime_checkpoint dst-precopy-bh-enter
490836@1721825660.509905:qemu_announce_timer_del free named: 0 free timer: 0 id: (null)
490836@1721825660.509993:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
490836@1721825660.510051:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
490836@1721825660.510093:vmstate_downtime_checkpoint dst-precopy-bh-announced
490836@1721825660.510134:vmstate_downtime_checkpoint dst-precopy-bh-vm-started
490836@1721825660.559596:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
490836@1721825660.559672:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
490836@1721825660.709633:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
490836@1721825660.709744:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
490836@1721825660.959711:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
490836@1721825660.959855:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
490836@1721825661.309616:qemu_announce_self_iter _:net0:bc:24:11:fb:d5:fe skip: 0
490836@1721825661.309735:qemu_announce_self_iter _:net1:bc:24:11:f8:88:b3 skip: 0
490836@1721825661.309796:qemu_announce_timer_del free named: 1 free timer: 0 id: (null)

Code:
2024-07-24 07:53:58 migration active, transferred 487.8 MiB of 2.0 GiB VM-state, 62.5 MiB/s
2024-07-24 07:54:10 average migration speed: 103.3 MiB/s - downtime 10037 ms
2024-07-24 07:54:10 migration status: completed
2024-07-24 07:54:15 ERROR: tunnel replied 'ERR: resume failed - VM 106 qmp command 'query-status' failed - got timeout' to command 'resume 106'
2024-07-24 07:54:17 ERROR: migration finished with problems (duration 00:00:40)
 
So the culprit is:
Code:
07:54:10 pread64(38, "\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096 10.494424
This read call starts exactly when the migration finishes and takes a bit over 10 seconds.

Looking at my own strace and the QEMU source code, the read happens when probing the request alignment for the block device.

But it's only a 4 KiB read, so my best guess is that it takes so long because it is the initial read to the device on the new node.

I sent a patch to bump the timeout. You'd need to do kernel+Ceph+network analysis to find out why the initial read takes so long, but I'm not sure there's necessarily a bug here.
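If you want to narrow it down outside of QEMU, one thing worth trying (just a sketch, reusing the mapped device path from your traces) is timing a single direct 4 KiB read from offset 0 right after the device is mapped on the target node, which is roughly what the alignment probing does:
Code:
time dd if=/dev/rbd-pve/803339ff-cb6f-4918-8386-d821424540c3/okc-r47-1.imgs/vm-106-disk-0 \
    of=/dev/null bs=4096 count=1 iflag=direct
If that manual read is also slow, the delay is most likely in the kernel RBD path rather than in QEMU itself.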
 
Increasing the timeout will help prevent the migration from erroring out and being left in a paused state; however, it comes at the cost of increased downtime during the migration compared to librbd, essentially going from sub-second failover to 8-10 seconds of failover.

I don't know enough about it to claim there is or isn't a bug. I am curious why librbd does not seem to have this problem but krbd does, at least "in our environment". The good news is that it's easy enough to just uncheck krbd as long as it continues to work better that way.
 
Don't suppose there's a DNS or other name lookup service config problem, whereby some part of a hostname lookup fails and only succeeds after falling back to a secondary lookup service?
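A quick way to spot slow lookups on the target node would be something like this (the hostnames here are placeholders; substitute the ones from your storage/Ceph config):
Code:
time getent hosts ceph-mon-1
time getent hosts pve-target-node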
 
Increasing the timeout will help prevent the migration from erroring out and being left in a paused state; however, it comes at the cost of increased downtime during the migration compared to librbd, essentially going from sub-second failover to 8-10 seconds of failover.
No, there is no increased downtime. In environments where the guest does not take 10 seconds for the initial read, it will still respond as quickly as before. In environments where the QMP command previously timed out, the downtime lasted until you manually resumed the VM; now it is much less.

I don't know enough about it to claim there is or isn't a bug. I am curious why librbd does not seem to have this problem but krbd does, at least "in our environment". The good news is that it's easy enough to just uncheck krbd as long as it continues to work better that way.
To my knowledge, nobody else has reported this issue. Of course it can be a bug, but it can also be something specific to your environment or both.
 
Don't suppose there's a DNS or other name lookup service config problem, whereby some part of a hostname lookup fails and only succeeds after falling back to a secondary lookup service?

We didn't use DNS names when configuring the storage connection to the Ceph cluster. I have checked to make sure there are no noticeable connectivity issues between any of the Proxmox nodes and any of the Ceph nodes. All the Ceph nodes are NVMe-based and running 40G connectivity. We have other hypervisors and Ceph clients with no noticeable connectivity issues with the Ceph cluster. Additionally, librbd does not seem to have an issue with connectivity to the Ceph cluster, for some reason. I don't know enough about the differences between the two drivers to speak to why that would be, but I'm sure there is an explanation.

Regardless, I'm going to take a second look just to be sure. Great suggestion.
 
