online migration never finishes, takes vm offline

athompso

Renowned Member
Sep 13, 2013
I've got a PVE 3.1 cluster up and running now, using Sheepdog for shared storage. (I tried on five separate occasions and have yet to successfully build a Ceph cluster, so I just gave up. The whole point, for me, is to run storage and VMs on the same nodes!)

Everything's updated to v3.1-24/060bd5a6 from the no-subscription repo, as I only added the enterprise repo to these servers today.

I have a VM running happily on node#1, backed by Sheepdog storage. When I attempt to "online" migrate it to node#2, the migration starts, but apparently never finishes. The VM only has 512MB of RAM, but the migration has now been running for over 75 minutes. Each node has a 4-way LAG to a common switch; manual tests show that SCP between these nodes gets at least 20MB/sec using default ciphers, and ~75MB/sec using arcfour.

The "qm ... mtunnel" process is still running, and the SSH connection between the two nodes is still pumping a goodly amount of data over 75min later - what on earth is it transferring?

The VM, incidentally, is NOT responding on the network; the "online" migration has become an "offline" migration :-(.

The task log only shows this, and nothing else:
Dec 29 14:44:32 starting migration of VM 108 to node 'pve02' (192.168.160.28)
Dec 29 14:44:32 copying disk images
Dec 29 14:44:32 starting VM 108 on remote node 'pve02'
Dec 29 14:44:34 starting ssh migration tunnel
Dec 29 14:44:35 starting online/live migration on localhost:60000
Dec 29 14:44:35 migrate_set_speed: 8589934592
Dec 29 14:44:35 migrate_set_downtime: 0.1

How do I troubleshoot this migration process?

Thanks,
-Adam
 
Clicking "Stop" to cancel the migration produces no additional log output, but the status changes to "stopped: unexpected status".

Even better, all other attempts to control the VM on the original node now report "Error: VM is locked (migrate)".
 
You can unlock the VM from the command line with

Code:
qm unlock <VM Id>

e.g.

Code:
qm unlock 108


No idea what's happening with your migration, unfortunately :( Do live snapshots work?
 
I actually had to reboot Node#1 to clear the wedged KVM process; qm commands would all fail with some error about being unable to connect. I wound up editing the VM config file and deleting "Locked: migration". (Going from memory, not necessarily an exact quote.)
 
Sheepdog is not considered stable, so I would not use that in a production environment.

Yes, I know.

I did not ask for a solution to my migration issue, I asked for suggestions on how to troubleshoot. I am not a QEMU expert, so I am not sure what to look at next, or even how to obtain debug logs.
I already know how to get this information under VMware, but I did not choose VMware for this project... so now I need to find out how to troubleshoot live migration with QEMU/KVM.
Links and pointers to documentation are welcome, as I do not see anything relevant on the PVE wiki.
 

Hi, you can try disabling the SSH tunnel for the migration traffic by adding:
migration_unsecure: 1
to datacenter.cfg


I'm not sure that your problem is related to Sheepdog; I have never had a problem with it for live migration.
But if your VM is doing a lot of memory changes (like a database), your transfer speed needs to be at least as fast as the rate of memory changes.

(QEMU 1.7 has a new option to slow down the vCPUs in the guest for this kind of situation; I'll try to add it to Proxmox soon.)
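
To illustrate the point about transfer speed versus memory changes, here is a rough sketch of the usual pre-copy model: each pass re-sends whatever the guest dirtied while the previous pass was on the wire, and the migration can only finish once the leftover fits within the allowed downtime. The function name `precopy_passes` and the dirty rates are assumptions made up for illustration; the 512 MB of RAM, the 20 MB/s link speed, and the 0.1 s downtime come from the first post. This is a simplified model, not how QEMU actually accounts for dirty pages.

```shell
#!/bin/sh
# Rough model of pre-copy live migration: each pass re-sends the memory
# that the guest dirtied while the previous pass was being transferred.
# All inputs are illustrative assumptions, not values read from QEMU.
#
# usage: precopy_passes RAM_MB BANDWIDTH_MBPS DIRTY_MBPS DOWNTIME_MS
precopy_passes() {
    ram_kb=$(( $1 * 1024 ))
    bw=$2
    dirty=$3
    # Amount of data (in KB) that can be sent within the allowed downtime.
    threshold_kb=$(( bw * 1024 * $4 / 1000 ))

    # If the guest dirties memory at least as fast as it can be sent,
    # the remaining data never shrinks and the migration cannot converge.
    if [ "$dirty" -ge "$bw" ]; then
        echo "never"
        return 0
    fi

    remaining_kb=$ram_kb
    passes=0
    while [ "$remaining_kb" -gt "$threshold_kb" ]; do
        # This pass takes remaining/bw seconds; the memory dirtied in
        # that time is dirty * remaining / bw, which the next pass sends.
        remaining_kb=$(( remaining_kb * dirty / bw ))
        passes=$(( passes + 1 ))
    done
    echo "$passes"
}

precopy_passes 512 20 5 100    # assumed dirty rate of 5 MB/s
precopy_passes 512 20 25 100   # assumed dirty rate above the link speed
```

With an assumed dirty rate of 5 MB/s the model needs 4 passes before the remainder fits into 0.1 s of downtime; with 25 MB/s against a 20 MB/s link it prints "never", which is the kind of situation that would keep a migration transferring data indefinitely.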
 
