I've got a PVE 3.1 cluster up and running now, using Sheepdog for shared storage. (I tried on five separate occasions and have yet to successfully build a Ceph cluster, so I just gave up. The whole point, for me, is to run storage and VMs on the same nodes!)
Everything's updated to v3.1-24/060bd5a6 from the no-subscription repo, as I only added the enterprise repo to these servers today.
I have a VM running happily on node#1, backed by Sheepdog storage. When I attempt to "online" migrate it to node#2, the migration starts but apparently never finishes. The VM only has 512MB of RAM, but the migration has now been running for over 75 minutes. Each node has a 4-way LAG to a common switch; manual tests show that SCP between these nodes gets at least 20MB/sec using default ciphers, and ~75MB/sec using arcfour.
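For scale, here is a rough back-of-the-envelope check of why 75 minutes looks wrong. It is only a sketch: it assumes the migration tunnel can move data at about the SCP rates I measured, and it ignores the fact that a live migration re-sends memory pages the guest dirties during the copy. The function names are just mine, for illustration.

```python
# Rough sanity check: how long should one pass over the VM's RAM take,
# and how many full passes would fit into the time already elapsed?
# Assumption: the tunnel sustains roughly the measured SCP throughput.

RAM_MB = 512          # VM memory size
ELAPSED_MIN = 75      # how long the migration has been running
RATES_MB_PER_S = {
    "scp, default ciphers": 20,
    "scp, arcfour": 75,
}

def transfer_seconds(size_mb, rate_mb_per_s):
    """Seconds needed to send size_mb once at the given rate."""
    return size_mb / rate_mb_per_s

def passes_possible(size_mb, rate_mb_per_s, elapsed_min):
    """How many complete copies of size_mb fit into elapsed_min."""
    return elapsed_min * 60 * rate_mb_per_s / size_mb

for label, rate in RATES_MB_PER_S.items():
    secs = transfer_seconds(RAM_MB, rate)
    passes = passes_possible(RAM_MB, rate, ELAPSED_MIN)
    print(f"{label}: one pass ~{secs:.1f} s, "
          f"~{passes:.0f} passes possible in {ELAPSED_MIN} min")
```

Even at the slower rate, a single pass over 512MB should take well under a minute, so whatever is going over the tunnel is far more than one copy of the VM's RAM.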
The "qm ... mtunnel" process is still running, and the SSH connection between the two nodes is still pumping a goodly amount of data over 75min later - what on earth is it transferring?
The VM, incidentally, is NOT responding on the network; the "online" migration has become an "offline" migration :-(.
The task log only shows this, and nothing else:
Dec 29 14:44:32 starting migration of VM 108 to node 'pve02' (192.168.160.28)
Dec 29 14:44:32 copying disk images
Dec 29 14:44:32 starting VM 108 on remote node 'pve02'
Dec 29 14:44:34 starting ssh migration tunnel
Dec 29 14:44:35 starting online/live migration on localhost:60000
Dec 29 14:44:35 migrate_set_speed: 8589934592
Dec 29 14:44:35 migrate_set_downtime: 0.1
How do I troubleshoot this migration process?
Thanks,
-Adam