Successful but not easy live migration story

f4242

Hello,

I have a failing hard drive on one of our nodes, so I had to migrate my running VMs to another node. I succeeded, but it was not as easy as I thought.

A little summary of that server's setup. Proxmox 5.1 is installed on a hard drive. The pve volume group is on that drive, with the pve/root and pve/data logical volumes. The pve/data volume is not currently used, but it could be if I needed bulk space for a VM. I also have a second drive, an SSD, on which I created the pve-ssd volume group with the pve-ssd/data logical volume. My VMs run from that volume. The failing disk is the hard drive.
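
For reference, the SSD side was set up more or less like this (a rough sketch from memory; /dev/sdb, the size, and the LVM-thin choice are placeholders/assumptions, not necessarily what is on the box):

Code:
# Create the volume group on the SSD (device name is an example)
vgcreate pve-ssd /dev/sdb
# Create a thin pool named "data" inside it (size is an example)
lvcreate -L 200G -T pve-ssd/data
# Register it in Proxmox as an LVM-thin storage
pvesm add lvmthin local-ssd-lvm --vgname pve-ssd --thinpool data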

The plan was to set up a temporary server, migrate the VMs to it, shut down the current server, remove it from the cluster, replace the hard drive, install Proxmox on the new hard drive, have the server join the cluster again, and migrate the VMs back.
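
The cluster-membership part of that plan boils down to something like this (a sketch; "oldnode" and "node1" are placeholder names):

Code:
# On a remaining cluster member, once the failing server is powered off for good:
pvecm delnode oldnode
# On the freshly reinstalled server, join the cluster via any existing member:
pvecm add node1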

First try: I tried to migrate one VM to the temporary server. It failed immediately because the storage local-ssd-lvm was not found on the remote server. Really? Why does it assume all hosts are equal and have the same storage? The web UI should ask what storage I want to use on the remote node. So I added a drive to the temporary server and configured LVM using the same naming convention I used on the original server. Then I configured Proxmox to mark the volume as available on both servers.
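
Marking the storage as available on both nodes can also be done from the shell, along these lines (node names are placeholders):

Code:
# Restrict/extend the storage definition to the listed nodes
pvesm set local-ssd-lvm --nodes node1,tempnode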

Second try: this time it said it cannot migrate a VM using local disks. It was easy to do on the command line (I found another thread about that on the forum). I wonder why the web UI can't do that?
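
For anyone searching later, the command-line migration was along these lines (the VMID and target node are placeholders):

Code:
# Live-migrate VM 100 together with its local disks
qm migrate 100 tempnode --online --with-local-disks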

Third try: all seems good. My VMs are currently migrating to the new node, thanks to the command line!
 
> I tried to migrate one VM to the temporary server. It failed immediately because the storage local-ssd-lvm was not found on the remote server. Really? Why does it assume all hosts are equal and have the same storage?
In the Datacenter -> Storage tab you can edit your storage and limit it to specific nodes. The GUI reflects the more general (safer) way of doing things, while on the command line you have more options available.
 
> In the Datacenter -> Storage tab you can edit your storage and limit it to specific nodes. The GUI reflects the more general (safer) way of doing things, while on the command line you have more options available.

I know that, but if I want to migrate a VM from one server to another, can I do it even if the storage name is different? I can set the storage as available in the web UI, but what happens if the LVM logical volume with that name doesn't really exist on the destination server? I assume this doesn't work, which is why I had to set up another logical volume with the same name on my temporary server.

My point is that I should be able to migrate a VM from one server to another without worrying about the storage location. The web UI should simply list the storage locations available on the remote server and ask the user to choose one from that list.
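
EDIT: it looks like newer qemu-server versions have a --targetstorage option for qm migrate that does roughly this; I haven't verified whether it exists on 5.1 (the VMID, node, and storage names below are placeholders):

Code:
# Map local disks onto a differently named storage on the destination
qm migrate 100 tempnode --online --with-local-disks --targetstorage other-ssd-lvm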
 
When I use bulk migrate, the virtual machines easily end up locked. With a "for ... do qm migrate ... done" loop it happens too, but when I use "for ... do qm migrate && sleep 20 ... done" it's OK. I don't know, is it a bug?
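
Roughly, the loop I'm running looks like this (a sketch; the VMIDs and target node are examples):

Code:
# Migrate VMs one by one, pausing between each to let the node settle
for vmid in 100 101 102; do
    qm migrate "$vmid" targetnode --online && sleep 20
done

pveversion -v output: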
proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-10
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1
 
> When I use bulk migrate, the virtual machines easily end up locked. With a "for ... do qm migrate ... done" loop it happens too, but when I use "for ... do qm migrate && sleep 20 ... done" it's OK. I don't know, is it a bug?
Most likely not a bug; more likely some resource shortage (or resources not freed fast enough) on the nodes when you run bulk migrations.

EDIT: Also please don't double post.
 
