Failed Unplanned Migration - LXC, LVMs, Storage, and Errors, Oh My!

Efflixi

Apr 29, 2020

Bear with me as I do my best to explain the situation.

We have Proxmox set up on 5 physical machines hosting over 20 VMs. Several weeks ago we lost power to the building and, unbeknownst to us, one of the physical servers (a node in Proxmox) apparently had a failed migration of a VM that was misconfigured for High Availability. It moved the LVM volume for the VM from the original node to a seemingly random other node but failed to migrate the VM itself. At least, that is my best guess as to what happened.

All of that's in the past now. What I need now is to move this LVM storage back to the original node. The VM is still on the original node, but it obviously can't function without its storage. Our storage isn't shared at the datacenter level, and we have dozens of terabytes of critical business data that we legally cannot lose (we would get sued into oblivion), so I am extremely hesitant to turn on shared storage on a live system. I have tried everything I know of, and everything I found online, to move this, and it's just not working.

Quick rundown:
Node 3 = original host of the VM (VM 116 in our case). NOTE: VM 116 is a Linux container, not an actual virtual machine, so I have no Hardware tab.
Node 5 = where the failed migration tried to go and where the disk storage for VM 116 is currently located. I need to get it back to Node 3 somehow.
 
Hi,
You can either:
  • Re-create the LVM volume on node 3 (make sure it has the name the config expects; the default is vm-116-disk-0) and copy over the data (with dd over ssh, for example). This is essentially what happens if you run the following on node 5:
    Code:
    pvesm export <STORAGEID1>:vm-116-disk-<N> raw+size - | ssh root@<IP of NODE3> pvesm import <STORAGEID2>:vm-116-disk-<N> raw+size -
    This will copy the disk from STORAGEID1 on node 5 to STORAGEID2 on node 3.
or
  • Move the configuration file from /etc/pve/nodes/<NODE3>/lxc/116.conf to /etc/pve/nodes/<NODE5>/lxc/116.conf and migrate the container back afterwards.
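A minimal sketch of the manual variant of the first option (dd over ssh). It assumes plain (thick) LVM rather than LVM-thin, the same volume group name on both nodes, an LV name that does not exist yet on the target, and a size in bytes looked up beforehand, e.g. with lvs --noheadings --units b --nosuffix -o lv_size /dev/<VG>/vm-116-disk-0. The helper name copy_lv_over_ssh and the RUN=1 switch are hypothetical; by default it only prints the commands so they can be reviewed before any data is touched.

```shell
#!/usr/bin/env bash
# Sketch only: copy a plain (thick) LVM volume to another node with dd over ssh.
# Assumes the same VG name on both nodes and that the target LV does not exist yet.
# Prints the commands by default; set RUN=1 to actually execute them.
set -euo pipefail

run() {
  if [ "${RUN:-0}" = "1" ]; then
    bash -o pipefail -c "$1"
  else
    echo "$1"
  fi
}

# copy_lv_over_ssh <vg> <lv> <target-host> <size-in-bytes>   (hypothetical helper)
copy_lv_over_ssh() {
  local vg="$1" lv="$2" target="$3" size="$4"
  # 1. Create an empty LV with the name the config expects on the target node.
  run "ssh root@${target} lvcreate -L ${size}b -n ${lv} ${vg}"
  # 2. Stream the raw block device to the target; conv=fsync flushes before dd exits.
  run "dd if=/dev/${vg}/${lv} bs=4M status=progress | ssh root@${target} dd of=/dev/${vg}/${lv} bs=4M conv=fsync"
}
```

Usage, on node 5 with the container stopped: copy_lv_over_ssh <VG> vm-116-disk-0 <IP of NODE3> <size>. The target volume group must also be configured as a storage on node 3 so the config can reference the copied volume.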

In both cases, you may also need to issue a pct unlock 116.

EDIT: There shouldn't be two configuration files for the same ID, sorry for writing "Copy" before.
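A sketch of the second option, assuming node names resolve over ssh (as is usual inside a cluster) and that pct unlock and pct migrate have to run on the node that owns the config at that moment. move_and_migrate_back and RUN are hypothetical names; the script only prints the commands unless RUN=1 is set. It deliberately uses mv, never cp, so only one 116.conf exists at any time.

```shell
#!/usr/bin/env bash
# Sketch only: move an LXC config to the node holding its disk, unlock it,
# then migrate the (stopped) container back. Node names are placeholders.
# Prints the commands by default; set RUN=1 to actually execute them.
set -euo pipefail

run() {
  if [ "${RUN:-0}" = "1" ]; then
    bash -o pipefail -c "$1"
  else
    echo "$1"
  fi
}

# move_and_migrate_back <vmid> <node-with-config> <node-with-disk>   (hypothetical helper)
move_and_migrate_back() {
  local vmid="$1" cfg_node="$2" disk_node="$3"
  # /etc/pve is the cluster filesystem, so this mv can run on any quorate node.
  run "mv /etc/pve/nodes/${cfg_node}/lxc/${vmid}.conf /etc/pve/nodes/${disk_node}/lxc/${vmid}.conf"
  # Clear a leftover lock (e.g. from the failed migration) on the owning node.
  run "ssh root@${disk_node} pct unlock ${vmid}"
  # Migrate the stopped container back to its original node.
  run "ssh root@${disk_node} pct migrate ${vmid} ${cfg_node}"
}
```

Usage: move_and_migrate_back 116 <NODE3> <NODE5>, then rerun with RUN=1 once the printed commands look right.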
 
I believe Fabian's comments nail it! My only small addendum: you will want to remove the HA configuration for this container so that HA doesn't get involved. Copying/moving the 116.conf file to the correct place, i.e. the node where the disk actually exists, is probably the simplest path to a working system. Just be careful with your work and it should be very doable. If you wish, you can check what is inside the LVM storage on a given host with the usual LVM commands in the CLI.

Tim
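A sketch of both suggestions, assuming the container is registered in HA under the ID ct:116 (ha-manager remove reports an error if it is not) and that each node is reachable as root over ssh. remove_ha_and_inspect is a hypothetical helper that prints the commands unless RUN=1 is set; in the lvs output of each node, look for vm-116-disk-<N>.

```shell
#!/usr/bin/env bash
# Sketch only: take a container out of HA, then list the logical volumes on
# each given node. Prints the commands by default; set RUN=1 to execute them.
set -euo pipefail

run() {
  if [ "${RUN:-0}" = "1" ]; then
    bash -o pipefail -c "$1"
  else
    echo "$1"
  fi
}

# remove_ha_and_inspect <vmid> <node>...   (hypothetical helper)
remove_ha_and_inspect() {
  local vmid="$1"
  shift
  # Stop the HA manager from trying to recover or relocate the container.
  # With RUN=1 this aborts the script if ct:<vmid> is not an HA resource.
  run "ha-manager remove ct:${vmid}"
  local node
  for node in "$@"; do
    # LVM is local to each node, so lvs must run on every node separately.
    run "ssh root@${node} lvs -o lv_name,vg_name,lv_size"
  done
}
```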
 
I actually ended up fixing this myself before either of these replies was posted. I'll share my solution, because copying the config and migrating the LXC was the first thing I tried weeks ago when this issue first happened. It didn't work because the storage on Node 5 was "stale".

Situation before I fixed it:
Node 3 - Has the LXC and the 116.conf, but no LVM storage anymore, and can't migrate
Node 5 - Has the 116.conf AND the LVM storage, but I can't migrate the LXC itself over to it due to errors on Node 3
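Before changing anything, it helps to see which nodes currently hold a config for the container. A minimal sketch: find_ct_config_nodes is a hypothetical helper, and its optional second argument (defaulting to /etc/pve/nodes) only exists so it can be tried on a scratch directory. In a healthy cluster it prints exactly one node; two lines indicate the duplicate-config situation described here.

```shell
#!/usr/bin/env bash
# Sketch only: print every node directory that contains lxc/<vmid>.conf.
set -euo pipefail

# find_ct_config_nodes <vmid> [root]   (hypothetical helper)
find_ct_config_nodes() {
  local vmid="$1" root="${2:-/etc/pve/nodes}"
  local conf
  for conf in "$root"/*/lxc/"$vmid".conf; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$conf" ] || continue
    # Path is <root>/<node>/lxc/<vmid>.conf, so print <node>.
    basename "$(dirname "$(dirname "$conf")")"
  done
}
```

Usage: find_ct_config_nodes 116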

My fixes:
1. Comment out the storage line in 116.conf on Node 3
2. Comment out the storage line in 116.conf on Node 5
3. Migrate the LXC from Node 3 to Node 5 (this time it worked)
4. Uncomment the storage line in 116.conf on Node 5 (the file no longer exists on Node 3)
5. Start the LXC
6. SUCCESS!
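Steps 1, 2 and 4 can be sketched as a pair of shell functions. This assumes the storage lines are rootfs: and mpN: entries, and it writes the result back with cat rather than sed -i so the existing file is rewritten in place (an assumption that this is the gentler choice on the /etc/pve cluster filesystem). PVE may treat lines starting with # as part of the container description and rewrite them, so save a copy of the original lines first and check the file by eye after step 3. comment_storage and uncomment_storage are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: comment out / restore the storage lines (rootfs, mpN) of an
# LXC config file. Does not handle lines PVE may have rewritten in between.
set -euo pipefail

# comment_storage <conf-file>   (hypothetical helper; steps 1 and 2)
comment_storage() {
  local conf="$1" tmp
  tmp="$(mktemp)"
  sed -E 's/^(rootfs|mp[0-9]+):/#\1:/' "$conf" > "$tmp"
  # Overwrite the original file's contents instead of replacing the file.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}

# uncomment_storage <conf-file>   (hypothetical helper; step 4)
uncomment_storage() {
  local conf="$1" tmp
  tmp="$(mktemp)"
  sed -E 's/^#(rootfs|mp[0-9]+):/\1:/' "$conf" > "$tmp"
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

Usage: comment_storage /etc/pve/nodes/<NODE3>/lxc/116.conf (and likewise for <NODE5>), migrate, then uncomment_storage /etc/pve/nodes/<NODE5>/lxc/116.conf.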

If anyone else runs into this, hopefully this helps! What kept this from working on all my previous attempts was that I had not commented out the LVM storage on BOTH nodes at the same time.
 
Glad you were able to solve it. I just wanted to add that PVE doesn't really expect a configuration file for the same VM/LXC ID to be present on two nodes at the same time. If there is no volume attached (i.e. commented out in your case), migrating essentially boils down to moving the configuration file to /etc/pve/nodes/<migration target>/lxc/116.conf, so in your step 3 it just overwrote the existing configuration file on node 5, AFAICT.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!