qm migrate offline migration

rcd

I have a bastard cluster - as in two identical servers, each with its own disks (i.e. no shared storage) - and I need to move one 400 GB VM from one server to the other.

Since they don't share disks I suppose I need to do it as an offline migration. Not a problem per se, but I need to know precisely what happens. Will the server shut down during the transfer, leaving a long downtime, or will it take something like a snapshot, copy that first and then sync up before switching over? What exactly will happen?

I've checked all the documentation I can find and searched the general internet, but haven't been able to find this explained anywhere.

Both servers use ZFS and have plenty of free space. PVE 6.4

Code:
# pvecm status
Cluster information
-------------------
Name:             EU-CL-01
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Jul 14 20:36:19 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.176
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 xxx.yyy.19.102
0x00000002          1 xxx.yyy.19.108 (local)
 
It's possible to do a live migration with local disks as well (it will migrate the in-use disks with block-mirror over NBD).
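
For reference, such a live migration with local disks can be started from the CLI roughly like this; the VM ID 100 and the target node name server39 are placeholder values for this sketch, not taken from the thread:

Code:
# on the node currently running the VM; 100 and server39 are example values
qm migrate 100 server39 --online --with-local-disks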
 
Ok, that was quite painless, thanks.

So now I have two nodes:
Code:
# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 server39
         2          1 server40 (local)

I now need to stop the nodeid 1 server and remove it from the cluster - in other words, destroy the cluster. Nodeid 2 should continue to run as a standalone hypervisor. According to https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node I should just shut it down, then pvecm delnode 1

What worries me a bit is the comment

As said above, it is critical to power off the node before removal, and make sure that it will never power on again (in the existing cluster network) as it is. If you power on the node as it is, your cluster will be screwed up and it could be difficult to restore a clean cluster state.

Should I stop pve-cluster, corosync - anything else - and leave it at that, or what else do I need to do? The documentation isn't completely clear.
 
You probably rather want to follow the next section, "5.5.1. Separate A Node Without Reinstalling", to split both nodes (follow the instructions on both nodes); then you can do whatever you want with server39 and keep server40 as a standalone node. Following your linked instructions should also work, but you will likely need to set pvecm expected 1 after powering off server39, since your cluster will lose quorum at that point. If you go down that route, you will end up with server40 as a single-node cluster, not a standalone node. In practice, there's not much difference though ;)
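
For anyone following along, the "Separate A Node Without Reinstalling" procedure from the admin guide boils down to roughly the following, run on each node being separated; this is a sketch of the documented steps rather than something from this thread, so double-check it against section 5.5.1 before running anything:

Code:
# stop the cluster services
systemctl stop pve-cluster
systemctl stop corosync
# start the cluster filesystem in local mode
pmxcfs -l
# remove the corosync configuration
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
# restart the cluster filesystem as a normal service
killall pmxcfs
systemctl start pve-cluster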

the "warning" is there because if you do
- power off server 39
- pvecm delnode server39 on server40
- power on server39

then server39 and server40 will share a corosync key and thus be able to "talk", but have different views of the cluster topology which can cause all sorts of issues.
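
If you do stick with the delnode route instead, the safe ordering would look roughly like this; a sketch based on the advice above, using the node names from this thread:

Code:
# power off server39 first and make sure it never comes back up as-is, then on server40:
pvecm expected 1
pvecm delnode server39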
 
There is a problem with the procedure as explained: after deleting /etc/pve/corosync.conf, any pvecm command - including pvecm delnode and pvecm expected - reports Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

 
Yeah, that's true - those two commands are not needed if you split a two-node cluster into two standalone nodes (since both "remaining nodes" are no longer a cluster either in this case). You can just continue with the next steps.
 
This part, in case you haven't done it already:

Now switch back to the separated node and delete all remaining files left over from the old cluster. This ensures that the node can be added to another cluster again without problems.

rm /var/lib/corosync/*

As the configuration files from the other nodes are still in the cluster filesystem, you may want to clean those up too. Simply remove the whole directory recursively from /etc/pve/nodes/NODENAME, but check three times that you used the correct one before deleting it.
The node's SSH keys are still in the authorized_key file; this means the nodes can still connect to each other with public key authentication. This should be fixed by removing the respective keys from the /etc/pve/priv/authorized_keys file.
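
Put together, the cleanup would look roughly like this; server39 is only an example for NODENAME here, i.e. the name of the other node as seen from server40 (run the equivalent with the other name on server39 if you keep it around):

Code:
# remove leftover corosync state
rm /var/lib/corosync/*
# remove the other node's configuration directory - triple-check the name first
rm -r /etc/pve/nodes/server39
# then edit /etc/pve/priv/authorized_keys and delete the lines belonging to server39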
 
That was another thing I found wasn't really clear: which NODENAME do you need to remove? The NODENAME of the server you are working on, or the other one? Or both?
 
Always the other one(s) - you want to keep the directory of the node you are on ;)
 
