qm migrate offline migration

rcd

I have a bastard cluster - as in two identical servers, each with its own disks (i.e. no shared storage) - and I need to move one 400 GB VM from one server to the other.

Since they don't share disks I suppose I need to do it as an offline migration. Not a problem per se, but I need to know precisely what happens. Will the server shut down during the transfer, leaving a long downtime, or will it take something like a snapshot, copy that first and then sync up before switching over? What exactly will happen?

I've checked all the documentation I can find and searched the general internet, but haven't been able to find this explained anywhere.

Both servers use ZFS and have plenty of free space. PVE 6.4

Code:
# pvecm status
Cluster information
-------------------
Name:             EU-CL-01
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Jul 14 20:36:19 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.176
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 xxx.yyy.19.102
0x00000002          1 xxx.yyy.19.108 (local)
 
It's possible to do a live migration with local disks as well (it will migrate the in-use disks with block-mirror over NBD).
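
For reference, such a live migration with local disks can be started from the CLI roughly like this; the VM ID 100 and the target node name server39 are placeholder values for this sketch, not taken from the thread:

Code:
# on the node currently running the VM; 100 and server39 are example values
qm migrate 100 server39 --online --with-local-disks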
 
Ok, that was quite painless, thanks.

So now I have two nodes:
Code:
# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 server39
         2          1 server40 (local)

I now need to stop the nodeid 1 server and remove it from the cluster - in other words, destroy the cluster. Nodeid 2 should continue to run as a standalone hypervisor. According to https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node I should just shut it down, then pvecm delnode 1

What worries me a bit is the comment

As said above, it is critical to power off the node before removal, and make sure that it will never power on again (in the existing cluster network) as it is. If you power on the node as it is, your cluster will be screwed up and it could be difficult to restore a clean cluster state.

Should I stop pve-cluster, corosync - anything else - and leave it at that, or what else do I need to do? The documentation isn't completely clear.
 
You probably rather want to follow the next section, "5.5.1. Separate A Node Without Reinstalling", to split both nodes (follow the instructions on both nodes); then you can do whatever you want with server39 and keep server40 as a standalone node. Following your linked instructions should also work, but you will likely need to set pvecm expected 1 after powering off server39, since your cluster will lose quorum at that point. If you go down that route, you will end up with server40 as a single-node cluster, not a standalone node. In practice, there's not much difference though ;)
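
For anyone following along, the "Separate A Node Without Reinstalling" procedure from the admin guide boils down to roughly the following, run on each node being separated; this is a sketch of the documented steps rather than something from this thread, so double-check it against section 5.5.1 before running anything:

Code:
# stop the cluster services
systemctl stop pve-cluster
systemctl stop corosync
# start the cluster filesystem in local mode
pmxcfs -l
# remove the corosync configuration
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
# restart the cluster filesystem as a normal service
killall pmxcfs
systemctl start pve-cluster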

the "warning" is there because if you do
- power off server 39
- pvecm delnode server39 on server40
- power on server39

then server39 and server40 will share a corosync key and thus be able to "talk", but have different views of the cluster topology which can cause all sorts of issues.
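
If you do stick with the delnode route instead, the safe ordering would look roughly like this; a sketch based on the advice above, using the node names from this thread:

Code:
# power off server39 first and make sure it never comes back up as-is, then on server40:
pvecm expected 1
pvecm delnode server39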
 
There is a problem with the procedure as explained: after deleting /etc/pve/corosync.conf, any pvecm command - including pvecm delnode and pvecm expected - reports Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

 
Yeah, that's true - those two commands are not needed if you split a two-node cluster into two standalone nodes (since both "remaining nodes" are no longer a cluster either in this case). You can just continue with the next steps.
 
This part, in case you haven't done it already:

Now switch back to the separated node and delete all remaining files left over from the old cluster. This ensures that the node can be added to another cluster again without problems.

rm /var/lib/corosync/*

As the configuration files from the other nodes are still in the cluster filesystem, you may want to clean those up too. Simply remove the whole directory recursively from /etc/pve/nodes/NODENAME, but check three times that you used the correct one before deleting it.
The node's SSH keys are still in the authorized_key file; this means the nodes can still connect to each other with public key authentication. This should be fixed by removing the respective keys from the /etc/pve/priv/authorized_keys file.
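
Put together, the cleanup would look roughly like this; server39 is only an example for NODENAME here, i.e. the name of the other node as seen from server40 (run the equivalent with the other name on server39 if you keep it around):

Code:
# remove leftover corosync state
rm /var/lib/corosync/*
# remove the other node's configuration directory - triple-check the name first
rm -r /etc/pve/nodes/server39
# then edit /etc/pve/priv/authorized_keys and delete the lines belonging to server39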
 
That was another thing I found wasn't really clear: which NODENAME do you need to remove? The NODENAME of the server you are working on, or the other one? Or both?
 
Always the other one(s) - you want to keep the directory of the node you are on ;)
 
