Migrating from Xen to Proxmox/ceph incrementally

chrispage1

Hi,

Firstly I'd like to say hello - I'm new to Proxmox and so far, very impressed! I've got a single node test setup that I've been having a play with.

We're looking to migrate our three-node Xen cluster to Proxmox using Ceph. I've got a test setup here which is just a single node. What we're looking to do is gradually migrate each VM from Xen over to Proxmox, using the same physical hardware.

My plan to do this was to move everything off of node1, shut it down, install Proxmox, configure Ceph, migrate a few VMs over and repeat for node2 & node3 until the process is completed.

The query I have is that node1 (and particularly Ceph) will initially be running standalone. I'm having problems getting Ceph configured on a single node. From the outset I get the below -

[Screenshot: Ceph health warning on the single-node setup]

Additionally, how easy would it be to bring in new servers into the Ceph cluster?

Thanks,
Chris.
 

I probably should have Googled a little more, as I would have undoubtedly come across that, so apologies!

I'm actually testing something now off the back of a Stack Overflow answer (https://stackoverflow.com/a/66362327/504487). I've been able to create a pool that uses OSD replication rather than host replication, so I can move a number of VMs across to a single node running Proxmox & Ceph and then introduce a second Proxmox node. I'm just investigating whether I'll then be able to edit the pool's CRUSH rule so that it begins to sync across nodes rather than OSDs.

I'll post my findings back here. Thanks!
 
So initially, with the single node, I created my Ceph pool using an OSD-based CRUSH rule.

[Screenshot: Ceph pool created with the replicated_rule_osd CRUSH rule]

replicated_rule_osd was created by logging into the PVE node and running ceph osd crush rule create-replicated replicated_rule_osd default osd (thanks to zamnuts on Stack Overflow - https://stackoverflow.com/a/66362327/504487). This new CRUSH rule replicates the data across OSDs rather than hosts, so it's still important that you have a reasonable number of OSDs on your initial node.
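For reference, a rough sketch of the shell commands involved on the first node - the pool name vm-pool and the PG count of 128 are just placeholders for illustration, not values from my setup:

  # CRUSH rule that replicates across OSDs instead of hosts
  ceph osd crush rule create-replicated replicated_rule_osd default osd

  # create a pool using that rule and tag it for RBD use
  ceph osd pool create vm-pool 128 128 replicated replicated_rule_osd
  ceph osd pool application enable vm-pool rbd

You'd then add the pool as RBD storage in the Proxmox GUI (Datacenter -> Storage -> Add -> RBD) so it can hold VM disks.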

Once you've introduced other PVE nodes into your cluster, you can amend the CRUSH rule the pool uses so it begins replicating across hosts (switching to the default replicated_rule).
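Switching a pool's rule is a single command, e.g. (again using the placeholder pool name vm-pool):

  # move the pool from OSD-level to host-level replication
  ceph osd pool set vm-pool crush_rule replicated_rule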

So my migration strategy is as below -

  1. Migrate running VMs from xen3 to xen2.
  2. Shut down xen3 and reinstall with Proxmox (proxmox1)
  3. Configure Proxmox, set up the cluster, install Ceph and create a pool that uses the replicated_rule_osd rule
  4. Migrate VMs that were originally on xen3 to proxmox1

  5. Migrate running VMs from xen2 to xen1
  6. Shut down xen2 and reinstall with Proxmox (proxmox2)
  7. Add proxmox2 to the Proxmox cluster and configure Ceph - all configuration should be adopted from proxmox1

  8. Configure a second Ceph manager on proxmox2 (see the example commands after this list). This will allow for active/standby redundancy for Ceph.

  9. Alter the Ceph pool to use the replicated_rule CRUSH rule. Change size to 2.
    This begins syncing the data currently sitting on proxmox1 over to proxmox2. I'd recommend waiting for this to complete before carrying on to any further steps.

  10. Migrate VMs from xen1 to your now configured Proxmox cluster
  11. Shut down xen1 and reinstall with Proxmox (proxmox3)
  12. Add proxmox3 to the Proxmox cluster and configure Ceph - all configuration should be adopted from proxmox1

  13. If you'd rather have N+2 redundancy, update the Ceph pool again to size 3. This will replicate the pool across all three nodes.
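As mentioned in step 8, the Ceph-side commands for steps 8, 9 and 13 look roughly like this (vm-pool is the same placeholder pool name as above):

  # step 8 - add a standby manager (run on proxmox2)
  pveceph mgr create

  # step 9 - after switching the CRUSH rule (command shown earlier), keep two copies across hosts
  ceph osd pool set vm-pool size 2

  # step 13 - once proxmox3 has joined, go up to three copies
  ceph osd pool set vm-pool size 3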

Whenever performing the above actions, always wait for the pool to rebuild before moving on. You will see degradation warnings as Ceph suddenly goes from being comfortably replicated on one node to having to replicate everything across to nodes 2 & 3 - you can watch the progress with the commands below. Feel free to point out any flaws in this plan, and anyone who is carrying this out should do so with extreme caution!
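A quick way to keep an eye on that from any node's shell:

  # overall cluster state, including recovery/rebalance progress
  ceph -s

  # or keep watching it until health returns to HEALTH_OK
  watch ceph status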