Migrating from a single-node Proxmox to a clustered Proxmox

jstrauss

New Member
Apr 27, 2023
Hello,

I am planning to move my single-node Proxmox server with its VMs to a new 3-node cluster with a Ceph pool.
My questions are:

1. What is the best way to move the VMs to the corresponding Ceph pools on the cluster with minimal downtime?
2. For testing, is there a way to simply copy a VM from the active server to the cluster without downtime?

Thank you for any advice.
 
You are talking about minimal downtime and no downtime ... are you really ready for Ceph?
Set up your new cluster, copy over snapshotted VMs, rename them (and maybe change static IPs), and stress test your cluster hard until you hit errors.
Become familiar with Ceph and solve those errors on your own. Search this forum for Ceph, read lots of the threads, try to reproduce the problems on your cluster, and make sure you are able to solve most of them too. After that (a few months), think about the migration from your single PVE to your PVE-Ceph cluster again ... and after all that testing you will be able to answer your questions yourself.
 
a new 3-node cluster with a Ceph pool.
@waltar sounds a little bit pessimistic. ;-)

Ceph is great, but it needs some resources above the theoretical minimum to work reliably. I would like to add this:

You plan to have three nodes. That is the absolute minimum for a cluster. Ceph will probably work with the default settings "size=3/min_size=2". (*Never* go below that!)
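If you want to double-check those settings on an existing pool, it looks roughly like this on the CLI (the pool name "vm-pool" is just an example, adapt it to yours):

# check the replication settings of a pool
ceph osd pool get vm-pool size
ceph osd pool get vm-pool min_size
# set them explicitly if needed - never below size=3/min_size=2 for VM data
ceph osd pool set vm-pool size 3
ceph osd pool set vm-pool min_size 2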

The first problem: if each node has only a single OSD, then when (not if) one device or one node fails, Ceph is immediately degraded. There is no room for Ceph to heal itself, so "degraded" becomes permanent. For a stable situation you really want nodes that can jump in and return the cluster to a healthy condition - automatically. For Ceph this means having at least *four* nodes. (In this specific aspect; in other regards you really want five or more...)

Another detail often forgotten: let's say you have those three nodes with two OSDs each. When one OSD fails, its direct neighbor has to take over the data from the dead disk. (That data cannot be given to another node - the only two other nodes already hold a copy!) This means you can fill the OSDs in this picture only up to about 45 percent: the original 45% plus the "other" 45% puts the surviving OSD at 90%. To avoid this problem you want several OSDs per node or - better! - more than three nodes.
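A quick way to keep an eye on that is the per-OSD utilization report (the exact output columns differ a bit between Ceph releases):

# per-OSD usage, so you notice early when a single OSD creeps past ~45-50%
ceph osd df tree
# overall cluster and per-pool usage
ceph df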


Note that Ceph is more critical for the cluster than a local SSD is for one of the three nodes: when Ceph goes read-only, *all VMs in the whole cluster* will stop immediately - they cannot write any data (including log messages, which practically *all* of them do) and will stall.

Network: note that data-to-be-written will go over the wire multiple times before it is considered "written". A fast network is a must, so 10 GBit/s should be considered the minimum. Yes, technically it works with slower speeds - at first, with low load. When high usage leads to congestion, latency goes up and you will encounter "strange" errors which may be hard to debug.
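Before putting real load on the cluster it is worth measuring the Ceph network between every pair of nodes; iperf3 plus a plain ping give a rough baseline (the hostname is a placeholder):

# on the receiving node
iperf3 -s
# on the sending node: throughput over the Ceph network
iperf3 -c node2-cephnet -t 30
# latency matters as much as bandwidth for small synchronous writes
ping -c 100 node2-cephnet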

Regarding SSDs/NVMe: you probably already know the recommendation to use enterprise-class devices. There are reasons (plural) for it, please consider them. (If you go for - let's say - seven nodes with five OSDs each, the quality of the OSDs in a homelab may be lower, but with the bare minimum number of disks they really need to be high quality.)


YMMV! And while I am experimenting with Ceph I am not an expert...
 
No, I didn't mean to be pessimistic. I just want him to prepare for Ceph, as it is not just install-and-forget.
And yeah...
Note that Ceph is more critical for the cluster than a local SSD is for one of the three nodes: when Ceph goes read-only, *all VMs in the whole cluster* will stop immediately - they cannot write any data (including log messages, which practically *all* of them do) and will stall.
 
Yes, I understand that Ceph is not plug-and-play and does not do everything automatically without further investigation and effort.

I will use the cluster for production, so I am running it in my homelab for some months first, for testing and to get more familiar with Ceph.

My current setup (per node) is:
2x 980GB SSDs in RAID-1 for Proxmox
4x 10TB HDDs for the Ceph HDD pool, with WAL/DB on an enterprise NVMe (2 OSDs on 1 TB each)
2x 2TB enterprise NVMe in a separate NVMe Ceph pool

and all of this 3x with enterprise-grade hardware, connected with 10Gbit/s DAC in a ring network, each node connected with 1Gbit/s to the "public net", and IPMI on a separate connection.
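For reference, creating the HDD OSDs with their DB/WAL on the NVMe is roughly this on each node (device paths are just examples; check "pveceph osd create --help" for the exact options of your PVE version):

# one OSD per HDD, DB/WAL placed on the shared enterprise NVMe (paths are examples)
pveceph osd create /dev/sda --db_dev /dev/nvme0n1
pveceph osd create /dev/sdb --db_dev /dev/nvme0n1
# the pure NVMe OSDs for the separate fast pool
pveceph osd create /dev/nvme1n1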

I spent some time googling and reading this forum to get an idea of how to build my cluster, and I think this is a pretty solid and worthwhile setup. I also have spare drives, so if something fails I can replace it directly.

Going to 4 or more nodes will come into question in a few years. For now it is still better than one node with only RAIDZ1 - roughly equivalent redundancy, but with the benefit that a whole node can die without any data being lost directly and with everything mostly still working. It is also important for me to be able to maintain or upgrade a server without downtime.

Which brings me to my questions:
1. How can I copy one of my current production VMs to my homelab cluster without stopping the production VM, so that I can test Ceph performance on the cluster with my real workloads?
2. When I'm ready to move the single-node server's workloads to the cluster, what is the best way to do so with minimal downtime? Should I add the single-node server to the cluster, migrate the VMs, and specify the new Ceph pool as the target storage? Or is there a better way to do this?

(Sorry for my English)

Thanks
 
Should I add the single-node server to the cluster and migrate the VMs to the cluster
Never add a node with VM/CTs to a cluster: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_join_node_to_cluster
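The rough order (as I read the linked docs) is: the joining node must not carry any guests, so back up and remove them first, then join; something like this (the IP is a placeholder):

# on the node that should join: verify it carries no guests anymore
qm list
pct list
# then join it to an existing cluster member (192.0.2.10 is a placeholder IP)
pvecm add 192.0.2.10
# check quorum and membership afterwards
pvecm status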
1. How can I copy one of my current production VMs to my homelab cluster without stopping the production VM, so that I can test Ceph performance on the cluster with my real workloads?
Why not restore an already existing backup of that VM on the cluster? Then you don't have to touch production at all. Just make sure your test cluster does not share the network with production (to prevent interference while trying things out).
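If no recent backup exists, the classic vzdump-and-restore route looks roughly like this (VMID 100, the storage names and the Ceph pool name are placeholders):

# on the production node: snapshot-mode backup, the VM keeps running
vzdump 100 --mode snapshot --compress zstd --storage local
# copy the resulting archive to a cluster node, e.g. with scp (adjust to the actual file name)
scp /var/lib/vz/dump/vzdump-qemu-100-*.vma.zst root@cluster-node1:/var/lib/vz/dump/
# on the cluster node: restore onto the Ceph-backed storage under a new VMID
qmrestore /var/lib/vz/dump/vzdump-qemu-100-*.vma.zst 9100 --storage ceph-nvme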
Going to 4 or more nodes will come into question in a few years. For now it is still better than one node with only RAIDZ1 - roughly equivalent redundancy, but with the benefit that a whole node can die without any data being lost directly and with everything mostly still working. It is also important for me to be able to maintain or upgrade a server without downtime.
Your new cluster is three times more likely to have a failed node than your current production setup with a single node. And Ceph won't be "mostly working fine" with just two nodes. Best to have a spare node (or two), which you might as well put in the cluster right away.
What will you do when one node fails and a new one is "some years" away? You say you want redundancy, but you're going for the bare minimum instead. I fear you'll get into trouble quickly with this approach.
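For the actual production move later, besides backup/restore it might also be worth reading up on cross-cluster live migration, which recent PVE versions ship as the (still experimental) "qm remote-migrate" command; very roughly like this, where the API token, fingerprint, bridge and storage names are all placeholders - check "man qm" for the exact syntax of your version:

# experimental cross-cluster migration; every value below is a placeholder
qm remote-migrate 100 100 \
  'host=cluster-node1,apitoken=PVEAPIToken=root@pam!migrate=<secret>,fingerprint=<fp>' \
  --target-bridge vmbr0 --target-storage ceph-nvme --online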
 
And have documentation of what to do for the different failure states, so that someone other than you can become familiar with it - you will not always be available, for example while on holiday in another country or unintentionally stuck in hospital for a couple of weeks. That's real life: a disk just died in our PVE cluster over these Christmas days, and the replacement will take some days into January ...
 
Why not restore an already existing backup of that VM on the cluster? Then you don't have to touch production at all. Just make sure your test cluster does not share the network with production (to prevent interference while trying things out).
Yes, I will try this. It's a good idea which I hadn't thought of...

Your new cluster is three times more likely to have a failed node than your current production setup with a single node. And Ceph won't be "mostly working fine" with just two nodes. Best to have a spare node (or two), which you might as well put in the cluster right away.
What will you do when one node fails and a new one is "some years" away? You say you want redundancy, but you're going for the bare minimum instead. I fear you'll get into trouble quickly with this approach.
Logically viewed, yes, that's right. But on my single-node server, if the server fails, everything is down... on the cluster, when one node fails, it's not great, but the servers are still online and I can resolve the issue and replace a node or drives in a reasonably short time. And the odds of that actually happening are quite low with enterprise hardware. (It's possible, but still....) I may think about a fourth node. Or even join my friend's cluster and mine together (because he also wants to put his cluster into the same rack in the DC). Then we would have 6 nodes and could share our infrastructure and computing power.

But do you think running production on a 3-node cluster is really that bad? It's way better than a single server, with redundancy on the server level and not only on the drive level... I will run some production VMs on the cluster, play around with the setup and see how well it copes when a node or drives fail... And maybe in 1-2 months I will get another node or put it straight into the DC.
 
