New Proxmox VE 5.0 with Ceph production cluster and upgrade from 3 to 5 nodes

lucaferr

Hello!
I'm planning to build a new 3-node production cluster based on Proxmox VE 5.0 (as soon as the stable release is out) with Ceph storage running on the same nodes, as described in the tutorial. The 3 nodes will be identical and will have a 10 Gb internal network (for Ceph and corosync) in addition to the internet connections. After some weeks in production, I'll have 2 other servers available (with different hardware, less powerful than the first three), and I'd like to add them to the cluster to improve redundancy so that it can survive 2 dead nodes without downtime. I have a couple of questions:
  • I guess that, while running on 3 nodes, the cluster will be able to survive one node failure without service interruption, and while running on 5 nodes, it'll be able to survive 2 node failures. Is this correct? Do I need to adjust any configuration to guarantee this?
  • Will the upgrade from 3 to 5 nodes cause downtime (or an extreme slowdown due to data moving among the nodes)? Is there something I can do to avoid this?
  • As I said, the 2 added nodes will have less powerful CPUs and less RAM. The spinning disks will be similar (WD Red Pro, maybe a generation from 2 years ago), and the journal SSD will be identical to the one in the first 3 nodes (I was thinking about buying a new Intel DC P3700 PCIe for every node). Is there any risk that they will slow down the cluster? Is there something I should watch out for?
Thank you to everyone who contributes!
 
Before answering anything, please note that on PVE 5.0 we will only support Ceph starting with Luminous, which is not stable yet.
There are Ceph Jewel packages for Stretch (maintained by Debian itself), but those use a different packaging scheme, are not tested with our tools, and could cause problems when upgrading to Luminous.
(Someone please correct me if anything here is wrong.)

  • I guess that, while running on 3 nodes, the cluster will be able to survive one node failure without service interruption, and while running on 5 nodes, it'll be able to survive 2 node failures. Is this correct? Do I need to adjust any configuration to guarantee this?
Yes for Proxmox. For Ceph, it depends on how the PGs are distributed across the hosts and which hosts fail. But since a host failure is not a common event, Ceph will probably have enough time to replicate the data to the remaining hosts before another one fails. Keep an eye on overall disk usage here, since the remaining hosts need enough free space to hold the re-replicated data.
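The arithmetic behind this answer can be sketched in Python. This is a minimal, idealised model, assuming one corosync vote per node (the Proxmox default) and, for Ceph, a replicated pool whose PGs sit on `size` distinct hosts chosen uniformly at random; real CRUSH placement is weighted and pseudo-random, so treat the numbers as rough estimates. Both functions are hypothetical helpers, not part of any Proxmox or Ceph tool.

```python
from math import comb


def corosync_tolerated_failures(nodes: int) -> int:
    """Node failures a cluster with one vote per node survives while keeping quorum."""
    quorum = nodes // 2 + 1  # strict majority of all votes
    return nodes - quorum


def fraction_pgs_below_min_size(hosts: int, size: int, min_size: int, failed: int) -> float:
    """Fraction of PGs left with fewer than `min_size` copies after `failed` hosts die.

    Idealised model (assumption): each PG has `size` copies on `size` distinct
    hosts, chosen uniformly at random. Real CRUSH placement differs, so this is
    only an estimate.
    """
    total = comb(hosts, size)  # number of possible host sets for one PG
    bad = 0
    # A PG drops below min_size when more than (size - min_size) of its copies were on failed hosts.
    for lost in range(size - min_size + 1, min(size, failed) + 1):
        # choose `lost` of the failed hosts, and the remaining copies among healthy hosts
        bad += comb(failed, lost) * comb(hosts - failed, size - lost)
    return bad / total


if __name__ == "__main__":
    for n in (3, 5):
        print(n, "nodes: corosync tolerates", corosync_tolerated_failures(n), "failure(s)")
    print("5 hosts, size 3/min_size 2, 2 failed:", fraction_pgs_below_min_size(5, 3, 2, 2))
```

Under these assumptions, five nodes keep corosync quorum through two failures, but with size 3/min_size 2 two simultaneous host failures leave about 30% of PGs with a single copy, and Ceph blocks I/O on those PGs until recovery brings them back to min_size. Ceph monitors need their own majority in the same way, so three monitors tolerate only one monitor failure.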

  • Will the upgrade from 3 to 5 nodes cause downtime (or an extreme slowdown due to data moving among the nodes)? Is there something I can do to avoid this?
For Proxmox, no, there will be no downtime. For Ceph, adding OSDs can introduce extra latency and I/O load while data is rebalanced; it is best to read the Ceph documentation on how to properly add OSDs to a cluster and how to configure this.
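One common approach (not prescribed in this thread) is to let new OSDs join with a CRUSH weight of 0, e.g. via `osd crush initial weight = 0` in `ceph.conf`, and then raise the weight in several steps with `ceph osd crush reweight`, letting backfill settle in between; `osd max backfills` can additionally throttle recovery. The hypothetical helper below only prints that command sequence; the OSD id and target weight are made-up example values.

```python
def reweight_steps(osd_id: int, target_weight: float, steps: int) -> list[str]:
    """Build `ceph osd crush reweight` commands that raise one OSD's CRUSH weight
    from 0 to `target_weight` in equal increments (weights are in TiB by convention)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    commands = []
    for i in range(1, steps + 1):
        weight = round(target_weight * i / steps, 4)
        commands.append(f"ceph osd crush reweight osd.{osd_id} {weight}")
    return commands


if __name__ == "__main__":
    # Hypothetical example: osd.12 is a new 4 TB disk (about 3.64 TiB).
    # Run the commands one at a time and wait for backfill to finish in between.
    for cmd in reweight_steps(12, 3.64, 4):
        print(cmd)
```

Smaller steps mean more, but shorter, rebalancing phases; the trade-off is total migration time versus peak latency for client I/O.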

  • As I said, the 2 added nodes will have less powerful CPUs and less RAM. The spinning disks will be similar (WD Red Pro, maybe a generation from 2 years ago), and the journal SSD will be identical to the one in the first 3 nodes (I was thinking about buying a new Intel DC P3700 PCIe for every node). Is there any risk that they will slow down the cluster? Is there something I should watch out for?
Ceph scales very well as hosts are added, so it is hard to say whether this setup will perform better or worse than with three hosts.
Generally, Ceph performs better with more hosts (provided that the network is good).
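To reason about how much load the smaller nodes will carry, a rough, idealised model (not how CRUSH is implemented) is that each host stores, and serves I/O for, a share of the data proportional to its CRUSH weight, which by default reflects disk capacity rather than CPU or RAM. The sketch below computes those shares and a lower bound on how much stored data has to move when going from 3 to 5 hosts; host names and weights are hypothetical.

```python
def host_shares(weights: dict[str, float], size: int) -> dict[str, float]:
    """Idealised share of all stored PG copies that lands on each host.

    Assumes data is spread in proportion to host CRUSH weight, which only holds
    while no host would need more than one copy of a PG (share <= 1/size).
    """
    total = sum(weights.values())
    shares = {host: w / total for host, w in weights.items()}
    if any(share > 1 / size + 1e-9 for share in shares.values()):
        raise ValueError("a host is too heavy for the proportional model")
    return shares


def min_fraction_moved(before: dict[str, float], after: dict[str, float]) -> float:
    """Lower bound on the fraction of stored data that must move:
    whatever a host gains has to be copied from somewhere else."""
    hosts = set(before) | set(after)
    return sum(max(0.0, after.get(h, 0.0) - before.get(h, 0.0)) for h in hosts)


if __name__ == "__main__":
    # Hypothetical host names and weights (8 TiB of OSDs per host)
    three = host_shares({"pve1": 8.0, "pve2": 8.0, "pve3": 8.0}, size=3)
    five = host_shares({f"pve{i}": 8.0 for i in range(1, 6)}, size=3)
    print("share per host with 5 hosts:", five["pve1"])
    print("minimum fraction moved:", round(min_fraction_moved(three, five), 3))
```

With five equally weighted hosts, each holds about 20% of all stored copies instead of about 33%, so at least 40% of the stored data moves onto the two new hosts. The same proportionality suggests that, if their disks carry the same weight, the less powerful nodes will receive roughly as much I/O as the others, which is where their CPU and RAM could become the limit.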
 
Wow, this scares me a bit. Given that I'd like to stay as "Proxmox-standard" as possible, would you recommend going with Proxmox 4.4 and its stable Ceph and upgrading to Proxmox 5 later (although upgrading Proxmox + Ceph on a production cluster also scares me a bit), or something else?
Thank you!
 
I would recommend the following:

If you can wait, wait for PVE 5.0 and a stable Luminous (the easiest and least error-prone option, but there is no fixed date yet for when exactly Luminous will be stable).

If you need it now or in the very near future, use PVE 4.4 with Ceph Jewel.
The plan is to provide an upgrade path to Luminous on PVE 4.4 (once it is stable), followed by the upgrade to PVE 5.
 
Thank you! I need the cluster to be up and running by the end of July. I saw that the Ceph 12.1.0 Luminous release candidate has just been released, so if I'm lucky the stable release could be out by then, and hopefully Proxmox VE 5.0 too. :)
 
If you can wait, I would. The transition from Jewel to Luminous is likely to be troublesome, given that the Ceph team is doing some pretty major surgery on the OSD on-disk format, etc. From what I have seen/tested of the pre-release, I believe it will be worth the wait.

As for your other questions:
  • With three nodes, the default pool setting is size 3 (three copies of all data, i.e. the primary plus two replicas) with min_size 2. In that configuration you can keep operating with a single failed node, but Ceph cannot re-create the missing copies and return to a healthy state, because with the failure domain set to host only two hosts remain for three copies. You can reduce the pool size to 2 (two copies), but at reduced redundancy. It can work, but I'd consider it somewhat fragile for production.

  • Transitioning to 5 nodes should be seamless and can be done online. As noted above, there will be a period of added I/O latency while Ceph redistributes replicas across the additional nodes, but if you are engineering things correctly you should always allow for that extra I/O load anyway, as it is the same effect you will see if/when an OSD or a whole host fails.
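The failure cases in the first bullet can be checked with a small worst-case sketch. It assumes a replicated pool with the failure domain set to host (at most one copy per host) and that at least one PG had as many of its copies as possible on the failed hosts, which is the normal situation once many PGs are spread over all hosts. The function is a hypothetical helper, not a Ceph command.

```python
def after_host_failures(hosts: int, size: int, min_size: int, failed: int) -> str:
    """Worst-case state of a replicated pool with failure domain = host.

    Assumes some PG had as many copies as possible on the failed hosts.
    """
    if not 0 <= failed < hosts:
        raise ValueError("need 0 <= failed < hosts")
    surviving_copies = size - min(failed, size)
    if surviving_copies < min_size:
        return "I/O blocked on some PGs"
    if hosts - failed < size:
        # not enough healthy hosts to place `size` copies on distinct hosts
        return "I/O continues, but the pool stays degraded"
    return "I/O continues and Ceph can recover to full redundancy"


if __name__ == "__main__":
    for hosts, size, min_size, failed in [(3, 3, 2, 1), (3, 2, 1, 1), (5, 3, 2, 1), (5, 3, 2, 2)]:
        print(f"{hosts} hosts, size {size}/min_size {min_size}, {failed} failed:",
              after_host_failures(hosts, size, min_size, failed))
```

Under these assumptions, 3 hosts with size 3/min_size 2 keep running but stay degraded after one failure; size 2/min_size 1 can recover, but some PGs run on a single copy in the meantime, which is why it is fragile; and 5 hosts with size 3/min_size 2 recover from one failure, while two simultaneous failures block I/O on some PGs.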
Best of luck to you.
 

Hi,

I see Ceph v12.2.0 Luminous was released on 29 August 2017 and v12.2.1 Luminous on 28 September 2017.
Have you been able to incorporate it into Proxmox yet? Is it stable enough to use in production?
 

There are v12.2.0/1 packages available in our repositories, and everything should work with the current PVE packages. That being said, official support will come with PVE 5.1, which will be released soon (see https://forum.proxmox.com/threads/p...uminous-kernel-4-13-latest-zfs-lxc-2-1.36943/ ).
 
