Upgrade from 8.4.1 to 9.1

Jun 25, 2022
Hi,
I am planning to upgrade a three-node cluster from 8.4.1 to 9.1 and have some questions; I hope someone can answer and guide me.
  • Servers: 3-node cluster (Dell R750), each with 8 NICs in an OVS bond setup (LACP). All traffic is isolated through VLANs; data and cluster traffic are separated.
  • Each of the three servers has 2 × 1.6 TB NVMe drives (6 in total) for the Ceph cluster, running Ceph 19.2.1.
  • Full mesh network for Ceph: a "broadcast" bond over the 10G interfaces on every node.

Question 1: Is it possible to migrate this setup to the SDN fabrics feature introduced in 9.x, or should I keep it as it is?

Question 2: I want to add 6 more NVMe drives of 3.2 TB each to the Ceph cluster (2 per server). What procedure should I follow to add these disks without causing any failure, given that the servers are in 24/7 production use? And will mixing two different disk sizes in the Ceph cluster create any problems?

Kindly guide me; any help will be valuable in deciding the next step of the upgrade.
Thanks
 
I find it very strange that in this forum, as a member for the past three+ years with a paid subscription, I did not get any reply from Proxmox support to my two simple questions for the past 10 days, with 419 views on the topic. Do I need to upgrade my paid subscription to get support? I expect an answer from someone, or I will assume a community subscription will no longer work for future releases of Proxmox.
 
Well..., I'm sorry.

The more unusual a specific setup is, the lower the chance of a helpful answer. In your case I cannot help because I do not use OVS, do not use bonds, and do not use full mesh. Actually, I am fairly sure the number of active users (in this forum) with a comparable configuration is... really low.

:-(
 
I want to add 6 more NVMe drives of 3.2 TB each to the Ceph cluster (2 per server). What procedure should I follow to add these disks without causing any failure, given that the servers are in 24/7 production use?
Adding OSDs should work on the fly. Ceph's basic design makes sure there is no problem if one OSD dies, and likewise no problem when one or more are added.
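
As a rough sketch of how that looks on a Proxmox node (the device name /dev/nvme2n1 is just an example; check yours with lsblk first):

```
# Verify the new drive is visible and unused (example device name)
lsblk /dev/nvme2n1

# Create the OSD on the new NVMe (run on the node that holds the drive)
pveceph osd create /dev/nvme2n1

# Watch the cluster state while the new OSD is backfilled
ceph -s
```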

Be aware that rebalancing will introduce some, or a lot of, traffic...
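
If that traffic hurts the running VMs, backfill can usually be throttled; a hedged example (runtime settings, revert them once rebalancing is done):

```
# With the mClock scheduler (Quincy and later) the classic recovery
# knobs are ignored unless explicitly allowed:
ceph config set osd osd_mclock_override_recovery_settings true

# Limit concurrent backfill/recovery work per OSD
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
```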

And will mixing two different disk sizes in the Ceph cluster create any problems?
This is probably not recommended, but it is not really a no-go. Within one node the OSDs are balanced by weight, so all of them get filled to approximately the same percentage.
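
A quick, read-only way to check this on a live cluster:

```
# WEIGHT follows the raw size of each OSD, so a 3.2 TB OSD receives
# roughly twice the data of a 1.6 TB one; %USE should stay similar
ceph osd df tree
```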

Let's say you have only two OSDs in a node (not recommended in itself!), one of 2 TB and one of 8 TB, and both are 50% full. What happens if the larger one dies? Trouble, because this node can no longer store the expected amount of data.

Positive example: you have three 2 TB OSDs and add one 4 TB OSD, each 50% full. When the largest one dies, 2 TB of data has to be put onto the "old" 2 TB OSDs, i.e. ≈ 667 GB each. This fits --> no critical problem. But those OSDs are nearly full now --> action needed.
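
Spelled out (0.85 is Ceph's default nearfull warning threshold):

```latex
\[
\frac{2\,\text{TB}}{3\ \text{surviving OSDs}} \approx 0.67\,\text{TB each}
\]
\[
\text{new fill} = \frac{1\,\text{TB} + 0.67\,\text{TB}}{2\,\text{TB}} \approx 83\%
\quad (\text{just under the } 0.85 \text{ nearfull default})
\]
```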

I would really recommend having enough OSDs in each node that, when the largest one fails, the node can still keep the expected volume of data.

The problem of redistributing data is a particular risk in a cluster with only three nodes: there is no spare node for self-healing if one node fails or is degraded, as in the example above...


Disclaimer: I am not using Ceph currently...
 
The more unusual a specific setup is, the lower the chance of a helpful answer. [...]
If you do not use some option or setting in Proxmox, that does not mean others cannot use it. If these settings are available and documented, then the people behind these concepts must have answers to my questions; I hope some answer will come from the support team.
 
Adding OSDs should work on the fly. [...] Be aware that rebalancing will introduce some, or a lot of, traffic... [...] I would really recommend having enough OSDs in each node that, when the largest one fails, the node can still keep the expected volume of data.
My purpose in adding the extra disks to the Ceph cluster is to retire the old 1.6 TB disks in the near future. My workload is small: 20-25 VMs, 15 of them in HA. Rebalancing is not that big a deal, as total consumption at the moment is around 47% of the OSDs, and each node has a local 1 TB ZFS mirror volume available for emergencies. This cluster has been operational for the past 5 years. I hope that after the upgrade I will be able to replace the old disks peacefully, or I will temporarily move the VMs to local disk during rebalancing. The purpose of my post is to find a clear guide for adding those OSDs to the cluster and removing the old ones in the future.
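
For the retirement itself, the usual pattern is to drain one old OSD at a time and only destroy it once it is empty. A sketch (osd.0 stands for whichever old OSD you are draining):

```
# Drain the old OSD gracefully by taking away its CRUSH weight
ceph osd crush reweight osd.0 0

# Wait for rebalancing to finish and confirm the OSD holds no data
ceph -s
ceph osd df tree

# Mark it out, stop the daemon, then remove it via the Proxmox wrapper
ceph osd out 0
systemctl stop ceph-osd@0.service
pveceph osd destroy 0
```

Repeat per disk, letting the cluster return to HEALTH_OK between steps.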