Ceph does not start rebuild or rebalance

Ayush

Member
Oct 27, 2023
Hello Team,

I have a 3-node cluster with PVE 8.1.4 and Ceph Quincy 17.2 as shared storage. I shut down one node, expecting Ceph to start rebuilding and rebalancing the cluster, but it didn't. Ceph only shows a warning that OSDs are down.
When the node boots up again, the cluster shows healthy.
Is this expected behavior? How is it different from a node breaking down?
 
The default configuration is to write 3 replicas, with a minimum of 2 available to continue active I/O.
The default CRUSH rule requires each replica to be on a different host, so a complete host can fail without impact on the data.
So if you have "only" 3 nodes, there is no other host Ceph can rebuild to, as that would not comply with the CRUSH and pool rules.
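You can verify these defaults on your own cluster; a quick check (replace <POOL_NAME> with the name of your pool, replicated_rule is the name of the default CRUSH rule):
Code:
ceph osd pool get <POOL_NAME> size
ceph osd pool get <POOL_NAME> min_size
ceph osd crush rule dump replicated_rule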

PS: Ceph waits 5 minutes for an OSD that went offline; after that it will mark it as "out". At this point the rebalancing will begin.
 
PS: Ceph waits 5 minutes for an OSD that went offline; after that it will mark it as "out". At this point the rebalancing will begin.
Sorry, the default value is 10 minutes (600 seconds). You can check the value for each OSD with:
Code:
ceph config show-with-defaults osd.20 | grep mon_osd_down_out_interval

The value can be set with:
Code:
ceph config set mon mon_osd_down_out_interval <INTERVAL_IN_SECONDS>
 
@VictorSTS , I tested it, but even after 10 minutes of downtime for node 3, rebalancing did not start. The only thing is that I shut down node 3 gracefully. Does this have any relation to Ceph's rebalancing? The graphical representation shows only orange. Once node 3 comes back up, it rebuilds and shows green again.

Additionally, if node 2 also goes down along with node 3, what happens to the Ceph cluster?
 
I tested it, but even after 10 minutes of downtime for node 3, rebalancing did not start. The only thing is that I shut down node 3 gracefully.
You have 3 nodes and the default policy of 3 replicas on 3 disks of 3 different hosts. If any node is down, Ceph will not rebalance because it has nowhere to rebalance to: with only 2 hosts left, Ceph cannot comply with the default policy.
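You can see those failure domains in the CRUSH tree; each host and the OSDs under it are listed there:
Code:
ceph osd tree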


The only thing is that I shut down node 3 gracefully. Does this have any relation to Ceph's rebalancing?
Doesn't really matter.

The graphical representation shows only orange. Once node 3 comes back up, it rebuilds and shows green again.

That means that your PGs (your data) are in a warning state. In this case they are "undersized", because only 2 of the 3 replicas are online at the moment.
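You can see the same state on the CLI while the node is down; ceph health detail lists the undersized/degraded PGs:
Code:
ceph health detail
ceph pg stat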

Additionally, if node 2 also goes down along with node 3, what happens to the Ceph cluster?
If 2 out of 3 nodes are down, there will be no PVE quorum [1] and no Ceph quorum, which means no I/O is possible for your VMs. You can't run a 3-node Ceph cluster on just one node. If that really happens, you will have to go through some complex disaster recovery procedures to regain both PVE quorum and Ceph quorum.
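If in doubt, both quorums can be checked from a surviving node (pvecm status reports the Proxmox cluster state, quorum_status queries the Ceph monitors):
Code:
pvecm status
ceph quorum_status --format json-pretty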

I suggest that you create a test Proxmox cluster using VMs, so you can easily test many failure scenarios.


[1] https://pve.proxmox.com/wiki/Cluster_Manager#_quorum
 
@VictorSTS ,

Thanks for the clarification. So is it a good idea to have a 5-node cluster to make it more stable?

As per my understanding, Ceph uses host-based replication, not disk-based. So if I have 3 disks per node and one node goes down, it is not going to rebalance. Does it also depend on the utilisation of the cluster?
 
Thanks for the clarification. So is it a good idea to have a 5-node cluster to make it more stable?

As per my understanding, Ceph uses host-based replication, not disk-based. So if I have 3 disks per node and one node goes down, it is not going to rebalance. Does it also depend on the utilisation of the cluster?
Sorry, I don't really understand what you mean by "more stable state" and "depend on the utilisation of the cluster", so I can't give a proper answer.

If I set up NFS and copy all VMs to it, can I directly launch VMs in Proxmox from NFS?
You can run VMs from an NFS share... but why would you want to do that when you have the option to run Ceph? NFS would become a single point of failure.
 
Sorry, I don't really understand what you mean by "more stable state" and "depend on the utilisation of the cluster", so I can't give a proper answer.
Sorry that I was not able to explain it properly. The question is this: in a 3-node cluster, if 2 nodes go down simultaneously for any reason, the complete cluster goes down.
So to avoid this situation we need a 5-node cluster, which stays operational even if we lose 2 nodes simultaneously.
Now a scenario: I have a Ceph pool on 3 nodes and each node has 2 disks, so the total raw capacity is 6 TB (each disk is 1 TB), with 3/2 replication. If the cluster utilization is around 3 TB and one node goes down, how does the Ceph cluster behave?
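For reference, with 3 replicas 6 TB of raw capacity corresponds to roughly 2 TB of usable space. ceph df shows both views, including a MAX AVAIL figure that already accounts for replication:
Code:
ceph df detail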

You can run VMs from an NFS share... but why would you want to do that when you have the option to run Ceph? NFS would become a single point of failure.
To avoid such an unforeseen situation: if the Ceph cluster becomes unavailable, we could directly launch the VMs from NFS.
 
