ceph never actually achieves desired redundancy

lewinernst

Member
Jul 31, 2021
I have set up a three-node Ceph cluster with 8 OSDs. My goal is to be able to tolerate one node failure without losing data access. To test this I put a couple of ISOs and VM backups (80 GB total) on the CephFS, waited until it reached steady state, and then shut down one node (gracefully). All my nodes are running the latest Ceph and PVE versions and I have kept the default CRUSH map. Unfortunately, this consistently results in "reduced data availability":

[Attachment: 1684269849836.png – Ceph health warning showing reduced data availability]

When I restart the node, my pool goes back to healthy. What can I do to make sure the cluster will actually remain available when losing a node? Can anyone point me to a way to diagnose why the data doesn't get distributed correctly (IMO there should be 3 copies, one on each node)?
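So far I have mostly been staring at the output of the commands below to figure out which PGs are affected; this is just my own checklist, and the PG id is a placeholder rather than real output from my cluster:

Code:
# overall health and the exact PGs behind the warning
ceph health detail
# PGs that are stuck undersized or inactive after the node shutdown
ceph pg dump_stuck undersized
ceph pg dump_stuck inactive
# where the replicas of a specific PG live (id taken from health detail, 2.1a is a placeholder)
ceph pg 2.1a query
# OSD hierarchy and per-host usage
ceph osd tree
ceph osd df tree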

The only irregular thing I can think of is that my OSDs are slightly imbalanced because I added two spare hard drives. However, when I set them to out one by one and wait for steady state, the problem remains. Furthermore, all data together is smaller than the smallest OSD.
[Attachment: 1684270114085.png – OSD overview]

For reference - my pools:
[Attachment: 1684270307632.png – pool list]

Update: and here is the state my pool is in with all OSDs up/in:
[Attachment: 1684296889512.png – pool status with all OSDs up/in]
 
Consider what you're doing.

You have three lopsided nodes; any PG that is placed on either of the two larger nodes in excess of the capacity of priddy is not HA by default in a 3/2 pool (and not only that, PG placement logic has other variables). What's worse, your performance will be utterly shit, since you have PGs landing on HDD OSDs, so it doesn't matter how fast your SSD OSDs are.

What is the use case for this? Either remove the HDDs from your pool, or add more HDDs and make an HDD-specific pool for data that is OK with very slow performance. In case it's not clear: HAVE THE SAME CAPACITY OF OSDs per class per node.
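To actually see how lopsided the nodes are per device class, and which CRUSH rule your pool is using, something along these lines should show it at a glance (pool name is a placeholder):

Code:
# per-host and per-OSD capacity/usage, including device class and weight
ceph osd df tree
# which CRUSH rule the pool uses ...
ceph osd pool get <poolname> crush_rule
# ... and what root / failure domain / device class that rule selects
ceph osd crush rule dump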
 
I think your Ceph cluster is working as intended. If you have the default replica = 3, then when only two nodes are running your cluster is in an unhealthy (degraded) state until the third node comes back. To the best of my knowledge, it still allows you to write data to the cluster until only one server is up; then it turns read-only to safeguard the data.
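If you want to double-check that behaviour, the pool's size/min_size values are what decide when writes stop; a quick way to look them up (pool name is a placeholder):

Code:
# number of replicas the pool keeps (default 3)
ceph osd pool get <poolname> size
# minimum replicas that must be available for I/O to continue (default 2)
ceph osd pool get <poolname> min_size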
 
> Consider what you're doing.
>
> You have three lopsided nodes; any PG that is placed on either of the two larger nodes in excess of the capacity of priddy is not HA by default in a 3/2 pool (and not only that, PG placement logic has other variables). What's worse, your performance will be utterly shit, since you have PGs landing on HDD OSDs, so it doesn't matter how fast your SSD OSDs are.
>
> What is the use case for this? Either remove the HDDs from your pool, or add more HDDs and make an HDD-specific pool for data that is OK with very slow performance. In case it's not clear: HAVE THE SAME CAPACITY OF OSDs per class per node.
Please consider two points of clarification from my OP:
a) The total of all data is smaller than the smallest OSD -> shouldn't it therefore still get distributed to all three nodes?
b) For troubleshooting this problem I have already set the HDDs to up/out -> doesn't that create the situation you recommend?

I added the HDDs to test exactly that performance drop, since I am deciding whether I should create two pools and manually curate the data into performance tiers, or let Ceph do that.
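In case it matters, this is roughly what I had in mind for the two-pool variant, i.e. pinning each pool to a single device class via CRUSH rules (rule and pool names are only examples, I have not applied any of this yet):

Code:
# replicated rules restricted to one device class each, with host as the failure domain
ceph osd crush rule create-replicated ssd-rule default host ssd
ceph osd crush rule create-replicated hdd-rule default host hdd
# point the pools at the class-specific rules (pool names are examples)
ceph osd pool set fast-pool crush_rule ssd-rule
ceph osd pool set slow-pool crush_rule hdd-rule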
 
> I think your Ceph cluster is working as intended. If you have the default replica = 3, then when only two nodes are running your cluster is in an unhealthy (degraded) state until the third node comes back. To the best of my knowledge, it still allows you to write data to the cluster until only one server is up; then it turns read-only to safeguard the data.
Mostly yes, but isn't that untrue for the 1 PG that the health warning lists as unavailable?
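For what it's worth, this is how I have been trying to look at that one PG (the id is a placeholder taken from ceph health detail):

Code:
# lists the exact PG(s) behind the warning and their acting sets
ceph health detail
# if the acting set of that PG has fewer than min_size (2) replicas,
# I/O to that PG blocks even though the rest of the pool stays writable
ceph pg 2.1a query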
 
