Ceph and mismatched disks

arubenstein

New Member
Jul 17, 2023
I've been doing some research on Ceph with mismatched underlying hard drives. I've read (and I think it is fairly obvious) that you need to have symmetry at least node to node. So...

Let's assume that I have a cluster of three nodes, and each node today has three 15 TB SSDs. Time goes on, we need some more space/performance, and I happen to have a bunch of 2 TB SSDs lying around. So I stick three of those 2 TB drives into each node; now each node has (3) 15 TB and (3) 2 TB drives.

Is this just really a bad idea, or just keep on trucking?
 
Hi @arubenstein, in that case I would create a second device class and another CRUSH rule that uses this class, for example ssd2 (the existing 15 TB OSDs stay in the default ssd class). You would then have two pools, if that's OK for you.

  • ceph osd crush rule create-replicated replicated-ssd1 default host ssd
  • ceph osd crush rule create-replicated replicated-ssd2 default host ssd2

    Then you need to assign the specific rule to each pool by editing the pool via the UI.
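For reference, the same thing can be done from the shell. This is only a sketch: the OSD ID osd.9 and the pool names pool-ssd / pool-ssd2 are placeholders, and the class name ssd2 is the example from above.

Code:
# Tag a new 2 TB OSD with the custom device class.
# The auto-assigned class has to be cleared before a new one can be set.
ceph osd crush rm-device-class osd.9
ceph osd crush set-device-class ssd2 osd.9

# List the classes and check which OSDs carry the new one
ceph osd crush class ls
ceph osd crush class ls-osd ssd2

# Assign the CRUSH rules to the two pools
ceph osd pool set pool-ssd crush_rule replicated-ssd1
ceph osd pool set pool-ssd2 crush_rule replicated-ssd2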
 
Actually, my intent (and need) is to expand the existing pool.
Ceph would accept those 2 TB disks and assign each OSD a CRUSH weight according to its size. But you need to be careful: always make sure that if one device on a host fails (for example a 15 TB SSD), the remaining disks on that host have enough free capacity to recover the data that was on the failed disk (for example 75% of 15 TB). Why? Because once a single OSD reaches 95% usage, Ceph blocks write I/O for the whole cluster, due to the OSD full ratio (mon_osd_full_ratio, default 0.95).
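If it helps, these are the standard commands to see where you stand; nothing custom is assumed here:

Code:
# Show the configured ratios (defaults: nearfull 0.85, backfillfull 0.90, full 0.95)
ceph osd dump | grep ratio

# Per-OSD fill level (%USE column), to see how close each OSD is to those ratios
ceph osd df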

After creating the OSDs, Ceph will automatically use them. Make sure that they have similar performance, because the slowest disk in a pool will determine the Ceph write performance. The general recommendation is to not mix sizes, because it increases complexity.
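On Proxmox the new disks can be added per node roughly like this (the device path /dev/sdd is just a placeholder), and afterwards you can verify that the CRUSH weights follow the disk sizes:

Code:
# Create an OSD on each new 2 TB SSD, on every node
pveceph osd create /dev/sdd

# Verify the new OSDs and their CRUSH weights; since weights are expressed in TiB,
# a 15 TB SSD should show up around 13.6 and a 2 TB SSD around 1.8
ceph osd df tree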
 
In principle this is not a problem, but it is still not optimal.

But you definitely have to keep an eye on the fill levels of the individual OSDs. You may have to reweight individual OSDs so that there are no problems. When an OSD reaches the full ratio, your Ceph cluster effectively becomes read-only; keep that in mind. Also note that if your Ceph cluster is already well filled, the OSDs that are already quite full may temporarily receive even more data while the additional OSDs are being backfilled.

You should therefore always plan ahead and add capacity early.
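A sketch of the two usual ways to keep the OSD fill levels even, assuming a reasonably recent Ceph with the balancer module available (osd.7 is a placeholder ID):

Code:
# Let the upmap balancer even out per-OSD utilization automatically
ceph balancer mode upmap
ceph balancer on
ceph balancer status

# Or temporarily nudge a single over-full OSD by hand
ceph osd reweight osd.7 0.9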
 
I concur with all the points above. I'd always make sure there is enough spare capacity per node for the largest disk to fail. I think that is a matter of normal cluster health/maintenance.
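As a rough back-of-the-envelope check for the layout in this thread (the numbers are only the example from above; adjust them to your own cluster and fill level):

Code:
# per-node raw            = 3*15 TB + 3*2 TB = 51 TB
# after losing one 15 TB  = 36 TB remaining on that node
# usable below nearfull   = 36 TB * 0.85 = ~30.6 TB, which has to hold the node's
#                           existing data plus the failed OSD's share
awk 'BEGIN { raw = 3*15 + 3*2; left = raw - 15; printf "%.1f TB usable below nearfull\n", left * 0.85 }'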
 
