Erasure code and min/max in Proxmox

Mar 18, 2021
I set up an erasure coded pool via Ceph Dashboard, and changed the rule later by manually editing the CRUSH map:

Code:
rule erasurecode {
    id 6
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class hdd-cached
    step choose indep 3 type host
    step choose indep 2 type osd
    step emit
}
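
For anyone wanting to reproduce this, the usual way to edit the CRUSH map by hand is the decompile/edit/recompile cycle, roughly like this (file names are just placeholders):

Code:
# export and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# add/adjust the rule shown above in crushmap.txt, then recompile
crushtool -c crushmap.txt -o crushmap-new.bin

# inject the updated map back into the cluster
ceph osd setcrushmap -i crushmap-new.bin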

The goal was to have the EC pool remain available if one host is down for maintenance.
When I stopped the OSDs on one host (simulating maintenance), some PGs became unavailable.
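
To see which PGs are affected in such a situation, the standard status commands can be used (nothing specific to my cluster):

Code:
# overall cluster health and the reason IO is blocked
ceph health detail

# list PGs that are stuck inactive
ceph pg dump_stuck inactive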

In Proxmox, the pool was shown with "size 6, min. size 5" — I changed that to "size 6, min. size 4", and now it seems to work.
I could stop the OSDs on one host and still access the data (and write new data as well).
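
For reference, the same change can also be made on the CLI (the pool name ec-pool here is just an example):

Code:
ceph osd pool set ec-pool min_size 4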

Can someone explain how size/min in Proxmox UI works with erasure coded pools? (I know they're not fully supported yet.)
 
With what k and m parameters did you create the pool?

Have you looked at our documentation and Cephs?
 
With what k and m parameters did you create the pool?
4+2, and then I changed the rule to spread the data across two OSDs on each of the three nodes, following a few blog posts like this one.
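
For reference, the CLI equivalent of such a 4+2 pool would look roughly like this (profile and pool names are just examples; I created mine via the Dashboard and applied the custom rule afterwards):

Code:
# erasure code profile with k=4 data and m=2 coding shards
ceph osd erasure-code-profile set ec-4-2 k=4 m=2

# erasure coded pool using that profile; the custom CRUSH rule
# from the first post was assigned to it afterwards
ceph osd pool create ec-pool erasure ec-4-2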

Have you looked at our documentation and Cephs?
Probably not enough. I think I haven't looked at your documentation regarding EC pools, but I've read quite a bit through Ceph's.

Thank you for pointing me to your docs! They clearly say:
When planning an erasure coded pool, keep an eye on the min_size as it defines how many OSDs need to be available. Otherwise, IO will be blocked.
Ceph's documentation says:
In Octopus and later releases, erasure-coded pools can recover as long as there are at least K shards available. (With fewer than K shards, you have actually lost data!)
but also:
We recommend that min_size be K+1 or greater to prevent loss of writes and loss of data.

So in my case, min_size should be set to 5 (which seems to have been the case after creating the pool), but then, during maintenance, no reads/writes are allowed. With min_size set to 4, reads and writes still worked.
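
Just to spell the numbers out for my setup:

Code:
size     = k + m = 4 + 2 = 6 shards per PG
one host down  -> 2 OSDs gone -> 4 shards left per PG

min_size = 5:  4 < 5   -> PGs inactive, IO blocked
min_size = 4:  4 >= 4  -> IO continues, but one more OSD failure
                          drops below k = 4 (risk of losing data)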

I think I now understand the implications, though: Ceph can't know whether those other two OSDs (on the down host) are only down for maintenance or lost forever, so by default it refuses further writes to the pool, because if another OSD failed, there could be data loss. Is that correct?

Is there any way to make the pool read-only in that case? That is, leave min_size at 5 as recommended, but still be able to read data if one of the hosts is down (and therefore two OSDs)?
 
