Erasure code and min/max in Proxmox

Mar 18, 2021
I set up an erasure coded pool via Ceph Dashboard, and changed the rule later by manually editing the CRUSH map:

Code:
rule erasurecode {
    id 6
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class hdd-cached
    step choose indep 3 type host
    step choose indep 2 type osd
    step emit
}
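
For anyone wanting to reproduce this, the usual way to edit the CRUSH map by hand is the decompile/edit/recompile cycle, roughly like this (file names are just placeholders):

Code:
# export and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# add/adjust the rule shown above in crushmap.txt, then recompile
crushtool -c crushmap.txt -o crushmap-new.bin

# inject the updated map back into the cluster
ceph osd setcrushmap -i crushmap-new.bin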

The goal was to have the EC pool remain available if one host is down for maintenance.
When I stopped the OSDs on one host (simulating maintenance), some PGs became unavailable.
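
To see which PGs are affected in such a situation, the standard status commands can be used (nothing specific to my cluster):

Code:
# overall cluster health and the reason IO is blocked
ceph health detail

# list PGs that are stuck inactive
ceph pg dump_stuck inactive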

In Proxmox, the pool was shown with "size 6, min. size 5" — I changed that to "size 6, min. size 4", and now it seems to work.
I could stop the OSDs on one host and still access the data (and write new data as well).
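
For reference, the same change can also be made on the CLI (the pool name ec-pool here is just an example):

Code:
ceph osd pool set ec-pool min_size 4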

Can someone explain how size/min in Proxmox UI works with erasure coded pools? (I know they're not fully supported yet.)
 
With what k and m parameters did you create the pool?

Have you looked at our documentation and Cephs?
 
With what k and m parameters did you create the pool?
4+2, and then I changed the rule to spread the data across two OSDs on each of the three nodes, following a few blog posts like this one.
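
For reference, the CLI equivalent of such a 4+2 pool would look roughly like this (profile and pool names are just examples; I created mine via the Dashboard and applied the custom rule afterwards):

Code:
# erasure code profile with k=4 data and m=2 coding shards
ceph osd erasure-code-profile set ec-4-2 k=4 m=2

# erasure coded pool using that profile; the custom CRUSH rule
# from the first post was assigned to it afterwards
ceph osd pool create ec-pool erasure ec-4-2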

Have you looked at our documentation and Cephs?
Probably not enough. I think I haven't looked at your documentation regarding EC pools, but I've read quite a bit through Ceph's.

Thank you for pointing me to your docs! They clearly say:
When planning an erasure coded pool, keep an eye on the min_size as it defines how many OSDs need to be available. Otherwise, IO will be blocked.
Ceph's documentation says:
In Octopus and later releases, erasure-coded pools can recover as long as there are at least K shards available. (With fewer than K shards, you have actually lost data!)
but also:
We recommend that min_size be K+1 or greater to prevent loss of writes and loss of data.

So in my case, min_size should be set to 5 (which seems to have been the case after creating the pool), but then, during maintenance, no reads/writes are allowed. With min_size set to 4, reads and writes still worked.
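
Just to spell the numbers out for my setup:

Code:
size     = k + m = 4 + 2 = 6 shards per PG
one host down  -> 2 OSDs gone -> 4 shards left per PG

min_size = 5:  4 < 5   -> PGs inactive, IO blocked
min_size = 4:  4 >= 4  -> IO continues, but one more OSD failure
                          drops below k = 4 (risk of losing data)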

I think I now understand the implications, though: Ceph can't know whether those other two OSDs (on the down host) are only down for maintenance or lost forever, so by default it refuses further writes to the pool, because if another OSD failed, there could be data loss. Is that correct?

Is there any way to make the pool read-only in that case? That is, leave min_size at 5 as recommended, but still be able to read data if one of the hosts is down (and therefore two OSDs)?
 
