Issues creating CEPH EC pools using pveceph command

Superfish1000

I wanted to start a short thread because I believe I may have found either a bug in the pveceph command or a mistake in the Proxmox documentation for it, or maybe I'm just misunderstanding something. Either way, I think putting it out there may help others.

I was going through the Ceph setup for a new server that I wanted to configure with an erasure coded pool using a k=2, m=1 profile and the crush-failure-domain set to the OSD level. I haven't done this in a while and had forgotten most of how I did it last time. I did remember that a few commands were needed to establish an EC data pool and a replicated metadata pool, and that they then had to be added as storage with another command.

I initially found an online thread about using the erasure-code-profile commands in the native Ceph tools to establish a profile and then apply it with the native Ceph pool create commands. Having only a vague memory of the process, I opted to try this and it all seemed to go smoothly. Unfortunately I then ran into other issues with the pool reporting that it didn't support RBD, and I ultimately decided to skim the docs again and blow the pool away.
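For reference, the native route looks roughly like this (the pool and profile names here are only placeholders, and the exact commands I ran may have differed slightly):
Code:
# create an EC profile: 2 data chunks, 1 coding chunk, failure domain at the OSD level
ceph osd erasure-code-profile set my-ec-profile k=2 m=1 crush-failure-domain=osd
# EC data pool plus a small replicated pool to hold the RBD metadata
ceph osd pool create my-ec-data 32 32 erasure my-ec-profile
ceph osd pool create my-ec-metadata 32 32 replicated
# RBD on an EC pool needs overwrites enabled on the data pool
ceph osd pool set my-ec-data allow_ec_overwrites true
ceph osd pool application enable my-ec-data rbd
ceph osd pool application enable my-ec-metadata rbd
The pools then still need to be added as storage, if I remember right with something like pvesm add rbd <storage-id> --pool my-ec-metadata --data-pool my-ec-data, though double-check the option names against the pvesm man page.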

I had previously seen the command,
Code:
pveceph pool create <pool-name> --erasure-coding profile=<profile-name>
but I was unable to get it to accept the profile I had created, so I instead went with the working native Ceph commands (this was the pool I just had to blow away). Below are the commands I attempted to run before and again just now, along with the error I got.

Code:
ceph osd erasure-code-profile set raid5-profile k=2 m=1 crush-failure-domain=osd
pveceph pool create MainCEPH_EC5-data --erasure-coding profile=raid5-profile

[Attached screenshot: error output from the pveceph pool create command]
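For anyone trying to reproduce this, the profile itself can be checked on the Ceph side before calling pveceph, just to rule out a typo in the profile name:
Code:
# list all erasure code profiles known to the cluster
ceph osd erasure-code-profile ls
# dump the settings of the profile in question
ceph osd erasure-code-profile get raid5-profile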

I ultimately just rewrote the pveceph command to pass the EC settings inline instead of referencing a profile, and ended up with the following.
Code:
pveceph pool create MainCEPH_EC5 --pg_num 32 --erasure-coding k=2,m=1,failure-domain=osd
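The result can then be sanity-checked with the standard listing commands (nothing here is specific to my setup):
Code:
# list pools as Proxmox sees them
pveceph pool ls
# show pool details, including the erasure code profile each pool uses
ceph osd pool ls detail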

If anyone can explain what I did wrong, or let me know if this was actually a bug, I'd appreciate it.

pve-manager/8.3.0/c1689ccb1065a83b (running kernel: 6.8.12-4-pve)
ceph version 19.2.0 (3815e3391b18c593539df6fa952c9f45c37ee4d0) squid (stable)
 
If anyone can explain what I did wrong
No idea, sorry.

But: let pveceph create that profile! Just tell it what you want. One of my (technically) successful attempts included:
Code:
~# pveceph pool create ec22 --erasure-coding k=2,m=2                       
created new erasure code profile 'pve_ec_ec22'
pool ec22-data: ...
Without any preparation beforehand.
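If you want to see what pveceph actually put into that auto-generated profile, it can be inspected afterwards:
Code:
# which profile the data pool ended up with
ceph osd pool get ec22-data erasure_code_profile
# and the settings pveceph wrote into it
ceph osd erasure-code-profile get pve_ec_ec22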
 
Why do you want to create a Ceph pool (or even use Ceph at all) on a single Proxmox server?
I had a few reasons in my case. Primarily, I wanted the ability to expand my overall storage pool more easily than I could with the other methods I was considering, by swapping drives out for larger ones. Right now the system I am working with has 4 x 1TB drives, but I would like the option to upgrade that to 4 x 8TB drives in the future.

In the past I used NAS software based on BTRFS, and I greatly appreciated the flexibility to add or remove drives from the storage pool or even change the RAID level on the fly. Unfortunately I have run into no end of issues with BTRFS in general and don't want to repeat them. I have used Ceph a bit and it seems like it will be more workable in this respect while offering similar flexibility, and this is a low-stakes situation, so I figured I'd try to hack it together.

My second reason is that I have a 4-node cluster with Ceph installed that is supposed to be communicating over a 40Gb IPoIB connection, and the performance is, frankly, abysmal. It's definitely not an optimal use case, but I'm having an extremely hard time identifying a legitimate reason why it's so bad. I have wasted many days trying to troubleshoot and test it, and I also wanted to use this single-node setup as a test case to learn more about Ceph and try to figure out why that pool performs so poorly.
Even in this severely handicapped single-node setup I'm able to achieve a stable 150MB/s write speed to the pool with four old, cheap, used 1TB HDDs, while the pool on the cluster with significantly better HDDs consistently achieves less than 50MB/s.
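If anyone wants to sanity-check similar numbers, a rough way to compare raw write throughput between pools is rados bench (the pool name, object size, and thread count here are just example values):
Code:
# 60-second write benchmark: 4 MiB objects, 16 concurrent operations
rados bench -p <pool-name> 60 write -b 4M -t 16 --no-cleanup
# remove the benchmark objects afterwards
rados -p <pool-name> cleanup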

It probably goes without saying, but I fully understand that none of what I am doing is advisable, best practice, or even particularly sane. I like experimenting, and the setup I ran into this behavior on interested me.

My highly suspect mad-scientist experiment aside, I primarily wanted to know whether I made a mistake in the command itself or whether this is indeed a bug in pveceph, since I think the answer could help others either way.