Ceph PG error

Jackmynet

Member
Oct 5, 2024
Hi all,

I am a beginner Proxmox user and have just set up our first small data centre for our business, running 3 VMs on a 3-node cluster.

I have set up Ceph shared storage and enabled HA for my VMs, which seems to be working fine. However, I have one error which I cannot figure out.

I have four 1 TB HDDs in each server, along with two 480 GB SSDs. My Ceph OSDs have been configured as below on each node (a rough example of the creation command follows the list).

OSD 1: a 500 GB partition of HDD 1, with 80 GB of SSD 1 as the DB device and 80 GB of SSD 2 as the WAL device.

OSD 2: a 500 GB partition of HDD 2, with 80 GB of SSD 1 as the DB device and 80 GB of SSD 2 as the WAL device.

OSD 3: a 500 GB partition of HDD 3, with 80 GB of SSD 1 as the DB device and 80 GB of SSD 2 as the WAL device.

OSD 4: a 500 GB partition of HDD 4, with 80 GB of SSD 1 as the DB device and 80 GB of SSD 2 as the WAL device.
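
In command form, each OSD corresponds to something roughly like this (I am sketching the equivalent ceph-volume call; the device paths are placeholders only and will differ per node):

    # OSD 1: data on the 500 GB partition of HDD 1, DB on an 80 GB partition of SSD 1,
    # WAL on an 80 GB partition of SSD 2 (example paths, adjust to the real layout)
    ceph-volume lvm create --data /dev/sdb1 --block.db /dev/sdf1 --block.wal /dev/sdg1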

The problem is that Ceph shows a health warning: the status circle is mostly green with one grey segment, and it says the status of 1 PG is "unknown".

How do I troubleshoot such an error? I think this is the only issue across my configuration at the moment. The Ceph storage pool itself is working well, with all my VMs now moved onto it.

Any help would be much appreciated.
 
Thanks for the reply!

See the attached screenshots of the requested outputs.

They were too long to paste as text in the comment. The pg dump is too long even for a single screenshot, but I have captured what I think is the relevant part.
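
If the screenshots are hard to read, I believe the same information can be pulled on the CLI with something like this (standard Ceph commands; output omitted here since it is in the attachments):

    ceph health detail
    ceph pg dump pgs_brief | grep unknown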
 

Attachments

  • Screenshot 2024-10-05 122818.png
  • Screenshot 2024-10-05 122757.png
  • Screenshot 2024-10-05 122717.png
  • Screenshot 2024-10-05 122702.png
What happened to the pool with ID 1 (usually the pool called ".mgr")?

Its only placement group is in the "unknown" state, which is not good.

Please try to restart OSDs 2, 3 and 7 with "systemctl restart ceph-osd@2.service" etc. on their respective nodes.

If that does not fix the issue, please post the output of "ceph health detail" and "ceph osd pool ls detail".
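
Put together, the steps would look roughly like this (run each restart on the node that actually hosts that OSD):

    # on the node hosting OSD 2
    systemctl restart ceph-osd@2.service
    # on the node hosting OSD 3
    systemctl restart ceph-osd@3.service
    # on the node hosting OSD 7
    systemctl restart ceph-osd@7.service

    # afterwards, check the state again and post these outputs if the PG is still unknown
    ceph health detail
    ceph osd pool ls detail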
 
The .mgr pool with this PG is actually still there; it's just not in use?

I see now that this PG is still assigned to that pool, which I did not notice before.

We did mess around with Ceph a bit during the original setup, having to wipe, delete and redo the OSDs once we realised it was best to have the DB and WAL on SSD.

I have restarted each OSD and it did not help; Ceph still settles back into the same state.

See the requested output attached.
 

Attachments

You seem to have removed the OSDs too quickly, so the pool was not able to migrate its PG to other OSDs, and now it cannot find the PG any more.

You can try to remove the pool and recreate it. It is called ".mgr" with a dot in front.
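
On the command line that would be roughly the following. It assumes pool deletion is currently disallowed (the default) and that the active manager will recreate the pool on its own, so double-check that the pool name is exactly ".mgr" before running it:

    # temporarily allow pool deletion
    ceph config set mon mon_allow_pool_delete true
    # remove the broken .mgr pool (the active mgr should recreate it automatically)
    ceph osd pool rm .mgr .mgr --yes-i-really-really-mean-it
    # turn the safety switch back off
    ceph config set mon mon_allow_pool_delete false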
 
OK, I have VMs using this storage now and working perfectly. Can I do this without disturbing them and having to move them?

Thanks for the help. This diagnosis sounds correct, as it was all done quite quickly during setup.
 
OK, I have removed it now and it automatically recreated itself?

No more health warning on Ceph! It is recovering now. Thank you so much for your help and advice.
 
Thanks again. Is the way I have set up my OSDs acceptable, with the HDD as the main data disk and the SSDs as DB and WAL devices?

There is a lot of conflicting information out there on OSD creation and SSD/HDD differences.
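
In case it is useful for an answer, this is roughly how I check where the DB and WAL actually landed for each OSD; the OSD ID is just an example, and the metadata field names may differ slightly between Ceph versions:

    # per-node view of the data/db/wal devices behind each OSD
    ceph-volume lvm list
    # cluster-wide metadata for a single OSD, e.g. OSD 2
    ceph osd metadata 2 | grep -E 'bluefs|devices'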