Having trouble removing an OSD from an NVMe pool

Gilberto Ferreira

Hello.
I know this could be a shot in the dark, but I will try to clarify as much as I possibly can.
I have a 4-node PVE cluster with Ceph (Quincy) installed.
All 4 nodes have 3x 8TB SATA disks. (I know this is a bad idea, but that's what I have for the moment.)
Nodes 1, 2 and 4 have 1 NVMe each.
We created a pool with these 3 NVMe disks but decided to remove it.
However, when I try to stop an OSD in order to remove it, I get this message:
unsafe to stop osd(s) at this time (1 PGs are or would become offline)

Any clue?

Thanks a lot.
 
This means that if you removed the disks, some data would become unavailable in the meantime. The next steps depend on whether you want to keep the data on the OSDs. Can it be safely removed, or do you want to migrate the data?
 
Then it is probably easiest to destroy the pool via the web UI; after that you should be able to remove the OSDs.
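If you prefer the CLI, the same can be done from the shell; the pool name below is just a placeholder for your actual NVMe pool:

Code:
# list the pools to check the exact name
ceph osd pool ls detail
# destroy the NVMe pool (replace the placeholder with your pool name)
pveceph pool destroy <nvme-pool-name>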
 
Additionally, can you post the output of ceph osd df tree please so we know exactly what we are talking about?

Ideally, if you plan to remove an OSD, you first set it to OUT to let Ceph know that it should not be used anymore. It will move the data somewhere else in the cluster (if possible). Once the OSD is empty, you should be fine to stop and destroy it.
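As a rough sketch of that sequence (the OSD ID 12 is only an example, use your actual NVMe OSD IDs):

Code:
# mark the OSD out so Ceph stops placing data on it and drains it
ceph osd out 12
# reports OK once no data depends on this OSD anymore
ceph osd safe-to-destroy osd.12
# then stop the daemon on the node hosting it and destroy the OSD
systemctl stop ceph-osd@12
pveceph osd destroy 12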

But yeah, if you don't need the pool anymore at all, delete it first.
 
Hello again.

Attached is the output of the command ceph osd df tree.
Another issue I am facing is that when I stop the OSD on the NVMe disk, the storage as a whole just stops working properly.
I already tried marking the OSD out and in, and then stopping it.
When I mark it as out and then stop it, the stop command works, but then R/W drops to 0 (ZERO!) and I lose CephFS access.
In fact, a lot of the video stored in this CephFS pool no longer plays as it should.
There are some CephFS pools, the .mgr pool and others.
The amazing thing about this is that the NVMe is no longer in use by any pool.
The whole pool that was using the NVMe disks is already gone.
 

Attachments

  • 60046b74-1a51-499b-81e3-507c95fa9763.jpeg (screenshot of the ceph osd df tree output)
Judging from your screenshot, there still seems to be some data stored on the OSDs on the SSDs. This is probably also the reason why some data stored in the cluster is not available when you stop them. Since you are using the default replicated rule, Ceph currently also stores some data on those OSDs. You need to set those OSDs to out and then wait until the data has been completely rebalanced to the other OSDs.

Only after those OSDs have been emptied can you stop and delete them.
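To check whether the rebalance has finished, something along these lines should do (assuming the NVMe OSDs are 12-14, as in the commands further down):

Code:
# the PGS and USE columns of the out'ed OSDs should drop to (near) zero
ceph osd df tree
# no remaining misplaced/degraded objects means the rebalance is done
ceph -s
# reports OK once the OSDs can be removed without data loss
ceph osd safe-to-destroy osd.12 osd.13 osd.14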

Some other things:
  • your pools seem to be close to hitting their limits; you should think about adding additional disks, since most of them are already ~80% full
  • you have 555 placement groups in the CephFS pool; Ceph recommends a PG count that is a power of two (see the commands below to check both points)
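A quick way to look at both (nothing here is specific to your setup):

Code:
# per-pool usage and how close the pools are to full
ceph df
# current vs. target PG counts as seen by the autoscaler
ceph osd pool autoscale-status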
 
So, I left the OSDs "out" from Saturday until early yesterday, and they didn't empty.

About the PGs: they are gradually increasing. The target was set to 1024 a while ago and the count has been rising since then; it keeps going up until it reaches 1024.
I already set them to out again and I'll wait until they empty. This can take a long time, right?
 
Yes, this can take a while - especially if you are increasing the number of PGs at the same time.
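You can keep an eye on it with the usual status commands while it runs:

Code:
# shows recovery/backfill progress and remaining misplaced objects
ceph -s
# compact per-PG state summary
ceph pg stat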
 
So what if I just create another replicated rule??
I don't think it would make sense for now, since the data is already on the OSDs. How is the rebalancing looking?

Code:
ceph osd df tree
pveceph pool ls

Can you give some additional information about your cluster?

Code:
ceph pg ls-by-osd 12
ceph pg ls-by-osd 13
ceph pg ls-by-osd 14
ceph -s
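
For reference, regarding the question about another replicated rule: if you later want pools to use only the HDD-class OSDs, a device-class rule is the usual approach. This is only a sketch; the rule name and the pool placeholder are examples, and changing a pool's rule triggers data movement:

Code:
# create a replicated rule limited to HDD-class OSDs (example name)
ceph osd crush rule create-replicated replicated_hdd default host hdd
# point an existing pool at it (placeholder pool name)
ceph osd pool set <pool> crush_rule replicated_hdd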
 
