[SOLVED] ceph trouble with non-standard object size

Knuuut

Member
Jun 7, 2018
Hi Community,
Creating an RBD image of 1T with an object size of 16K is easy. I did it like this:
Code:
rbd create -s 1T --object-size 16K --image-feature layering --image-feature exclusive-lock --image-feature object-map --image-feature fast-diff --image-feature deep-flatten -p Poolname vm-222-disk-4
BUT after creating a snapshot of the image (which took very long), I came close to killing my cluster. 2 of 24 OSDs went down; I was able to restart them and the cluster became healthy again.

Removing an image (without a snapshot) was a scary moment too, because a lot of IO was going on for about 10 minutes.

Now I have one big image with a snapshot left and I'm afraid to do anything with it.

Does anybody know what went wrong, and are there any suggestions on what to do next to get rid of this?

Cheers Knuuut
 
16K/object on 1TB is ~67M objects, compared to 262,144 objects with 4MB. This can cause quite a load on operations like snapshots. Export/import the image to get 4MB objects, or create a new image, attach it to the VM and migrate.
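For reference, the object counts quoted above can be reproduced with a bit of shell arithmetic. This is only a sketch: `object_count` is a helper name made up for this example, and it assumes binary units (1T = 2^40 bytes, 16K = 2^14 bytes, 4M = 2^22 bytes) and a fully written image.

```shell
# Number of RADOS objects backing a fully written image:
# image size divided by object size, both given in bytes.
# object_count is a hypothetical helper, not part of any Ceph tool.
object_count() {
    echo $(( $1 / $2 ))
}

TIB=$(( 1 << 40 ))

object_count "$TIB" $(( 16 << 10 ))   # 16 KiB objects -> 67108864
object_count "$TIB" $(( 4 << 20 ))    # 4 MiB objects  -> 262144

# Possible export/import to 4M objects (untested sketch; the target
# image name is a placeholder):
# rbd export Poolname/vm-222-disk-4 - | rbd import --object-size 4M - Poolname/new-image-name
```

That is 256 times as many objects at 16K as at 4M, and a snapshot or delete has to touch each of them.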
 

That's reasonable.

I don't care about the data inside. I just want to get rid of it without killing my cluster.

Afaik I can't delete it without removing the snapshot first, or am I wrong?
 
I tried to delete the snapshot and the cluster was unusable for about half an hour. 2 OSDs even went down. Fortunately I was able to interrupt the deletion of the snapshot with ctrl+c. After this the cluster became healthy again within a few minutes.

This is a 4 node cluster with 32 Intel DC 4500 SSDs. I don't think this is a hardware issue.

Does anybody know how to get rid of the 16K objects image without downtime?

Info: This is a Proxmox 5.1 cluster. Might an update help in this situation?
 
First, you have a bottleneck somewhere that keeps the OSDs from sending/receiving their heartbeats. Second, either get more resources for your cluster or (advanced topic) you may be able to delete the image with the cluster in its current state by deleting the objects directly with rados: loop through a limited number of them at a time and check the cluster load in between. See this link: http://cephnotes.ksperis.com/blog/2014/07/04/remove-big-rbd-image
 
You need to check the rados part, as you delete objects. This is a layer lower than rbd; the image prefix marks all objects, regardless of what they are in rbd.
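A rough sketch of what such a throttled loop could look like, in the spirit of the linked article. `throttled_rm` is a made-up helper name, and the batch size and pause are arbitrary starting values, not recommendations. It assumes `rados -p <pool> rm <object>` removes a single object and that the prefix is read from the `block_name_prefix` line of `rbd info`. It has not been tried on an image that still has a snapshot, so treat it as a starting point only.

```shell
# Delete the objects whose names arrive on stdin, one by one, and
# sleep after every BATCH deletions so the OSDs get some breathing room.
# throttled_rm is a hypothetical helper, not a Ceph command.
#   $1 = pool name
#   $2 = batch size (default 200)
#   $3 = pause in seconds between batches (default 1)
throttled_rm() {
    pool=$1 batch=${2:-200} pause=${3:-1}
    n=0
    while IFS= read -r obj; do
        # Report failures but keep going with the next object.
        rados -p "$pool" rm "$obj" || echo "failed: $obj" >&2
        n=$(( n + 1 ))
        if [ $(( n % batch )) -eq 0 ]; then
            sleep "$pause"
        fi
    done
    # Print how many object names were processed.
    echo "$n"
}

# Example usage (commented out on purpose; check every step first):
# prefix=$(rbd info Poolname/vm-222-disk-4 | awk '/block_name_prefix/ {print $2}')
# rados -p Poolname ls | grep "^${prefix}\." | throttled_rm Poolname 200 2
```

Listing a pool with tens of millions of objects is itself slow, so it may be worth writing the `rados ls` output to a file once and feeding the loop from that. The fixed `sleep` could also be swapped for a look at `ceph -s` between batches, as suggested above.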
 
