Ceph cluster full

cking_FMB

New Member
Sep 20, 2023
I have a 3-node cluster of hosts and I'm running Ceph across them. I screwed up and let the storage on the Ceph cluster hit 100%. The storage is still available to the running VMs, but I can't take backups, I can't move the machines' disks to other storage, and if I shut down the running VMs they won't start again. My critical machines are running at the moment, but I don't know how long that will last. Running ceph -s from the command line on the hosts just hangs. Running "systemctl status ceph\*.service ceph\*.target" shows that all of the services are running except for the monitor daemon on one of the hosts. Is there any way to recover from this? I will gladly delete some extraneous disks from VMs in the Ceph cluster, but I can't get to the Ceph cluster to clear up the space. Any ideas?
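When a command such as ceph -s hangs, it can help to wrap it in a deadline so the shell comes back on its own. The sketch below is only an illustration: run_with_deadline is a hypothetical helper name, it assumes the coreutils timeout command is installed, and the 10-second limit in the usage comments is an arbitrary choice.

```shell
# Hypothetical helper: run a command, but give up after N seconds
# instead of letting it hang the shell. Relies on coreutils `timeout`,
# which exits with status 124 when the deadline is hit.
run_with_deadline() {
    rwd_secs="$1"
    shift
    timeout "$rwd_secs" "$@"
    rwd_rc=$?
    if [ "$rwd_rc" -eq 124 ]; then
        echo "timed out after ${rwd_secs}s: $*" >&2
    fi
    return "$rwd_rc"
}

# Usage on a cluster node (not executed here):
#   run_with_deadline 10 ceph -s
#   run_with_deadline 10 ceph health detail
# A monitor can also be asked directly through its local admin socket,
# which does not need quorum. The monitor ID is assumed to match the
# short hostname, which may not hold on every setup:
#   ceph daemon mon."$(hostname -s)" mon_status
```

A timeout here only says the cluster did not answer in time; it does not say why.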
 
I read that somewhere but I'm not sure how to make that config change when running ceph in the shell just hangs. My pool size is set to 3 so I would love to change that temporarily to 2, clear up some space, then set it back to 3 for the long term.
 
I did a little research and in my /etc/ceph/ceph.conf the setting osd_pool_default_size is 3. I can set this to 2, then should I just start the Ceph cluster monitor daemon on the host where the service failed? Or will I need to stop all of the daemons with

sudo systemctl stop ceph\*.service ceph\*.target

then start them all back up with

sudo systemctl start ceph-osd.target
sudo systemctl start ceph-mon.target
sudo systemctl start ceph-mds.target

If I do this, is there a chance my running VMs will freak out because the OSDs will be temporarily unavailable? These are production hosts and I really can't afford to kill a bunch of VMs, especially if this doesn't resolve the underlying storage issue and the machines won't start back up.
 
I read that somewhere but I'm not sure how to make that config change when running ceph in the shell just hangs. My pool size is set to 3 so I would love to change that temporarily to 2, clear up some space, then set it back to 3 for the long term.
ceph osd pool set <poolname> size 2

(You can also do it through the Proxmox GUI by editing the pool.)
 
Even though everything is running for the moment, the whole cluster seems to be in a weird state, so I'm taking backups of all my VMs using a separate piece of software. I normally use Proxmox Backup Server for this, but those backups just time out right now because the pool is inaccessible, even though the VMs are running on it just fine. It's very weird. Once I have reliable, current backups of all the machines, I will try to change the pool size using your recommendation, but I'm not optimistic because other basic ceph commands through the shell just hang and do nothing. But I will give it a shot this weekend and update here on how it goes. If that doesn't work, I may have to rebuild the entire cluster and restore all of my VMs to it afterward. Hopefully it won't come to that.
 
Also, note that Ceph has a safeguard to keep it from reaching 100%: it goes read-only when you reach around 95%.

ceph osd set-full-ratio 0.95

Maybe you can try increasing it to 98% so that you can write a little again:

ceph osd set-full-ratio 0.98
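To get a feel for how little room that change buys, here is a rough arithmetic sketch. full_ratio_headroom_gib is a hypothetical helper and the 1000 GiB OSD size is an illustrative figure, not a number from this cluster.

```shell
# Hypothetical helper: extra writable space (in GiB) on a single OSD
# when the full ratio moves from $2 to $3. Pure arithmetic, no Ceph calls.
full_ratio_headroom_gib() {
    awk -v size="$1" -v old="$2" -v new="$3" \
        'BEGIN { printf "%.1f\n", (new - old) * size }'
}

# Illustrative 1000 GiB OSD, full ratio raised from 0.95 to 0.98
full_ratio_headroom_gib 1000 0.95 0.98   # prints 30.0

# On a cluster node (not executed here), the current ratios can be read with:
#   ceph osd dump | grep ratio
# and the original value restored once space has been freed:
#   ceph osd set-full-ratio 0.95
```

The extra few percent is meant as breathing room for deleting data, not as a new steady state.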
 
Over the weekend I was able to stop all the Ceph-related services, then restart them and immediately set the pool size to 2. This worked and the whole cluster started responding again. I then removed some data and will let it run this week, then reset the size back to 3 next weekend. Thanks so much for the help here. It really made a difference.
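The sequence described in this thread can be sketched as a dry-run script. This is an outline built from the commands quoted earlier, not a tested procedure: run, recover_full_pool and restore_pool_size are hypothetical helper names, the pool name is a placeholder, and by default nothing is executed, only printed.

```shell
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 only after reviewing the printed plan.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop everything, start it again in the order used earlier in the thread,
# then immediately shrink the pool's replica count. $1 is the pool name.
recover_full_pool() {
    rfp_pool="$1"
    run systemctl stop 'ceph*.service' 'ceph*.target'
    run systemctl start ceph-osd.target
    run systemctl start ceph-mon.target
    run systemctl start ceph-mds.target
    run ceph osd pool set "$rfp_pool" size 2
}

# Later, once space has been freed, go back to three replicas.
restore_pool_size() {
    run ceph osd pool set "$1" size 3
}

# Example (prints the plan only; "mypool" is a placeholder name):
#   recover_full_pool mypool
#   restore_pool_size mypool
```

Printing the plan first makes it easy to check the pool name and the order of steps before touching production hosts.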
 