Ceph reinstallation issues


Feb 13, 2019
I am having problems configuring Ceph within PVE. I have obviously made a mistake in the configuration somewhere, and the blocker for me is that I am not able to purge it and start over.

In other words, I have a borked Ceph installation on my newly installed PVE 6.0 node. These are the things I have tried in order to reset the configuration and restart the Ceph installation and configuration:

* pveceph purge fails with "unable to get monitor info from DNS SRV with service name: ceph-mon"
* rm -Rf /etc/ceph /etc/pve/ceph.conf /etc/pve/priv/ceph* /var/lib/ceph
* apt remove ceph ceph-base ceph-mon ceph-mgr ceph-osd ceph-mgr-dashboard ceph-mgr-diskprediction-local ceph-mgr-ssh (the extra packages were installed during one of the partially successful installation attempts): apt fails because the ceph*.prerm scripts in /var/lib/dpkg/info fail to stop the services
* rm ceph-{base,mds,mgr,mon,osd}.prerm in /var/lib/dpkg/info
* retry the apt remove above: successful
* rm ceph-{base,mds,mgr,mon,osd}.* in /var/lib/dpkg/info
* rm -Rf /etc/ceph /etc/pve/ceph.conf /etc/pve/priv/ceph* /var/lib/ceph (again)
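For reference, the reset sequence above could be collected into one shell function. This is a minimal sketch, not a supported Proxmox procedure: the names `run`, `reset_ceph_node`, `remove_ceph_state` and the `DRY_RUN` switch are hypothetical, and by default it only prints the commands. Deleting `*.prerm` and other files under /var/lib/dpkg/info bypasses dpkg's own bookkeeping, so treat it as a last resort.

```shell
#!/bin/sh
# Hypothetical helper that replays the manual Ceph reset steps listed above.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
# WARNING: with DRY_RUN=0 this deletes Ceph config, keyrings and data on this node.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

CEPH_PKGS="ceph ceph-base ceph-mon ceph-mgr ceph-osd ceph-mgr-dashboard ceph-mgr-diskprediction-local ceph-mgr-ssh"
DPKG_PREFIXES="ceph-base ceph-mds ceph-mgr ceph-mon ceph-osd"
DPKG_INFO=/var/lib/dpkg/info

# Remove config, keyrings and local daemon state (same paths as in the list).
remove_ceph_state() {
    run rm -rf /etc/ceph /etc/pve/ceph.conf /etc/pve/priv/ceph* /var/lib/ceph
}

reset_ceph_node() {
    # 1. Try the supported purge first; in the thread it failed with the
    #    DNS SRV error, so a failure is ignored here.
    run pveceph purge || true

    # 2. Remove config and state.
    remove_ceph_state

    # 3. Remove the packages ($CEPH_PKGS is unquoted on purpose so it splits
    #    into separate names). If the prerm scripts cannot stop the services,
    #    delete them, as the poster did, and retry.
    if ! run apt remove -y $CEPH_PKGS; then
        for p in $DPKG_PREFIXES; do
            run rm -f "$DPKG_INFO/$p.prerm"
        done
        run apt remove -y $CEPH_PKGS || return 1
    fi

    # 4. Drop the remaining dpkg info files for those packages, then clean up again.
    for p in $DPKG_PREFIXES; do
        run rm -f "$DPKG_INFO/$p".*
    done
    remove_ceph_state
}
```

Calling `reset_ceph_node` as-is prints the nine commands it would run; only `DRY_RUN=0 reset_ceph_node` executes them.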

I've been using posts from https://forum.proxmox.com/threads/ceph-config-broken.54122/page-2 as inspiration.

After the steps above I tried installing Ceph cleanly, with these results:

* pveceph install (122MB of additional disk space, etc.): Ceph Nautilus installed successfully
* configure Ceph in the GUI:
  - public network set to the node's default network
  - cluster network set to the node's default network (I have a separate network intended for the cluster)
  - monitor node = the PVE node
  - result: error with cfs lock 'file-ceph_conf': command 'cp /etc/pve/priv/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring' failed: exit code 1 (500)

Suggestions? Or is reinstallation of the node the only solution when the ceph installation gets borked?
Redoing the above, but leaving the /etc/ceph folder in place (so the keyring can be copied there), I reinstalled the Ceph packages and attempted the configuration in the GUI again:
- Could not connect to ceph cluster despite configured monitors (500)

It seems the monitor is not being created or started.
Unfortunately not - I don't know how many times I've reinstalled pve due to wanting to tweak my ceph configuration.
Thanks for your reply. I found that with an older Debian version (9 instead of 10) I could purge Ceph and then reconfigure it through Proxmox. I tried this several times in a row and it kept working. There might be something wrong with the combination of Debian Buster + Proxmox + Ceph Nautilus.
I remember having the same issue with 5.4 (on Debian 9) a few months ago when I was experimenting with PVE/Ceph for work. Also, Nautilus is not supported on Debian 9, which is why PVE had to wait for Buster/10 to upgrade Ceph.
I've had the same issues upgrading from PVE 5 (using only local storage, with no Ceph involved at all yet) to 6, with the intention of finally moving everything to Ceph. The problematic step was pveceph createmon throwing "Could not connect to ceph cluster despite configured monitors".

I solved it (or rather, hacked my way around it) by commenting out line 202 in /usr/share/perl5/PVE/API2/Ceph/MON.pm. After that it still complained that "monitor <hostname> already exists", so I also commented out line 74 in the same file. You may have to run rm -rf /var/lib/ceph/mon/* before executing pveceph createmon again. In my case it then completed fine and the monitor came up. The same needs to be done on every node in the cluster.

Hint: installing libdevel-trace-perl (apt install libdevel-trace-perl) and then running pveceph as perl -T -d:Trace /usr/bin/pveceph mon create turned out to be very helpful for debugging this.
I think I'm struggling with the same issue now. I tore down my Ceph storage configuration with a view to rebuilding it, so that I get to know the process. Everything looked to be removed OK. I then ran the Ceph setup again and was able to configure two of the three nodes, but the old master node will not allow me to add a monitor to it. I get the error below:

error during cfs-locked 'file-ceph_conf' operation: command 'chown ceph:ceph /var/lib/ceph/mon/ceph-nuc10i7-pve01' failed: exit code 1

Any ideas or pointers? I want to try to fix this rather than default to a reinstall.
OK, just to follow up on this: I have managed to bring Ceph back to a fully working state without a reinstall. As simple as it sounds, I just needed to re-create the two folders below on the node with the issue. Adding the Manager and Monitors via the CLI or UI then created the sub-folders (ceph-mon.nuc10i7-pve01 and ceph-mgr.nuc10i7-pve01) for me. For some reason, I just needed to create the parent folders manually.

mkdir /var/lib/ceph/mgr
mkdir /var/lib/ceph/mon

That was it.
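As an illustration, the fix above could be wrapped in a small idempotent check that only creates the parent folders when they are missing. This is a sketch under assumptions: `ensure_ceph_parent_dirs` is a hypothetical name, and its optional base-directory argument (defaulting to /var/lib/ceph) exists only so it can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: recreate the missing parent folders (mgr, mon).
# The per-daemon sub-folders are created afterwards by adding the
# Manager/Monitor via the CLI or UI, as described above.
# The base directory argument is an assumption; it defaults to /var/lib/ceph.

ensure_ceph_parent_dirs() {
    base="${1:-/var/lib/ceph}"
    for sub in mgr mon; do
        if [ ! -d "$base/$sub" ]; then
            echo "creating missing $base/$sub"
            mkdir -p "$base/$sub" || return 1
        fi
    done
}
```

Running it a second time prints nothing, because both folders already exist.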
I take that back. Whilst it looks like everything is OK, it's still not. On the 'old' primary node I have an OSD which is orphaned, and I can't find a way to remove it. On the other nodes, I see each of the nodes listed within /var/lib/ceph/osd/, whereas on the 'old' primary node it only shows itself.

I'm getting further. On the 'old' primary node, I ran ceph osd tree. This showed me the orphaned OSD (its ID was 0). From there I ran pveceph osd destroy 0 to remove it.

Everything looks OK, but I cannot understand why cluster nodes 2 and 3 show all the OSDs in /var/lib/ceph/osd/, whereas on node 1 (the old master node) that same folder only has one OSD in it.

Would anyone be able to provide some insights?