Not able to use pveceph purge to completely remove ceph

daniel1e6

New Member
Oct 31, 2019
I'm new to Proxmox... and Linux. I'm trying to completely remove Ceph and I can't. I tried pveceph purge after stopping all services and got the message below. I originally installed, uninstalled, and reinstalled Ceph because I wasn't able to add a second NVMe drive from each of the three servers currently in the same cluster. I had to install it the second time from the terminal. After the clean install, I discovered that the old monitors had not been deleted and didn't work, and I couldn't delete them. When I tried to purge, it wouldn't let me (see the message below).

[screenshot attachment: 1572511998930.png]

I also tried apt remove, apt autoremove, and various upgrades. Nothing seemed to work, and I couldn't remove ceph-mon. Any ideas on how to completely remove this would be greatly appreciated. My next step is a fresh install of Proxmox on all three servers, which I'm trying to avoid.
 
'pveceph purge' purges the packages, but not the monitors.

So, a bit of a raw and rough approach to get you out of this could be
Code:
## stop all remaining ceph-services
# systemctl stop ceph-mon.target
# systemctl stop ceph-mgr.target
# systemctl stop ceph-mds.target
# systemctl stop ceph-osd.target

## avoid them being restarted by systemd on the next boot (the low-level way)
# rm -rf /etc/systemd/system/ceph*

## be really sure they're stopped:
# killall -9 ceph-mon ceph-mgr ceph-mds

## then do
# rm -rf /var/lib/ceph/mon/  /var/lib/ceph/mgr/  /var/lib/ceph/mds/ 

## then retry purge
# pveceph purge
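Before retrying the purge it can help to verify nothing ceph-related actually survived the stop/kill steps above; a minimal check (the four daemon names are the standard ones, this is just a sketch):

```shell
# Check that no Ceph daemons survived the stop/kill steps above.
# pgrep -x matches exact process names.
leftover=""
for name in ceph-mon ceph-mgr ceph-mds ceph-osd; do
    pids=$(pgrep -x "$name") && leftover="$leftover $name($pids)"
done
if [ -z "$leftover" ]; then
    echo "no ceph daemons running"
else
    echo "still running:$leftover"
fi
```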
 
Good morning. This seemed promising and is good to know for future issues. However, it didn't work in this situation. Here's what I got. Thanks!

Capture.png
 
Hi Tom, any updates regarding this bug? These issues are definitely preventing us from meeting our production release date. More importantly, it seems this issue could result in a catastrophic failure if we were in production, which is concerning. I look forward to your help in figuring out a solution to this bug.
 
@daniel
Sorry, but if you're new to Linux, Proxmox, and Ceph, you should never go into production! Learning never ends for us either, but for you it's far too early to run production on Linux, any Linux!
apt and dpkg are Debian tools; sort this situation out first.
 
I appreciate the feedback. Thanks. Tom provided a solution (he seems to have lots of experience). The solution didn't work. I'm not sure this bug can be solved. It may require a new installation.
 
So, dpkg complains about a SysV init script error. That script is a fallback from systemd and is normally not present (though I did not check closely). I'd try removing it and retrying the apt remove/purge command:
Code:
rm /etc/init.d/ceph
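For reference, the remove/purge would then be retried roughly like this. The package set below is an assumption based on a typical Proxmox VE 6 / Ceph install; check `dpkg -l | grep ceph` for what is actually present:

```shell
# Usual Ceph package set on a PVE node (assumption; adjust to your dpkg -l output).
pkgs="ceph ceph-base ceph-common ceph-mon ceph-mgr ceph-mds ceph-osd"
# Build the command first so it can be reviewed before running it for real;
# --purge-style removal also drops the config files a plain remove leaves behind.
cmd="apt purge --yes $pkgs"
echo "$cmd"
```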

it seems like this issue would result in a catastrophic failure if we were in production
How so? Once you set up a Ceph service you normally want to keep it; purging it from a server is not a common thing to do, especially not in production.

Also, you can go to our enterprise support team to get more involved responses https://www.proxmox.com/en/proxmox-ve/pricing

I mean, we don't know how you got into this situation at all. I personally have purged and re-installed Ceph quite often in testing, and on a normal setup done with the Proxmox VE tooling the steps above work most of the time without big issues (I once deleted a bit too much, but that was my fault). So I really would not call this a bug.

On a more general note, if you really don't know what state this is all in, and you've messed around with something whose effects you weren't fully aware of, I'm not sure it's a good idea to go forward like this. It could be good to re-install the cluster to be sure you're in a clean state. Then, before creating any relevant VM or container, set everything up, test, and be sure to log your information somewhere.
Reading up on the Proxmox VE docs and some general Linux information could be good. As said, I don't know any background of this service; but once you said that you may miss "production deadlines" I got the feeling that this may be more serious than initially thought, i.e., not just a test+learn+evaluate setup but something much more serious, with possibly big implications from mis-setup or failures.
So, to be honest, I'd recommend enterprise support here. There are a lot of helpful people on the community forum, but for such things, and especially if you say you don't have much experience with Linux in general or Proxmox VE, it seems more appropriate; just my two cents.
 
Hi Tom,

I'm sure there are times a reinstall is required in production. If I were in production and decided Ceph wasn't working but chose to keep Proxmox, I would need a fresh install.

I followed the documentation to remove the monitors via the GUI. Then I used the documentation to attempt to remove Ceph. Like many others, I'm in the process of reformatting the drives so I can start over. Our goal has been to set up, test, and learn with the community level of support, then transition to a level of support more suitable for production. In parallel, we are interviewing people to manage Proxmox for our company as we scale up. The issue happened both times I attempted to remove and reinstall Ceph using the Proxmox documentation, followed by the recommendation you provided. I'll consider upgrading to the next level of support before the production release.

I may be a newbie, but I followed the documentation. Others have had similar issues, but I have not seen a solution. Here's what I've done:
1) Installed Proxmox.
2) Set up the network for private and public NICs.
3) Pinged private and public.
4) Double-checked the network interface settings on each node.
5) Rebooted and re-checked ping.
6) Added my subscription keys (FYI, pve-enterprise.list had to be set manually), updated each node through the GUI, rebooted, and checked again to be sure there weren't issues with the kernel.
7) Created the cluster and joined the nodes with the bridge and private network set properly (once, I did this after I installed Ceph on each node; it created 3 managers and resulted in issues connecting the default monitors, and I reinstalled PVE that time).
8) Installed a total of 3 monitors.
9) Installed a few OSDs.
I have 6 NVMe 1 TB drives, and Proxmox is installed on each node on a dedicated drive. I was never able to add 6 OSDs (the reason for my attempt to reinstall Ceph). It seems like maybe I need to create a volume for LVM or create a CephFS; I'll have to research OSDs once the first issues are resolved. During my preliminary search, I haven't found much documentation clearly explaining how to prepare a new drive for Ceph OSDs. Then I'll create my container and VM pools.
 
When I clicked on the link it took me to your last post, so the below may not be useful at all, given the above.


Hi Daniel,
I have a similar problem in that I cannot get Ceph back to square one, which led me to your post.
I am certainly no expert.

Ceph is its own file system, so you don't add it like a normal drive.
Once the OSDs are added, you create a CephFS and it appears like magic.

I found that sometimes drives need to be cleaned of previous partitions before they will appear in the OSD window.

cfdisk /dev/sdX or fdisk /dev/sdX usually does the job.
Remember to double-check it is the right disk.
Then write the changes and the disk should appear.
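A non-interactive alternative to cfdisk/fdisk is to wipe the old signatures directly. A sketch, echoed as a dry run since the device name is only a placeholder:

```shell
# Placeholder device; ALWAYS confirm the right disk with `lsblk` before wiping.
DISK=/dev/nvme1n1
# Printed instead of executed; drop the echos to actually run these.
echo "wipefs --all $DISK"      # clears filesystem/RAID signatures
echo "sgdisk --zap-all $DISK"  # clears GPT and MBR partition structures
```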


Use the information from this window to find your disk names. The top two are also NVMe drives:
[screenshot attachment: 1572844102496.png]

hope this helps

damon
 
Awesome, thanks for the information. I'll definitely try that when I start adding the OSDs. Thanks!
 
Thanks for reaching out. Yes, I did see that documentation, but it didn't seem to explain how a drive becomes recognized so an OSD can be created. I was able to see both drives under "Disks" but was not able to add an OSD for either of them. The drives were brand new when I first encountered the issue. Also, when I removed all partitions from a drive, I received a GRUB error. One of the three servers didn't produce an error after deleting the partitions, probably because the disk was GPT-initialized before the partitions were deleted. I haven't invested much time researching this area, since I reinstalled once I received the GRUB errors and haven't been able to get past the current issues.

After sending Proxmox my syslogs, I received an email explaining that my kernel is very outdated. Since this is a new install, this is concerning. Could some of my issues be the result of the install not using the latest kernel? I added my subscription keys, performed all updates, and rebooted between updates. I even updated pve-enterprise.list to be sure I receive enterprise repository updates/upgrades going forward. I suspect most paying subscribers don't receive enterprise updates automatically, since this seems to require a manual change. Is this something Proxmox will fix, and is there a temporary workaround so I can update the kernel? Thanks.
 
Maybe the outdated kernel is the root cause of some of my issues. This is a new Proxmox VE 6 install on a new server with new drives. Here's part of the email I received from Proxmox; a solution wasn't provided in it. If this is a bug, is it something Proxmox plans to fix soon? It seems like a major issue if it is.

"The only issue I see in your logs, that you run a quite outdated kernel => update to latest version."
 
After sending Proxmox my syslogs, I received an email explaining that my kernel is very outdated.
Can you PM me the ticket ID? I don't seem to see it on our ticket system.


The documentation didn't seem to explain how a drive can become recognized so an OSD can be created. I was able to see both drives under "Disk" but was not able to add the OSD for that disk.
In this section of the docs, you can see in the screenshot the OSD tab, where you can create new OSDs. This may need some further description.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_osds
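For a disk that shows up clean, the CLI counterpart of the OSD tab's Create button is pveceph osd create. Sketched here with a placeholder device and printed as a dry run:

```shell
# Placeholder device; confirm the actual path with `lsblk` on the node first.
DISK=/dev/nvme1n1
# Printed instead of executed; run the printed command on the node itself.
cmd="pveceph osd create $DISK"
echo "$cmd"
```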


"The only issue I see in your logs, that you run a quite outdated kernel => update to latest version."
Can you please post a pveversion -v?
 
So here we go: my buddy and I worked through the issue, and I do believe this is a Ceph bug. We both work in high-level IT. In short, do all the stuff listed above; once done, run the commands below and you should have working packages again. It appears that after doing a purge or removing ceph ceph-mon ceph-osd, one of the shared libraries physically goes bye-bye, but the environment still thinks the library is present.

Run initial repair on all ceph packages:
Code:
for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done

Reconfigure the deb packages:
Code:
dpkg-reconfigure ceph-base
dpkg-reconfigure ceph-mds
dpkg-reconfigure ceph-common
dpkg-reconfigure ceph-fuse

Rerun the same repair script:
Code:
for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done

Run the installer:
Code:
pveceph install


Should do the job!
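One way to confirm the reinstall actually brought the shared library back (the library name is taken from the error shown later in this thread):

```shell
# ldconfig -p lists everything in the dynamic linker cache, so a match
# here means the library pveceph complained about is available again.
lib="libceph-common.so.0"
if ldconfig -p 2>/dev/null | grep -q "$lib"; then
    status="present"
else
    status="missing"
fi
echo "$lib is $status"
```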
 
It appears that after doing a purge or removing ceph ceph-mon ceph-osd, one of the shared libraries physically goes bye-bye, but the environment still thinks the library is present.
Do you recall, which one it was?
 
Code:
root@pve1:/usr/lib/x86_64-linux-gnu/perl5/5.28/auto/PVE/RADOS# pveceph init



Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.28/auto/PVE/RADOS/RADOS.so' for module PVE::RADOS: libceph-common.so.0: cannot open shared object file: No such file or directory at /usr/lib/x86_64-linux-gnu/perl/5.28/DynaLoader.pm line 187, <DATA> line 755.
at /usr/share/perl5/PVE/Storage/RBDPlugin.pm line 13.
Compilation failed in require at /usr/share/perl5/PVE/Storage/RBDPlugin.pm line 13, <DATA> line 755.
BEGIN failed--compilation aborted at /usr/share/perl5/PVE/Storage/RBDPlugin.pm line 13, <DATA> line 755.
Compilation failed in require at /usr/share/perl5/PVE/Storage.pm line 32, <DATA> line 755.
BEGIN failed--compilation aborted at /usr/share/perl5/PVE/Storage.pm line 32, <DATA> line 755.
Compilation failed in require at /usr/share/perl5/PVE/CLI/pveceph.pm line 17, <DATA> line 755.
BEGIN failed--compilation aborted at /usr/share/perl5/PVE/CLI/pveceph.pm line 17, <DATA> line 755.
Compilation failed in require at /usr/bin/pveceph line 6, <DATA> line 755.
BEGIN failed--compilation aborted at /usr/bin/pveceph line 6, <DATA> line 755.
root@pve1:/usr/lib/x86_64-linux-gnu/perl5/5.28/auto/PVE/RADOS# ldconfig -v | grep libceph
ldconfig: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
ldconfig: Path `/usr/lib/x86_64-linux-gnu' given more than once
ldconfig: Path `/lib/x86_64-linux-gnu' given more than once
ldconfig: Path `/usr/lib/x86_64-linux-gnu' given more than once
ldconfig: Path `/usr/lib' given more than once
ldconfig: /lib/x86_64-linux-gnu/ld-2.28.so is the dynamic linker, ignoring

        libcephfs.so.2 -> libcephfs.so.2.0.0

I believe it was: libceph-common.so.0
Which is part of: librados2_14.2.4.1-pve1_amd64.deb
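If anyone hits the same symptom, dpkg can map the missing file back to the package that ships it, so only that one package needs a reinstall. A small sketch:

```shell
# dpkg -S maps a file (or name fragment) back to its owning package.
# On a healthy PVE node this reports librados2; with no match it prints
# the fallback text instead.
owner=$(dpkg -S libceph-common.so.0 2>/dev/null | head -n 1)
result="${owner:-no package found}"
echo "$result"
```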
 
