Reinstall CEPH on Proxmox 6

May 5, 2018
7
1
3
46
Hello,

After upgrading to Proxmox VE 6, I tried to reinstall Ceph instead of upgrading it. I followed a page that said to delete several directories:
( rm -Rf /etc/ceph /etc/pve/ceph.conf /etc/pve/priv/ceph* /var/lib/ceph )

pveceph init --network 10.1.1.0/24 worked, but afterwards I get the following error:
pveceph createmon
unable to get monitor info from DNS SRV with service name: ceph-mon
Could not connect to ceph cluster despite configured monitors

Installation via GUI also fails.

Is there a way to reinstall CEPH so that I can fix the issue?

Thanks, Metz
 

brucexx

Active Member
Mar 19, 2015
191
5
38
I can confirm this. After a successful upgrade from PVE 5.4 to 6, I tried to upgrade Ceph, which failed. I then purged the Ceph config and tried to reinstall with Nautilus, and I made sure it is installed. It fails with the same message. I even added all the nodes to the hosts table, but that did not help.
 

Alwin

Proxmox Staff Member
Aug 1, 2017
4,617
451
88
pveceph init --network 10.1.1.0/24 worked, but afterwards I get the following error:
Remove /etc/pve/ceph.conf; once it has been created, it will not be re-initialized.

EDIT:
After posting, I saw the brackets. :/

Are you on the latest packages (pveversion -v)? Can you please post your ceph.conf?
 
May 5, 2018
7
1
3
46
pveversion -v
Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 14.2.2-pve1
ceph-fuse: 14.2.2-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-5
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1

cat /etc/pve/ceph.conf
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.1.1.0/24
         fsid = 59cf47e3-19c8-4b4c-bea7-983c62ebbcdf
         mon_allow_pool_delete = true
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public network = 10.1.1.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring
 
May 5, 2018
7
1
3
46
I installed the latest updates. Same result.

pveceph init --network 10.1.1.0/24 worked, but afterwards I get the following error:
pveceph createmon
Code:
unable to get monitor info from DNS SRV with service name: ceph-mon
Could not connect to ceph cluster despite configured monitors

cat /etc/pve/ceph.conf
Code:
[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 10.1.1.0/24
         fsid = 631a6e28-8e2d-4563-89d8-ac0043790a6f
         mon allow pool delete = true
         osd pool default min size = 2
         osd pool default size = 3
         public network = 10.1.1.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

pveversion -v
Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-6 (running version: 6.0-6/c71f879f)
pve-kernel-5.0: 6.0-7
pve-kernel-helper: 6.0-7
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 14.2.2-pve1
ceph-fuse: 14.2.2-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.11-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2
 
May 5, 2018
7
1
3
46
I found the following log entries after rebooting the node:
cat /var/log/ceph/ceph-mon.pve-node3.log
Code:
2019-09-06 20:55:17.263 7fd6ce7fa3c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 20:55:17.263 7fd6ce7fa3c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1695
2019-09-06 20:55:17.263 7fd6ce7fa3c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?
2019-09-06 20:55:27.555 7f856c6883c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 20:55:27.555 7f856c6883c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1876
2019-09-06 20:55:27.555 7f856c6883c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?

After creating the directory:
Code:
2019-09-06 20:57:53.206 7fd067a7f3c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 20:57:53.206 7fd067a7f3c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 2068
2019-09-06 20:57:53.206 7fd067a7f3c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' is empty: have you run 'mkfs'?

The log files on the other nodes are empty, but I did not reboot those nodes.
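
To pull only the failing lines out of a monitor log like the ones above, a small filter can help. This is a minimal sketch, not an official tool: the function name mon_log_errors is hypothetical, and it assumes the layout seen in these excerpts, where the fourth whitespace-separated field is the log level and the failing lines carry -1 there.

```shell
#!/bin/sh
# Sketch: print the distinct error messages from a Ceph log file.
# Assumption: as in the excerpts in this thread, the 4th whitespace-separated
# field is the log level and failing lines carry -1 in that field.
mon_log_errors() {
    awk '$4 == "-1" {
        $1 = $2 = $3 = $4 = ""   # blank out date, time, thread id and level
        sub(/^ +/, "")           # drop the padding left behind
        print
    }' "$1" | sort -u
}
```

It could be called as mon_log_errors /var/log/ceph/ceph-mon.pve-node3.log. Because repeated start attempts log the same message every few seconds, the output is reduced to one line per distinct error.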
 
May 5, 2018
7
1
3
46
I did the following steps:
Code:
pveceph purge          # on all nodes
rm -r /var/lib/ceph    # on all nodes
rm /etc/pve/ceph.conf
reboot                 # on one node

Why does the log file still get new entries? It looks as if Ceph is still being started.

cat /var/log/ceph/ceph-mon.pve-node3.log

Code:
2019-09-06 23:24:06.667 7f69a14cc3c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 Errors while parsing config file!
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 parse_file: cannot open /.ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2019-09-06 23:24:06.667 7f69a14cc3c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1864
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?
2019-09-06 23:24:16.563 7f41e7cc93c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 Errors while parsing config file!
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 parse_file: cannot open /.ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2019-09-06 23:24:16.563 7f41e7cc93c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1909
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?
2019-09-06 23:24:26.747 7f683159d3c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 23:24:26.747 7f683159d3c0 -1 Errors while parsing config file!
2019-09-06 23:24:26.747 7f683159d3c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:26.747 7f683159d3c0 -1 parse_file: cannot open /.ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:26.747 7f683159d3c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2019-09-06 23:24:26.747 7f683159d3c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1972
2019-09-06 23:24:26.747 7f683159d3c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?

Code:
ps -ef | grep ceph
root         986       1  0 23:23 ?        00:00:00 /usr/bin/python2.7 /usr/bin/ceph-crash
 

Einar Stenberg

Active Member
Mar 7, 2012
33
8
28
Gjøvik, Norway, Norway
Hi,
I seem to be having exactly the same problem as metz and brucexx.
I have tried purging and deleting /var/lib/ceph, without success.

When I run pveceph init, either via the CLI or the GUI, it errors out with "Could not connect to ceph cluster despite configured monitors (500)".
It also leaves "ghosts" of the monitors behind: they are visible in the web GUI but not in any CLI, i.e. they cannot be removed, stopped, or started.

Is there a way to delete all these references and start completely from scratch with the Ceph install (remove all config/services and put them back)?
 

dcsapak

Proxmox Staff Member
Staff member
Feb 1, 2016
8,062
993
163
34
Vienna
Is there a way to delete all these references and start completely from scratch with the Ceph install (remove all config/services and put them back)?
Please check on all nodes whether there are any leftover directories (/var/lib/ceph/mon) and leftover services in
/etc/systemd/system/ceph-mon.target.wants/

Remove all of those and restart pvestatd with 'systemctl restart pvestatd'.
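
The check described in this post can be sketched as a small read-only shell function. It only lists entries under the two paths named here and deletes nothing. The function name and the optional root-prefix argument are hypothetical and exist only so the check can be tried against a scratch directory first.

```shell
#!/bin/sh
# Read-only sketch: list leftover Ceph monitor state on this node.
# The optional first argument is a hypothetical root prefix for dry runs;
# leave it empty to inspect the real filesystem.
list_ceph_mon_leftovers() {
    root="${1:-}"
    for dir in \
        "$root/var/lib/ceph/mon" \
        "$root/etc/systemd/system/ceph-mon.target.wants"
    do
        if [ -d "$dir" ]; then
            # Print each direct child: monitor data dirs or unit symlinks.
            find "$dir" -mindepth 1 -maxdepth 1
        fi
    done
}

list_ceph_mon_leftovers
```

Anything it prints is a candidate for cleanup on that node. It would need to be run on every node, since the leftovers are local to each one.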
 
May 5, 2018
7
1
3
46
Thanks. That got me one step further. Now 3 monitors are running, and 2 managers are configured and running. However, I am not able to start the managers and not able to configure the 3rd manager.

I get the following error message: /var/lib/ceph/mgr/ceph-pve-node3/keyring.tmp.2235958

Now I have two weeks of holiday. I think I will reinstall Proxmox to get a clean setup, because I am also losing corosync from time to time, which never happened on version 5. The GUI also shows a question mark on some nodes from time to time, although on the CLI I can see that the node is up.

Code:
ls -l /var/lib/ceph/mon
ls: cannot access '/var/lib/ceph/mon': No such file or directory

ls /etc/systemd/system/ceph-mon.target.wants/
ceph-mon@pve-node3.service

systemctl disable ceph-mon@pve-node3.service
Removed /etc/systemd/system/ceph-mon.target.wants/ceph-mon@pve-node3.service.

ls /etc/systemd/system/ceph-mon.target.wants/
ls: cannot access '/etc/systemd/system/ceph-mon.target.wants/': No such file or directory

pveceph init --network 10.1.1.0/24
creating /etc/pve/priv/ceph.client.admin.keyring

pveceph createmon
unable to get monitor info from DNS SRV with service name: ceph-mon
creating /etc/pve/priv/ceph.mon.keyring
importing contents of /etc/pve/priv/ceph.client.admin.keyring into /etc/pve/priv/ceph.mon.keyring
chown: cannot access '/var/lib/ceph/mon/ceph-pve-node3': No such file or directory
error with cfs lock 'file-ceph_conf': command 'chown ceph:ceph /var/lib/ceph/mon/ceph-pve-node3' failed: exit code 1

mkdir -p /var/lib/ceph/mon

pveceph createmon
unable to get monitor info from DNS SRV with service name: ceph-mon
monmaptool: monmap file /tmp/monmap
monmaptool: generated fsid 4d72e5e9-4e59-4875-a27f-ff273a9c007f
epoch 0
fsid 4d72e5e9-4e59-4875-a27f-ff273a9c007f
last_changed 2019-09-14 11:36:10.961149
created 2019-09-14 11:36:10.961149
min_mon_release 0 (unknown)
0: [v2:10.9.9.13:3300/0,v1:10.9.9.13:6789/0] mon.pve-node3
monmaptool: writing epoch 0 to /tmp/monmap (1 monitors)
Created symlink /etc/systemd/system/ceph-mon.target.wants/ceph-mon@pve-node3.service -> /lib/systemd/system/ceph-mon@.service.
creating manager directory '/var/lib/ceph/mgr/ceph-pve-node3'
creating keys for 'mgr.pve-node3'
unable to open file '/var/lib/ceph/mgr/ceph-pve-node3/keyring.tmp.2235636' - No such file or directory

pveceph createmgr
creating manager directory '/var/lib/ceph/mgr/ceph-pve-node3'
creating keys for 'mgr.pve-node3'
unable to open file '/var/lib/ceph/mgr/ceph-pve-node3/keyring.tmp.2235958' - No such file or directory
 

lynn_yudi

Active Member
Nov 27, 2011
86
0
26
With a completely fresh install of the latest PVE 6, this problem remains.

First I set up Ceph with:
pveceph init --network 10.0.115.0/24 -disable_cephx 1
pveceph mon create
...
This works normally.

Then I deleted everything Ceph-related to continue testing
and set up a new Ceph cluster again with:

pveceph init --network 10.0.115.0/24 -disable_cephx 1
pveceph mon create
unable to get monitor info from DNS SRV with service name: ceph-mon
...
It does not work!

Now I no longer use this parameter. Everything is OK, and I won't have to reinstall PVE 6 again!
 

Alwin

Proxmox Staff Member
Aug 1, 2017
4,617
451
88
@lynn_yudi, what is in the ceph config when the 'unable to get monitor info' shows up?
 

lynn_yudi

Active Member
Nov 27, 2011
86
0
26
@lynn_yudi, what is in the ceph config when the 'unable to get monitor info' shows up?

pveceph init --network 10.0.115.0/24 -disable_cephx 1

The ceph.conf:
Bash:
# cat ceph.conf
[global]
         auth_client_required = none
         auth_cluster_required = none
         auth_service_required = none
         cluster_network = 10.0.115.0/24
         fsid = 554ee1fe-8a40-44bf-9c54-2db323aa89ea
         mon_allow_pool_delete = true
         mon_host = 10.0.115.15 10.0.115.11 10.0.115.13
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         osd_crush_update_on_start = false
         public_network = 10.0.115.0/24

and
# pveceph mon create
unable to get monitor info from DNS SRV with service name: ceph-mon

Sorry, I am not sure whether it was exactly the message above,

but it does not work.
 

Alwin

Proxmox Staff Member
Aug 1, 2017
4,617
451
88
There seem to be some leftovers. After you purged Ceph, is /var/lib/ceph/ empty? And is ceph.conf gone as well?
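
Both questions can be answered with a short read-only check. The following is a minimal sketch under stated assumptions: the function name check_ceph_purged and the optional root-prefix argument are hypothetical, and the two config paths are the ones that appear earlier in this thread (/etc/pve/ceph.conf and /etc/ceph/ceph.conf).

```shell
#!/bin/sh
# Read-only sketch: report Ceph state that survived a purge on this node.
# The optional first argument is a hypothetical root prefix for dry runs.
check_ceph_purged() {
    root="${1:-}"
    rc=0
    # A purged node should have no content left in /var/lib/ceph.
    if [ -d "$root/var/lib/ceph" ] && [ -n "$(ls -A "$root/var/lib/ceph")" ]; then
        echo "not empty: $root/var/lib/ceph"
        rc=1
    fi
    # -L also catches a dangling symlink, which -e alone would miss.
    for conf in "$root/etc/pve/ceph.conf" "$root/etc/ceph/ceph.conf"; do
        if [ -e "$conf" ] || [ -L "$conf" ]; then
            echo "still present: $conf"
            rc=1
        fi
    done
    return "$rc"
}

if check_ceph_purged; then
    echo "no leftovers found"
fi
```

The function returns a non-zero status when anything is left, so it can also gate a later pveceph init in a script.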
 
