Reinstall CEPH on Proxmox 6

Metz

Active Member
May 5, 2018
Hello,

After the upgrade to release 6, I tried to reinstall Ceph instead of upgrading it. I followed a page that said to delete several directories:
( rm -Rf /etc/ceph /etc/pve/ceph.conf /etc/pve/priv/ceph* /var/lib/ceph )

pveceph init --network 10.1.1.0/24 worked, but afterwards I get the following error:
pveceph createmon
unable to get monitor info from DNS SRV with service name: ceph-mon
Could not connect to ceph cluster despite configured monitors

Installation via GUI also fails.

Is there a way to reinstall CEPH so that I can fix the issue?

Thanks, Metz
 
Can confirm: after upgrading to PVE 6 from 5.4 (which was successful), I tried to upgrade Ceph, which was not successful. I purged the Ceph config and tried to reinstall with Nautilus, and made sure it is installed. It fails with the same message. I even put all the nodes into the hosts file, but it did not help.
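Roughly, the purge and reinstall attempt described above would have looked something like this (a sketch only; the exact commands are not in the post, the network is taken from the first post for illustration, and the --version flag is an assumption):
Code:
pveceph purge                        # drop the Ceph config on this node
pveceph install --version nautilus   # make sure the Nautilus packages are present
pveceph init --network 10.1.1.0/24
pveceph createmon                    # fails with the same monitor error as above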
 
pveceph init --network 10.1.1.0/24 worked, but afterwards I get the following error:
Remove /etc/pve/ceph.conf, as it will not be re-initialized once it has been created.
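In other words, roughly (a sketch; network taken from the first post):
Code:
rm /etc/pve/ceph.conf                 # a stale config is not re-initialized
pveceph init --network 10.1.1.0/24    # writes a fresh ceph.conf
pveceph createmon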

EDIT:
After posting I saw the part in brackets. :/

Are you on the latest packages (pveversion -v)? Can you please post your ceph.conf?
 
pveversion -v
Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 14.2.2-pve1
ceph-fuse: 14.2.2-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-5
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1

cat /etc/pve/ceph.conf
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.1.1.0/24
         fsid = 59cf47e3-19c8-4b4c-bea7-983c62ebbcdf
         mon_allow_pool_delete = true
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public network = 10.1.1.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring
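Side note: this ceph.conf contains no mon_host line and no [mon.X] sections yet, which is why the client falls back to a DNS SRV lookup for 'ceph-mon' and prints the 'unable to get monitor info from DNS SRV' message. Once a monitor has been created successfully, pveceph normally adds an entry roughly like this (the addresses below are placeholders, not from this cluster):
Code:
[global]
         mon_host = 10.1.1.11 10.1.1.12 10.1.1.13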
 
Installed the latest updates. Same result.

pveceph init --network 10.1.1.0/24 worked, but afterwards I still get the following error:
pveceph createmon
Code:
unable to get monitor info from DNS SRV with service name: ceph-mon
Could not connect to ceph cluster despite configured monitors

cat /etc/pve/ceph.conf
Code:
[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 10.1.1.0/24
         fsid = 631a6e28-8e2d-4563-89d8-ac0043790a6f
         mon allow pool delete = true
         osd pool default min size = 2
         osd pool default size = 3
         public network = 10.1.1.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

pveversion -v
Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-6 (running version: 6.0-6/c71f879f)
pve-kernel-5.0: 6.0-7
pve-kernel-helper: 6.0-7
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 14.2.2-pve1
ceph-fuse: 14.2.2-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.11-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2
 
I found the following log entries after rebooting the node:
cat /var/log/ceph/ceph-mon.pve-node3.log
Code:
2019-09-06 20:55:17.263 7fd6ce7fa3c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 20:55:17.263 7fd6ce7fa3c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1695
2019-09-06 20:55:17.263 7fd6ce7fa3c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?
2019-09-06 20:55:27.555 7f856c6883c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 20:55:27.555 7f856c6883c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1876
2019-09-06 20:55:27.555 7f856c6883c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?

After the creation of the directory:
Code:
2019-09-06 20:57:53.206 7fd067a7f3c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 20:57:53.206 7fd067a7f3c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 2068
2019-09-06 20:57:53.206 7fd067a7f3c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' is empty: have you run 'mkfs'?

The log files on the other nodes are empty, but I did not reboot them.
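For reference, the "have you run 'mkfs'?" message refers to the monitor store initialization that pveceph createmon normally performs itself; done by hand it would look roughly like this (a sketch only, using the node name from the log; this was not actually run here):
Code:
ceph-mon --mkfs -i pve-node3 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
chown -R ceph:ceph /var/lib/ceph/mon/ceph-pve-node3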
 
I did the following steps:
Code:
pveceph purge            # on all nodes
rm -r /var/lib/ceph      # on all nodes
rm /etc/pve/ceph.conf
# then rebooted one node

Why does the log file still get new entries, and why does it look like ceph-mon is still being started?

cat /var/log/ceph/ceph-mon.pve-node3.log

Code:
2019-09-06 23:24:06.667 7f69a14cc3c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 Errors while parsing config file!
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 parse_file: cannot open /.ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2019-09-06 23:24:06.667 7f69a14cc3c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1864
2019-09-06 23:24:06.667 7f69a14cc3c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?
2019-09-06 23:24:16.563 7f41e7cc93c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 Errors while parsing config file!
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 parse_file: cannot open /.ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2019-09-06 23:24:16.563 7f41e7cc93c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1909
2019-09-06 23:24:16.563 7f41e7cc93c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?
2019-09-06 23:24:26.747 7f683159d3c0  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-06 23:24:26.747 7f683159d3c0 -1 Errors while parsing config file!
2019-09-06 23:24:26.747 7f683159d3c0 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:26.747 7f683159d3c0 -1 parse_file: cannot open /.ceph/ceph.conf: (2) No such file or directory
2019-09-06 23:24:26.747 7f683159d3c0 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2019-09-06 23:24:26.747 7f683159d3c0  0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mon, pid 1972
2019-09-06 23:24:26.747 7f683159d3c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve-node3' does not exist: have you run 'mkfs'?

Code:
ps -ef | grep ceph
root         986       1  0 23:23 ?        00:00:00 /usr/bin/python2.7 /usr/bin/ceph-crash
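Only the ceph-crash helper shows up in that process list. One possible explanation for the monitor log still growing (a guess, consistent with the reply further down about leftover services) is that the ceph-mon systemd unit link survived the purge, so systemd keeps trying to start the monitor at boot. A quick check:
Code:
ls /etc/systemd/system/ceph-mon.target.wants/   # leftover ceph-mon@<node>.service link?
systemctl status ceph-mon@pve-node3.service     # is systemd still retrying the monitor?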
 
Hi,
I seem to be having the exact same problem as metz and brucexx.
I have tried purging and deleting /var/lib/ceph without success.

When I do pveceph init, either via the CLI or the GUI, it errors out with "Could not connect to ceph cluster despite configured monitors (500)".
And it is left with "ghosts" of the monitors, visible in the web GUI but not in any CLI, i.e. they cannot be removed, stopped or started.

Is there a way to delete all these references and start completely from scratch with the Ceph install? (remove all config/services and put them back)
 
Is there a way to delete all these references and start completely from scratch with the Ceph install? (remove all config/services and put them back)
Please check all nodes for any leftover directories (/var/lib/ceph/mon) and leftover services in
/etc/systemd/system/ceph-mon.target.wants/

Remove all of those and restart pvestatd with 'systemctl restart pvestatd'.
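On each node, that check and cleanup could look roughly like this (a sketch; the unit instance name follows the ceph-mon@<nodename>.service pattern shown further down and may differ on your nodes):
Code:
ls -l /var/lib/ceph/mon                            # any leftover monitor data directories?
ls /etc/systemd/system/ceph-mon.target.wants/      # any leftover monitor unit links?
systemctl disable ceph-mon@$(hostname).service     # remove a leftover unit link
rm -rf /var/lib/ceph/mon                           # remove leftover monitor data
systemctl restart pvestatd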
 
Thanks. That helped me one step further. Now 3 monitors are running and 2 managers are configured and running, but I'm not able to start the managers and not able to configure the 3rd manager.

The error message mentions /var/lib/ceph/mgr/ceph-pve-node3/keyring.tmp.2235958 (full output below).

Now I have two weeks of holiday. I think I will reinstall Proxmox to get a clean setup, because I'm also losing corosync from time to time, which never happened on version 5, and the GUI sometimes shows a question mark on some nodes even though the CLI shows the node is up.

Code:
ls -l /var/lib/ceph/mon
ls: cannot access '/var/lib/ceph/mon': No such file or directory

ls /etc/systemd/system/ceph-mon.target.wants/
ceph-mon@pve-node3.service

systemctl disable ceph-mon@pve-node3.service
Removed /etc/systemd/system/ceph-mon.target.wants/ceph-mon@pve-node3.service.

ls /etc/systemd/system/ceph-mon.target.wants/
ls: cannot access '/etc/systemd/system/ceph-mon.target.wants/': No such file or directory

pveceph init --network 10.1.1.0/24
creating /etc/pve/priv/ceph.client.admin.keyring

pveceph createmon
unable to get monitor info from DNS SRV with service name: ceph-mon
creating /etc/pve/priv/ceph.mon.keyring
importing contents of /etc/pve/priv/ceph.client.admin.keyring into /etc/pve/priv/ceph.mon.keyring
chown: cannot access '/var/lib/ceph/mon/ceph-pve-node3': No such file or directory
error with cfs lock 'file-ceph_conf': command 'chown ceph:ceph /var/lib/ceph/mon/ceph-pve-node3' failed: exit code 1

mkdir -p /var/lib/ceph/mon

pveceph createmon
unable to get monitor info from DNS SRV with service name: ceph-mon
monmaptool: monmap file /tmp/monmap
monmaptool: generated fsid 4d72e5e9-4e59-4875-a27f-ff273a9c007f
epoch 0
fsid 4d72e5e9-4e59-4875-a27f-ff273a9c007f
last_changed 2019-09-14 11:36:10.961149
created 2019-09-14 11:36:10.961149
min_mon_release 0 (unknown)
0: [v2:10.9.9.13:3300/0,v1:10.9.9.13:6789/0] mon.pve-node3
monmaptool: writing epoch 0 to /tmp/monmap (1 monitors)
Created symlink /etc/systemd/system/ceph-mon.target.wants/ceph-mon@pve-node3.service -> /lib/systemd/system/ceph-mon@.service.
creating manager directory '/var/lib/ceph/mgr/ceph-pve-node3'
creating keys for 'mgr.pve-node3'
unable to open file '/var/lib/ceph/mgr/ceph-pve-node3/keyring.tmp.2235636' - No such file or directory

pveceph createmgr
creating manager directory '/var/lib/ceph/mgr/ceph-pve-node3'
creating keys for 'mgr.pve-node3'
unable to open file '/var/lib/ceph/mgr/ceph-pve-node3/keyring.tmp.2235958' - No such file or directory
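One guess at this point (not confirmed anywhere in the thread): after the manual rm -r /var/lib/ceph, the usual subdirectories and their ceph:ceph ownership are gone, so the keyring.tmp file cannot be created. Recreating them before retrying might help, roughly like this (the directory list is an assumption):
Code:
install -d -o ceph -g ceph /var/lib/ceph/mon /var/lib/ceph/mgr /var/lib/ceph/osd \
        /var/lib/ceph/bootstrap-osd /var/lib/ceph/crash
pveceph createmgr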
 
With a freshly installed, fully up-to-date PVE 6, this problem remains.

First I set up Ceph with:
pveceph init --network 10.0.115.0/24 -disable_cephx 1
pveceph mon create
...
That works normally.

Then I deleted all of Ceph to continue testing and set up a new Ceph cluster again with:

pveceph init --network 10.0.115.0/24 -disable_cephx 1
pveceph mon create
unable to get monitor info from DNS SRV with service name: ceph-mon
...
It won't work!

Now I don't use this parameter any more; everything is OK, and I won't have to reinstall PVE 6 again!
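So the working sequence appears to be the same init and monitor creation, just without the extra flag (assuming '-disable_cephx 1' is the parameter meant):
Code:
pveceph init --network 10.0.115.0/24
pveceph mon create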
 
@lynn_yudi, what is in the ceph config when the 'unable to get monitor info' shows up?
 
@lynn_yudi, what is in the ceph config when the 'unable to get monitor info' shows up?

pveceph init --network 10.0.115.0/24 -disable_cephx 1

The ceph.conf:
Bash:
# cat ceph.conf
[global]
         auth_client_required = none
         auth_cluster_required = none
         auth_service_required = none
         cluster_network = 10.0.115.0/24
         fsid = 554ee1fe-8a40-44bf-9c54-2db323aa89ea
         mon_allow_pool_delete = true
         mon_host = 10.0.115.15 10.0.115.11 10.0.115.13
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         osd_crush_update_on_start = false
         public_network = 10.0.115.0/24

and
# pveceph mon create
unable to get monitor info from DNS SRV with service name: ceph-mon

Sorry, I don't remember whether it was exactly this message (above), but it does not work.
 
There seem to be some leftovers. After you purged Ceph, is /var/lib/ceph/ empty? And is there no ceph.conf anymore?
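A quick way to verify that on each node (just a sketch):
Code:
ls -A /var/lib/ceph/                            # should print nothing after a complete purge
ls -l /etc/pve/ceph.conf /etc/ceph/ceph.conf    # should report "No such file or directory"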