Proxmox 5, Ceph Luminous observations and notes

alexskysilk

Apologies if this belongs in a different forum. I set up a cluster using Proxmox 5/stretch + Ceph 12 (Luminous) in the lab. Here are some observations that may be useful for UX purposes:

1. The default rbd pool has always been a needless nuisance, but it used to be easy to delete. With Luminous the default behavior is to deny pool deletion. That is generally the correct behavior, but it creates a UX problem: a default, useless pool is created at pveceph install and cannot be removed. The restriction also affects deletion of any other pool, so the global option for pool deletion (mon_allow_pool_delete) should either be set, or the GUI should provide an alternative, one-time way to delete a pool (a possible CLI workaround is sketched after this list).
2. Creating OSDs is a very iffy proposition: the process completes well enough from either the GUI or the CLI, but the OSDs are not added to the crush map. I got the OSDs from one node added ONCE; the other nodes did not work - they are not even showing up as hosts in the crush map. I am following the normal process of ceph-disk zap followed by pveceph createosd. The OSD creation does appear in the task log and completes successfully, but the OSD is neither mounted nor does it start a process.
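For reference, here is what I mean in commands. The pool-deletion part is only a sketch of the workaround I would expect to need on a test cluster (the injectargs call and the literal pool name "rbd" are assumptions from my setup); the last two lines are the OSD creation sequence I actually run, using /dev/sdc as an example:

Code:
# allow pool deletion on the running monitors (test cluster only)
ceph tell mon.* injectargs '--mon_allow_pool_delete=true'
# drop the default pool; Ceph wants the name twice plus the confirmation flag
ceph osd pool delete rbd rbd --yes-i-really-really-mean-it

# per-disk OSD creation as described above
ceph-disk zap /dev/sdc
pveceph createosd /dev/sdc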

pveversion -v
proxmox-ve: 5.0-6 (running kernel: 4.10.8-1-pve)
pve-manager: 5.0-9 (running version: 5.0-9/c7bdd872)
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.10.8-1-pve: 4.10.8-6
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.4.16-1-pve: 4.4.16-64
libpve-http-server-perl: 2.0-2
lvm2: 2.02.168-pve2
corosync: 2.4.2-pve2
libqb0: 1.0.1-1
pve-cluster: 5.0-4
qemu-server: 5.0-4
pve-firmware: 2.0-2
libpve-common-perl: 5.0-8
libpve-guest-common-perl: 2.0-1
libpve-access-control: 5.0-3
libpve-storage-perl: 5.0-3
pve-libspice-server1: 0.12.8-3
vncterm: 1.4-1
pve-docs: 5.0-1
pve-qemu-kvm: 2.9.0-1
pve-container: 2.0-6
pve-firewall: 3.0-1
pve-ha-manager: 2.0-1
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.7-500
lxcfs: 2.0.6-pve500
criu: 2.11.1-1~bpo90
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90
ceph: 12.0.1-pve1

pveceph status attached.
 


Apologies if this belongs in a different forum. I set up a cluster using Proxmox 5/stretch + Ceph 12 (Luminous) in the lab. Here are some observations that may be useful for UX purposes:

1. The default rbd pool has always been a needless nuisance, but it used to be easy to delete. With Luminous the default behavior is to deny pool deletion. That is generally the correct behavior, but it creates a UX problem: a default, useless pool is created at pveceph install and cannot be removed. The restriction also affects deletion of any other pool, so the global option for pool deletion (mon_allow_pool_delete) should either be set, or the GUI should provide an alternative, one-time way to delete a pool.

There is a proposal on pve-devel to enable pool deletion by default (when Ceph is set up using "pveceph init").

2. Creating OSDs is a very iffy proposition: the process completes well enough from either the GUI or the CLI, but the OSDs are not added to the crush map. I got the OSDs from one node added ONCE; the other nodes did not work - they are not even showing up as hosts in the crush map. I am following the normal process of ceph-disk zap followed by pveceph createosd. The OSD creation does appear in the task log and completes successfully, but the OSD is neither mounted nor does it start a process.

Could you post the complete journal output from such a failed OSD creation? I think there is a problem on some systems because udev already tries to activate the OSD before it is done initializing.

pveversion -v
proxmox-ve: 5.0-6 (running kernel: 4.10.8-1-pve)
..
ceph: 12.0.1-pve1

There'll be 12.0.2 packages shortly, if you want to test those ;)
 
Be glad to. How do I do that when the OSD isn't mounted?

Code:
journalctl --since "timestamp one minute before pveceph createosd" --until "timestamp 3 minutes after pveceph createosd"

where the timestamps look like "2017-05-03 08:30:00" or "2017-05-03 08:30". If you redo it "now", you can also do something like
Code:
journalctl --since "-5m"
to get the last 5 minutes of logs ;)
 
It does sound strange, but it happens every time I use ceph-disk zap. It doesn't appear to affect anything. Parted doesn't show anything wrong at all:

parted /dev/sdc
GNU Parted 3.2
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA MM1000GBKAL (scsi)
Disk /dev/sdc: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name          Flags
 2      1049kB  5370MB  5369MB               ceph journal
 1      5370MB  1000GB  995GB   xfs          ceph data
 
The log indicates that something is not right with your bootstrap keys:

Code:
May 04 11:03:35 pve22 sh[1067736]: ceph_disk.main.Error: Error: ceph osd create failed: Command '/usr/bin/ceph' returned non-zero exit status 1: 2017-05-04 11:03:35.471823 7f71f4c9e700  0 librados: client.bootstrap-osd authentication error (1) Operation not permitted

I assume this is a test cluster?

If so, what does "ceph auth list | grep -v key" output?
Does the file /var/lib/ceph/bootstrap-osd/ceph.keyring exist? What permissions does it have?
Do "ceph auth get client.bootstrap-osd" and "ceph-authtool -l /var/lib/ceph/bootstrap-osd/ceph.keyring" print the same key? (No need to post the output here, just whether they are the same!)
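Collected in one place, the checks on the affected node would look like this (using ls -l for the existence/permissions question):

Code:
ceph auth list | grep -v key
ls -l /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph auth get client.bootstrap-osd
ceph-authtool -l /var/lib/ceph/bootstrap-osd/ceph.keyring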
 
If so, what does "ceph auth list | grep -v key" output?
installed auth entries:

osd.0
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.3
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.4
        caps: [mon] allow rwx
        caps: [osd] allow *
client.admin
        auid: 0
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
        caps: [mon] allow profile bootstrap-osd
client.bootstrap-rgw
        caps: [mon] allow profile bootstrap-rgw
mgr.0
        caps: [mon] allow *
mgr.1
        caps: [mon] allow *
mgr.2
        caps: [mon] allow *
mgr.3
        caps: [mon] allow *

Does the file /var/lib/ceph/bootstrap-osd/ceph.keyring exist? What permissions does it have?
It does.
-rw-r--r-- 1 ceph ceph 113 Dec 14 2015 ceph.keyring
Do "ceph auth get client.bootstrap-osd" and "ceph-authtool -l /var/lib/ceph/bootstrap-osd/ceph.keyring" print the same key?
Interesting. They do not. I should note that these nodes were upgraded from 4.4 and I did a pveceph purge before reinstalling Ceph...
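Presumably the stale keyring could simply be replaced with the key the monitors currently hold; something like the following should do it, although I have not tried it:

Code:
# untested: re-export the current bootstrap-osd key over the stale keyring
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring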
 
I went ahead and purged the config, then manually deleted /var/lib/ceph/* on each participating node, and the problem went away.
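Roughly what that amounted to, from memory (the re-setup afterwards is the usual pveceph install / pveceph init already mentioned above):

Code:
# wipe the Proxmox Ceph config, then the local Ceph state on every participating node
pveceph purge
rm -rf /var/lib/ceph/*
# afterwards, reinstall and re-initialize as usual
pveceph install
pveceph init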

I think "pveceph purge" needs to be more aggressive (maybe with an --aggressive switch) and also remove the node-related Ceph settings on purge.