Failed to start Ceph disk activation: /dev/sd* and OSDs down after Proxmox upgrade to v6

Here are some errors from ceph-osd.0.log:

2019-10-25 15:16:32.695452 7f9063aa4e80 0 _get_class not permitted to load sdk
2019-10-25 15:16:32.695624 7f9063aa4e80 0 <cls> /root/sources/pve/ceph/ceph-12.2.12/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
2019-10-25 15:16:32.695764 7f9063aa4e80 0 <cls> /root/sources/pve/ceph/ceph-12.2.12/src/cls/hello/cls_hello.cc:296: loading cls_hello
2019-10-25 15:16:32.701201 7f9063aa4e80 0 _get_class not permitted to load lua
2019-10-25 15:16:32.701542 7f9063aa4e80 0 _get_class not permitted to load kvs
2019-10-25 15:16:32.701565 7f9063aa4e80 1 osd.0 0 warning: got an error loading one or more classes: (1) Operation not permitted
2019-10-25 15:16:32.702700 7f9063aa4e80 0 osd.0 15510 crush map has features 288514051259236352, adjusting msgr requires for clients
2019-10-25 15:16:32.702709 7f9063aa4e80 0 osd.0 15510 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
2019-10-25 15:16:32.702713 7f9063aa4e80 0 osd.0 15510 crush map has features 1009089991638532096, adjusting msgr requires for osds
2019-10-25 15:16:34.251255 7f9063aa4e80 0 osd.0 15510 load_pgs
2019-10-25 15:16:37.165065 7f9063aa4e80 0 osd.0 15510 load_pgs opened 86 pgs
2019-10-25 15:16:37.165211 7f9063aa4e80 0 osd.0 15510 using weightedpriority op queue with priority op cut off at 64.
2019-10-25 15:16:37.166854 7f9063aa4e80 -1 osd.0 15510 log_to_monitors {default=true}
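The systemd units for the OSDs are also worth a look (a generic example; the OSD id may differ):
Code:
# unit state and the most recent log lines for osd.0
:~# systemctl status ceph-osd@0.service
# full journal for the unit since the current boot
:~# journalctl -u ceph-osd@0 -b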
 
2019-10-25 15:16:32.701565 7f9063aa4e80 1 osd.0 0 warning: got an error loading one or more classes: (1) Operation not permitted
Check if the keyring of the OSDs exists and has the content below.
Code:
:~# cat /var/lib/ceph/osd/ceph-0/keyring
[osd.0]
key = <redacted>
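The local keyring can also be compared against what the cluster has stored for that OSD (this assumes at least one monitor is reachable from the node):
Code:
# print the key and caps the monitors hold for osd.0
:~# ceph auth get osd.0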

And please post your ceph.conf.
 
Yes, the keyring(s) exist.

Code:
:~# cat ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.0.1.0/24
fsid = 09935360-cfe7-48d4-ac76-c02e0fdd95de
mon allow pool delete = true
mon_host = 10.0.1.2 10.0.1.3 10.0.1.4
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 10.0.1.0/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.node002]
host = node002
mon addr = 10.0.1.2:6789

[mon.node003]
host = node003
mon addr = 10.0.1.3:6789

[mon.node004]
host = node004
mon addr = 10.0.1.4:6789
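The monitor addresses above can be probed quickly from the node (a minimal check, assuming netcat is installed):
Code:
# test TCP reachability of each monitor on the default port
:~# for mon in 10.0.1.2 10.0.1.3 10.0.1.4; do nc -zv -w 2 $mon 6789; done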
 
I MAY have found something: this bond looks DOWN:

bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether d2:6e:67:5f:24:71 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.2/24 brd 10.0.1.255 scope global bond0
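The bond state and its slaves can be inspected in more detail via the kernel's bonding status file (a generic check; the slave names depend on the hardware):
Code:
# per-slave link status and the active slave of the bond
:~# cat /proc/net/bonding/bond0
# one-line link state for every interface
:~# ip -br link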
 
inet 10.0.1.2/24 brd 10.0.1.255 scope global bond0
Well, this would explain why the services are running but not visible. You should even see different ceph -s output depending on the node you are connected to.
 
OK, I found the exact problem: after the upgrade to Proxmox 6, Debian renamed my network interfaces. I had to change the old names to the new ones (ens2f0, ens2f1) in /etc/network/interfaces and restart the network, and now everything is up and running on the latest 5.x kernel.
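For anyone hitting the same rename issue: the new predictable names can be listed with ip -br link, after which the bond stanza in /etc/network/interfaces has to match them. A minimal sketch, assuming the same ens2f0/ens2f1 names and the addressing from this thread (bond mode and options will differ per setup):
Code:
# list the current (renamed) interface names
:~# ip -br link
# then point the bond at the new names in /etc/network/interfaces, e.g.:
auto bond0
iface bond0 inet static
        address 10.0.1.2/24
        bond-slaves ens2f0 ens2f1
        bond-mode active-backup   # assumption: use whatever mode was configured before
        bond-miimon 100
# and apply the change
:~# systemctl restart networking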
 