Failed to start Ceph disk activation: /dev/sd* and OSDs down after Proxmox upgrade to v6

Here are some errors from ceph-osd.0.log:

2019-10-25 15:16:32.695452 7f9063aa4e80 0 _get_class not permitted to load sdk
2019-10-25 15:16:32.695624 7f9063aa4e80 0 <cls> /root/sources/pve/ceph/ceph-12.2.12/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
2019-10-25 15:16:32.695764 7f9063aa4e80 0 <cls> /root/sources/pve/ceph/ceph-12.2.12/src/cls/hello/cls_hello.cc:296: loading cls_hello
2019-10-25 15:16:32.701201 7f9063aa4e80 0 _get_class not permitted to load lua
2019-10-25 15:16:32.701542 7f9063aa4e80 0 _get_class not permitted to load kvs
2019-10-25 15:16:32.701565 7f9063aa4e80 1 osd.0 0 warning: got an error loading one or more classes: (1) Operation not permitted
2019-10-25 15:16:32.702700 7f9063aa4e80 0 osd.0 15510 crush map has features 288514051259236352, adjusting msgr requires for clients
2019-10-25 15:16:32.702709 7f9063aa4e80 0 osd.0 15510 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
2019-10-25 15:16:32.702713 7f9063aa4e80 0 osd.0 15510 crush map has features 1009089991638532096, adjusting msgr requires for osds
2019-10-25 15:16:34.251255 7f9063aa4e80 0 osd.0 15510 load_pgs
2019-10-25 15:16:37.165065 7f9063aa4e80 0 osd.0 15510 load_pgs opened 86 pgs
2019-10-25 15:16:37.165211 7f9063aa4e80 0 osd.0 15510 using weightedpriority op queue with priority op cut off at 64.
2019-10-25 15:16:37.166854 7f9063aa4e80 -1 osd.0 15510 log_to_monitors {default=true}
 
2019-10-25 15:16:32.701565 7f9063aa4e80 1 osd.0 0 warning: got an error loading one or more classes: (1) Operation not permitted
Check if the keyring of the OSDs exists and has the content below.
Code:
:~# cat /var/lib/ceph/osd/ceph-0/keyring
[osd.0]
key = <retracted>

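If the file is there, it is also worth comparing the key on disk with what the monitors have stored for that OSD. A quick check, assuming osd.0 and a reachable monitor:

Code:
:~# ceph auth get osd.0
:~# cat /var/lib/ceph/osd/ceph-0/keyring

The two keys should be identical; if they differ, the OSD cannot authenticate against the cluster.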
And please post your ceph.conf.
 
Yes, the keyring(s) exist.

cat ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.0.1.0/24
fsid = 09935360-cfe7-48d4-ac76-c02e0fdd95de
mon allow pool delete = true
mon_host = 10.0.1.2 10.0.1.3 10.0.1.4
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 10.0.1.0/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.node002]
host = node002
mon addr = 10.0.1.2:6789

[mon.node003]
host = node003
mon addr = 10.0.1.3:6789

[mon.node004]
host = node004
mon addr = 10.0.1.4:6789
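
For reference, a quick way to check from an OSD node whether the monitors listed above are actually reachable (addresses and port taken from the config; assumes nc is installed):

Code:
:~# for m in 10.0.1.2 10.0.1.3 10.0.1.4; do nc -zv -w2 $m 6789; done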
 
I MAY have found something: this bond looks DOWN:

bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether d2:6e:67:5f:24:71 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.2/24 brd 10.0.1.255 scope global bond0
 
inet 10.0.1.2/24 brd 10.0.1.255 scope global bond0
Well, this would explain why the services are running but not visible. You should even see a different ceph -s output, depending on which node you are connected to.
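
A quick way to see which slave interfaces the bond expects and whether those NICs still exist under their old names (generic checks, not specific to this setup):

Code:
:~# cat /proc/net/bonding/bond0
:~# ip -br link show

If the slaves listed in the bond configuration no longer show up in ip -br link show, the NICs have most likely been renamed.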
 
OK, I found the exact problem: after the upgrade to Proxmox 6, Debian renamed my network interfaces. I had to change the old names to the new ones (ens2f0, ens2f1) in /etc/network/interfaces and restart the network interfaces, and now everything is up and running on the latest 5.x kernel.
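
For anyone running into the same thing: this is roughly what the relevant part of /etc/network/interfaces looks like with the new names. Only the interface names (ens2f0, ens2f1) and the address 10.0.1.2/24 come from this thread; the bond mode and options are placeholders and have to match whatever the bond used before the upgrade.

Code:
auto ens2f0
iface ens2f0 inet manual

auto ens2f1
iface ens2f1 inet manual

auto bond0
iface bond0 inet static
        address 10.0.1.2/24
        bond-slaves ens2f0 ens2f1
        bond-miimon 100
        # bond-mode is an assumption -- set it to the mode your bond used before
        bond-mode active-backup

After editing, restarting networking (or rebooting) brings the bond and the Ceph services back up.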
 