We have uncovered a problem with Ceph Pacific OSDs not always starting automatically after a node is restarted. This is relatively prevalent: roughly 70% of the time, a restarted node comes up with a single OSD in this state. We also had one occurrence of a node with two OSDs in this state whilst handling a rolling upgrade on clusters this weekend.
The problem appears to be that the Linux kernel doesn't identify the partition as a Ceph data volume and consequently doesn't set the ownership of the device node properly. Manually updating the ownership and then restarting the OSD service results in the device being identified properly thereafter.
i.e. 'blkid /dev/sdb2' returns nothing while the OSD is in the problem state. Once the OSD has started, the same command returns the expected information.
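For anyone wanting to check a node quickly after a reboot, here is a minimal sketch that lists OSD block devices not owned by the ceph user. It assumes the layout shown in this post (OSD directories under /var/lib/ceph/osd/ceph-<id>, each with a "block" symlink to the BlueStore partition) and GNU stat; the function name check_osd_block_owners is made up for illustration.

```shell
#!/bin/sh
# Sketch: report OSD block devices whose owner is not the expected user.
# Assumptions: OSD dirs are <root>/ceph-<id>, each holding a "block" symlink,
# and GNU coreutils stat/readlink are available.

check_osd_block_owners() {
    osd_root="${1:-/var/lib/ceph/osd}"   # where the OSD directories live
    want_owner="${2:-ceph}"              # user expected to own the device node
    for link in "$osd_root"/ceph-*/block; do
        [ -e "$link" ] || continue                 # unmatched glob or dangling link
        dev=$(readlink -f "$link") || continue     # resolve to the real device node
        owner=$(stat -c %U "$dev") || continue     # owning user name
        if [ "$owner" != "$want_owner" ]; then
            # blkid prints nothing for TYPE while the partition is unidentified
            fstype=$(blkid -o value -s TYPE "$dev" 2>/dev/null || true)
            echo "$link -> $dev owner=$owner blkid_type=${fstype:-none}"
        fi
    done
}

# With no arguments this checks the default location for owner "ceph".
check_osd_block_owners
```

No output means every block device found has the expected owner. Run it as root if you want the blkid column to be meaningful, since blkid may not be able to probe the device otherwise.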
Code:
[admin@kvm5c ~]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
1010 nvme 2.91089 1.00000 2.9 TiB 1.5 GiB 854 MiB 424 MiB 237 MiB 2.9 TiB 0.05 0.00 73 up
10 ssd 5.79149 1.00000 5.8 TiB 2.5 TiB 2.5 TiB 314 MiB 6.9 GiB 3.3 TiB 42.43 1.01 66 up
11 ssd 5.79149 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 297 MiB 7.5 GiB 3.0 TiB 47.44 1.13 74 up
12 ssd 5.82179 1.00000 5.8 TiB 2.3 TiB 2.3 TiB 111 MiB 7.3 GiB 3.5 TiB 40.09 0.95 65 up
1020 nvme 2.91089 1.00000 2.9 TiB 1.3 GiB 676 MiB 353 MiB 284 MiB 2.9 TiB 0.04 0.00 64 up
20 ssd 5.79149 1.00000 5.8 TiB 2.6 TiB 2.6 TiB 204 MiB 6.6 GiB 3.2 TiB 44.79 1.06 72 up
21 ssd 5.79149 1.00000 5.8 TiB 2.6 TiB 2.6 TiB 134 MiB 6.7 GiB 3.2 TiB 45.19 1.07 70 up
22 ssd 5.82179 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 119 MiB 8.1 GiB 3.1 TiB 45.93 1.09 71 up
1030 nvme 2.91089 1.00000 2.9 TiB 1.4 GiB 860 MiB 388 MiB 151 MiB 2.9 TiB 0.05 0.00 70 up
30 ssd 5.79149 1.00000 5.8 TiB 2.7 TiB 2.6 TiB 231 MiB 7.7 GiB 3.1 TiB 45.83 1.09 59 up
31 ssd 5.79149 1.00000 5.8 TiB 2.7 TiB 2.6 TiB 220 MiB 7.5 GiB 3.1 TiB 45.87 1.09 46 up
32 ssd 5.82179 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 118 MiB 8.7 GiB 3.1 TiB 46.63 1.11 0 down
1040 nvme 2.91089 1.00000 2.9 TiB 1.5 GiB 828 MiB 375 MiB 317 MiB 2.9 TiB 0.05 0.00 73 up
40 ssd 5.79149 1.00000 5.8 TiB 3.2 TiB 3.2 TiB 214 MiB 8.5 GiB 2.6 TiB 55.68 1.32 78 up
41 ssd 5.79149 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 116 MiB 8.1 GiB 3.1 TiB 45.96 1.09 66 up
42 ssd 5.82179 1.00000 5.8 TiB 2.6 TiB 2.6 TiB 120 MiB 8.2 GiB 3.2 TiB 45.35 1.08 70 up
1050 nvme 2.91089 1.00000 2.9 TiB 1.5 GiB 870 MiB 403 MiB 309 MiB 2.9 TiB 0.05 0.00 82 up
50 ssd 5.79149 1.00000 5.8 TiB 2.8 TiB 2.8 TiB 221 MiB 7.6 GiB 3.0 TiB 48.12 1.14 75 up
51 ssd 5.79149 1.00000 5.8 TiB 2.9 TiB 2.9 TiB 225 MiB 8.2 GiB 2.9 TiB 50.41 1.20 76 up
52 ssd 5.82179 1.00000 5.8 TiB 2.8 TiB 2.8 TiB 126 MiB 8.9 GiB 3.0 TiB 48.15 1.14 74 up
1060 nvme 2.91089 1.00000 2.9 TiB 1.2 GiB 768 MiB 409 MiB 98 MiB 2.9 TiB 0.04 0 73 up
60 ssd 5.79149 1.00000 5.8 TiB 2.8 TiB 2.8 TiB 236 MiB 8.0 GiB 3.0 TiB 48.70 1.16 72 up
61 ssd 5.79149 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 230 MiB 7.5 GiB 3.1 TiB 46.72 1.11 70 up
62 ssd 5.82179 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 122 MiB 8.5 GiB 3.1 TiB 45.99 1.09 71 up
70 ssd 5.79149 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 95 MiB 8.0 GiB 3.1 TiB 46.09 1.10 68 up
71 ssd 5.79149 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 114 MiB 7.9 GiB 3.1 TiB 46.94 1.12 69 up
72 ssd 5.82179 1.00000 5.8 TiB 2.5 TiB 2.5 TiB 116 MiB 8.4 GiB 3.3 TiB 43.61 1.04 68 up
80 ssd 5.79149 1.00000 5.8 TiB 2.6 TiB 2.6 TiB 105 MiB 7.2 GiB 3.2 TiB 45.54 1.08 67 up
81 ssd 5.79149 1.00000 5.8 TiB 2.6 TiB 2.6 TiB 100 MiB 7.3 GiB 3.2 TiB 44.53 1.06 67 up
82 ssd 5.82179 1.00000 5.8 TiB 2.4 TiB 2.4 TiB 119 MiB 7.8 GiB 3.4 TiB 40.93 0.97 63 up
90 ssd 5.79149 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 75 MiB 7.2 GiB 3.1 TiB 46.76 1.11 71 up
91 ssd 5.79149 1.00000 5.8 TiB 2.6 TiB 2.6 TiB 90 MiB 7.2 GiB 3.2 TiB 45.01 1.07 67 up
92 ssd 5.82179 1.00000 5.8 TiB 2.7 TiB 2.7 TiB 99 MiB 8.4 GiB 3.1 TiB 46.23 1.10 71 up
100 ssd 5.79149 1.00000 5.8 TiB 2.5 TiB 2.5 TiB 119 MiB 8.2 GiB 3.3 TiB 43.48 1.03 70 up
101 ssd 5.79149 1.00000 5.8 TiB 2.5 TiB 2.5 TiB 109 MiB 7.9 GiB 3.3 TiB 43.20 1.03 70 up
102 ssd 5.82179 1.00000 5.8 TiB 2.8 TiB 2.8 TiB 95 MiB 8.7 GiB 3.0 TiB 48.90 1.16 71 up
110 ssd 5.79149 1.00000 5.8 TiB 2.5 TiB 2.5 TiB 112 MiB 7.8 GiB 3.3 TiB 43.18 1.03 65 up
111 ssd 5.79149 1.00000 5.8 TiB 2.5 TiB 2.5 TiB 115 MiB 7.6 GiB 3.3 TiB 43.21 1.03 68 up
112 ssd 5.82179 1.00000 5.8 TiB 2.8 TiB 2.8 TiB 82 MiB 8.4 GiB 3.0 TiB 48.19 1.15 75 up
TOTAL 209 TiB 88 TiB 88 TiB 7.1 GiB 260 GiB 121 TiB 42.08
MIN/MAX VAR: 0/1.32 STDDEV: 17.06
[admin@kvm5c ~]# df -h
Filesystem Size Used Avail Use% Mounted on
udev 315G 0 315G 0% /dev
tmpfs 63G 3.4M 63G 1% /run
/dev/md0 30G 7.0G 22G 25% /
tmpfs 315G 57M 315G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda4 94M 5.5M 89M 6% /var/lib/ceph/osd/ceph-30
/dev/sdb1 94M 5.5M 89M 6% /var/lib/ceph/osd/ceph-32
/dev/nvme0n1p1 94M 5.5M 89M 6% /var/lib/ceph/osd/ceph-1030
/dev/sdc4 94M 5.5M 89M 6% /var/lib/ceph/osd/ceph-31
/dev/fuse 128M 356K 128M 1% /etc/pve
10.254.1.3,10.254.1.4,10.254.1.5:/ 26T 41G 26T 1% /mnt/pve/cephfs
[admin@kvm5c ~]# tail -f /var/log/ceph/ceph-osd.32.log
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 -1 bdev(0x55f155ce4400 /var/lib/ceph/osd/ceph-32/block) open open got: (13) Permission denied
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 0 osd.32:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 -1 bluestore(/var/lib/ceph/osd/ceph-32/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-32/block: (13) Permission denied
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 1 bluestore(/var/lib/ceph/osd/ceph-32) _mount path /var/lib/ceph/osd/ceph-32
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 0 bluestore(/var/lib/ceph/osd/ceph-32) _open_db_and_around read-only:0 repair:0
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 -1 bluestore(/var/lib/ceph/osd/ceph-32/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-32/block: (13) Permission denied
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 1 bdev(0x55f155ce4400 /var/lib/ceph/osd/ceph-32/block) open path /var/lib/ceph/osd/ceph-32/block
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 -1 bdev(0x55f155ce4400 /var/lib/ceph/osd/ceph-32/block) open open got: (13) Permission denied
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 -1 osd.32 0 OSD:init: unable to mount object store
2021-11-27T08:57:08.883+0200 7fc9afd2ff00 -1 ** ERROR: osd init failed: (13) Permission denied
^C
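To find every OSD on a node that hit the same failure, the error line from the log above can be searched for across all OSD logs. This is only a sketch: it assumes the default log location /var/log/ceph and the ceph-osd.<id>.log naming seen here, and the function name failed_osds_permission_denied is hypothetical.

```shell
#!/bin/sh
# Sketch: print the ids of OSDs whose log contains the init failure above.
# Assumption: logs are named <log_dir>/ceph-osd.<id>.log.

failed_osds_permission_denied() {
    log_dir="${1:-/var/log/ceph}"
    for log in "$log_dir"/ceph-osd.*.log; do
        [ -f "$log" ] || continue                 # unmatched glob
        # -F: match the message as a fixed string, not a regex
        if grep -qF 'osd init failed: (13) Permission denied' "$log"; then
            id="${log##*/ceph-osd.}"              # drop directory and "ceph-osd." prefix
            echo "${id%.log}"                     # drop ".log" suffix
        fi
    done
}

failed_osds_permission_denied
```

Note that this matches any occurrence in the current log file, so an OSD that failed on an earlier boot and has since started will still be listed until the log is rotated.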
The 'Permission denied' error is due to 'blkid /dev/sdb2' returning nothing, so the Ceph block partition is not identified:
Code:
[admin@kvm5c ~]# dir /dev/sdb2
brw-rw---- 1 admin disk 8, 18 Nov 27 08:56 /dev/sdb2
[admin@kvm5c ~]# blkid /dev/sdb1
/dev/sdb1: UUID="26dfb0f4-d670-4a39-a1db-6ecb66fdc025" BLOCK_SIZE="4096" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="dbe970e0-c8d9-4365-b189-c60fa206e4a0"
[admin@kvm5c ~]# blkid /dev/sdb2
[admin@kvm5c ~]# chown ceph.ceph /dev/sdb2
[admin@kvm5c ~]# systemctl reset-failed; systemctl restart ceph-osd@32
# The following only updates once the OSD is back up again:
# [admin@kvm5c ~]# blkid /dev/sdb2
# /dev/sdb2: TYPE="ceph_bluestore" PARTLABEL="ceph block" PARTUUID="fa72ae1e-168a-4acc-934a-5edef24c9b06"
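The manual steps above could be wrapped up roughly as follows for nodes with several affected OSDs. This is a hedged sketch of the workaround only, not a fix for the underlying cause. It assumes the same directory layout as above and that the group is named the same as the user (as in the chown above); the names fix_osd_block_owner, run, DRY_RUN, OSD_OWNER and OSD_ROOT are made up for this example. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch: re-apply the manual workaround to each OSD whose block device has the
# wrong owner. DRY_RUN=1 (the default) only prints the commands.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

fix_osd_block_owner() {
    osd_dir="$1"                               # e.g. /var/lib/ceph/osd/ceph-32
    owner="${OSD_OWNER:-ceph}"                 # assumed: group has the same name
    osd_id="${osd_dir##*/ceph-}"               # "32"
    dev=$(readlink -f "$osd_dir/block") || return 0
    [ -e "$dev" ] || return 0                  # no block device to fix
    if [ "$(stat -c %U "$dev")" = "$owner" ]; then
        return 0                               # ownership already correct
    fi
    run chown "$owner:$owner" "$dev"
    run systemctl reset-failed "ceph-osd@$osd_id"
    run systemctl restart "ceph-osd@$osd_id"
}

for d in "${OSD_ROOT:-/var/lib/ceph/osd}"/ceph-*; do
    if [ -d "$d" ]; then
        fix_osd_block_owner "$d"
    fi
done
```

Review the dry-run output first, then run it as root with DRY_RUN=0 to actually change ownership and restart the affected OSD services.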
Full OSD log, should it be relevant, attached to this post: