I can't add OSDs

Sep 12, 2020
Hello,

We have a problem: we can't add an OSD to Ceph. Error messages:

<code>
stderr: 2020-09-12 16:03:16.675 7ff0bb288700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2020-09-12 16:03:16.675 7ff0bb288700 -1 AuthRegistry(0x7ff0b40817b8) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
stderr: got monmap epoch 3
</code>

We have a 7-node cluster. node7pve is running PVE 6.2.10; the other nodes are installed with PVE 6.1.8.

Some info:

<code>
root@node1pve6:~# ceph -s
cluster:
id: 30a0178f-b63f-4526-914d-e0d7067de313
health: HEALTH_OK

services:
mon: 3 daemons, quorum node1pve6,node2pve6,node3pve6 (age 9w)
mgr: node2pve6(active, since 5M), standbys: node3pve6, node1pve6
osd: 24 osds: 21 up (since 4M), 21 in (since 5M)

data:
pools: 1 pools, 256 pgs
objects: 2.66M objects, 10 TiB
usage: 30 TiB used, 13 TiB / 44 TiB avail
pgs: 256 active+clean

io:
client: 336 KiB/s rd, 9.7 MiB/s wr, 46 op/s rd, 787 op/s wr

root@node1pve6:~#
</code>



<code>
root@node1pve6:~# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 43.66328 - 44 TiB 30 TiB 30 TiB 3.9 MiB 66 GiB 13 TiB 69.29 1.00 - root default
-7 10.91574 - 11 TiB 7.9 TiB 7.9 TiB 96 KiB 17 GiB 3.0 TiB 72.15 1.04 - host node1pve6
8 ssd 1.81929 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 16 KiB 3.1 GiB 490 GiB 73.69 1.06 34 up osd.8
9 ssd 1.81929 1.00000 1.8 TiB 1.2 TiB 1.2 TiB 20 KiB 2.5 GiB 652 GiB 65.02 0.94 30 up osd.9
10 ssd 1.81929 1.00000 1.8 TiB 1.4 TiB 1.4 TiB 16 KiB 3.0 GiB 452 GiB 75.73 1.09 35 up osd.10
11 ssd 1.81929 1.00000 1.8 TiB 1010 GiB 1008 GiB 20 KiB 2.0 GiB 853 GiB 54.19 0.78 25 up osd.11
12 ssd 1.81929 0.90002 1.8 TiB 1.5 TiB 1.5 TiB 12 KiB 3.7 GiB 333 GiB 82.12 1.19 38 up osd.12
13 ssd 1.81929 0.80005 1.8 TiB 1.5 TiB 1.5 TiB 12 KiB 3.1 GiB 333 GiB 82.14 1.19 38 up osd.13
-3 10.91574 - 11 TiB 7.5 TiB 7.5 TiB 224 KiB 15 GiB 3.4 TiB 68.56 0.99 - host node2pve6
0 ssd 1.81929 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 28 KiB 2.6 GiB 567 GiB 69.56 1.00 32 up osd.0
1 ssd 1.81929 1.00000 1.8 TiB 1.4 TiB 1.4 TiB 40 KiB 2.8 GiB 453 GiB 75.68 1.09 35 up osd.1
2 ssd 1.81929 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 44 KiB 2.7 GiB 532 GiB 71.47 1.03 33 up osd.2
3 ssd 1.81929 1.00000 1.8 TiB 1.1 TiB 1.1 TiB 20 KiB 2.2 GiB 774 GiB 58.43 0.84 27 up osd.3
4 ssd 1.81929 1.00000 1.8 TiB 1.1 TiB 1.1 TiB 48 KiB 2.4 GiB 692 GiB 62.84 0.91 29 up osd.4
5 ssd 1.81929 1.00000 1.8 TiB 1.3 TiB 1.3 TiB 44 KiB 2.7 GiB 496 GiB 73.38 1.06 34 up osd.5
-5 10.91574 - 11 TiB 7.6 TiB 7.6 TiB 216 KiB 16 GiB 3.3 TiB 69.70 1.01 - host node3pve6
6 ssd 1.81929 1.00000 1.8 TiB 1.5 TiB 1.5 TiB 12 KiB 3.6 GiB 325 GiB 82.53 1.19 38 up osd.6
7 ssd 1.81929 0.95001 1.8 TiB 1.5 TiB 1.5 TiB 60 KiB 2.9 GiB 331 GiB 82.21 1.19 38 up osd.7
14 ssd 1.81929 1.00000 1.8 TiB 1.0 TiB 1.0 TiB 24 KiB 2.1 GiB 814 GiB 56.32 0.81 26 up osd.14
15 ssd 1.81929 1.00000 1.8 TiB 1.1 TiB 1.1 TiB 40 KiB 2.3 GiB 772 GiB 58.55 0.85 27 up osd.15
16 ssd 1.81929 1.00000 1.8 TiB 1.1 TiB 1.1 TiB 36 KiB 2.4 GiB 736 GiB 60.47 0.87 28 up osd.16
17 ssd 1.81929 0.90002 1.8 TiB 1.4 TiB 1.4 TiB 44 KiB 2.8 GiB 408 GiB 78.10 1.13 36 up osd.17
-9 10.91606 - 11 TiB 7.3 TiB 7.3 TiB 3.4 MiB 17 GiB 3.6 TiB 66.75 0.96 - host node6pve6
18 ssd 3.63869 1.00000 3.6 TiB 2.4 TiB 2.4 TiB 1.9 MiB 5.3 GiB 1.2 TiB 66.08 0.95 61 up osd.18
19 ssd 3.63869 1.00000 3.6 TiB 2.6 TiB 2.6 TiB 16 KiB 6.0 GiB 1.0 TiB 72.50 1.05 67 up osd.19
20 ssd 3.63869 1.00000 3.6 TiB 2.2 TiB 2.2 TiB 1.4 MiB 5.5 GiB 1.4 TiB 61.68 0.89 57 up osd.20
21 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.21
22 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.22
23 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.23
TOTAL 44 TiB 30 TiB 30 TiB 3.9 MiB 66 GiB 13 TiB 69.29
MIN/MAX VAR: 0.78/1.19 STDDEV: 8.97
root@node1pve6:~#
</code>


<code>
root@node1pve6:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-5
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.8-pve1
ceph-fuse: 14.2.8-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-22
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
root@node1pve6:~#
</code>
 
stderr: 2020-09-12 16:03:16.675 7ff0bb288700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2020-09-12 16:03:16.675 7ff0bb288700 -1 AuthRegistry(0x7ff0b40817b8) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
stderr: got monmap epoch 3
Please post the complete output. And use CODE tags </>; they preserve formatting.

pools: 1 pools, 256 pgs
You do not have enough PGs; for 21 OSDs it should already be 1024.
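A minimal sketch of how the PG count is raised on a pool (the pool name is a placeholder, since it is not shown above):

<code>
# List pool names first:
ceph osd lspools
# Raise the placement group count for the pool:
ceph osd pool set <pool> pg_num 1024
# On Nautilus (14.2.x), pgp_num should follow pg_num automatically;
# on older releases, raise it explicitly as well:
ceph osd pool set <pool> pgp_num 1024
</code>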

We have a 7-node cluster. node7pve is running PVE 6.2.10; the other nodes are installed with PVE 6.1.8.
Please update all nodes to the latest versions.
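The usual upgrade path, run on each node in turn (assumes a configured Proxmox package repository):

<code>
apt update
apt full-upgrade
# Reboot if a new kernel was installed, then verify:
pveversion -v
</code>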
 
Please post the complete output. And use CODE tags </>; they preserve formatting.
<code>
create OSD on /dev/nvme2n1 (bluestore)
wipe disk/partition: /dev/nvme2n1
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.321037 s, 653 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new d5655e0d-1e7d-4cb5-ae65-b8340b040920
Running command: /sbin/vgcreate --force --yes ceph-c9bab4b9-d96b-4578-9bde-21690a1d5f1b /dev/nvme2n1
stdout: Physical volume "/dev/nvme2n1" successfully created.
stdout: Volume group "ceph-c9bab4b9-d96b-4578-9bde-21690a1d5f1b" successfully created
Running command: /sbin/lvcreate --yes -l 100%FREE -n osd-block-d5655e0d-1e7d-4cb5-ae65-b8340b040920 ceph-c9bab4b9-d96b-4578-9bde-21690a1d5f1b
stdout: Logical volume "osd-block-d5655e0d-1e7d-4cb5-ae65-b8340b040920" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-21
--> Executable selinuxenabled not in PATH: /sbin:/bin:/usr/sbin:/usr/bin
Running command: /bin/chown -h ceph:ceph /dev/ceph-c9bab4b9-d96b-4578-9bde-21690a1d5f1b/osd-block-d5655e0d-1e7d-4cb5-ae65-b8340b040920
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/ln -s /dev/ceph-c9bab4b9-d96b-4578-9bde-21690a1d5f1b/osd-block-d5655e0d-1e7d-4cb5-ae65-b8340b040920 /var/lib/ceph/osd/ceph-21/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-21/activate.monmap
stderr: 2020-09-12 15:23:23.835 7faee8cfa700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2020-09-12 15:23:23.835 7faee8cfa700 -1 AuthRegistry(0x7faee40817b8) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
stderr: got monmap epoch 3
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-21/keyring --create-keyring --name osd.21 --add-key AQDJy1xfWyz3GBAAlvFO2vBi52iVocSEDkxKxA==
stdout: creating /var/lib/ceph/osd/ceph-21/keyring
stdout: added entity osd.21 auth(key=AQDJy1xfWyz3GBAAlvFO2vBi52iVocSEDkxKxA==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-21/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-21/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 21 --monmap /var/lib/ceph/osd/ceph-21/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-21/ --osd-uuid d5655e0d-1e7d-4cb5-ae65-b8340b040920 --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: /dev/nvme2n1
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 30a0178f-b63f-4526-914d-e0d7067de313 --data /dev/nvme2n1' failed: received interrupt
</code>
 
Best to use the </> button in the editor.

TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 30a0178f-b63f-4526-914d-e0d7067de313 --data /dev/nvme2n1' failed: received interrupt
The activation of the OSD seemed to fail.
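If prepare succeeded but the create was interrupted during activation, it may be possible to inspect and retry it by hand; a sketch:

<code>
# Show OSDs that ceph-volume has prepared on this node:
ceph-volume lvm list
# Retry activation of any prepared but inactive OSDs:
ceph-volume lvm activate --all
</code>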

21 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.21
22 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.22
23 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.23
Best clean up those leftovers first.
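One way to do that with the raw Ceph tooling (a sketch; verify the IDs against ceph osd tree before purging):

<code>
# Remove the dead entries from the CRUSH map, auth list and OSD map:
ceph osd purge 21 --yes-i-really-mean-it
ceph osd purge 22 --yes-i-really-mean-it
ceph osd purge 23 --yes-i-really-mean-it
</code>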

Which operation should be performed first: the update or the PG increase?
Either way. The increase of the PGs will take some time, since it also redistributes data. The usage levels of the OSDs should get more even as well.
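The rebalancing progress can be followed while it runs, for example:

<code>
# One-shot cluster status, including recovery/backfill progress:
ceph -s
# Or stream status updates continuously:
ceph -w
</code>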
 
Dear Alwin!

Thanks for the help. We added the disk successfully after upgrading the OS.

The process was successful; now the question is, how can I remove the unsuccessfully added disks?
The leftover disks are not visible under the OSD menu.

[Attachment: proxdisk.PNG]
 
pveceph osd destroy <id> should be enough. These IDs should not be in use by any other OSD.
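For the three leftovers shown above, that would look like this (a sketch; the --cleanup flag, if available in this PVE version, additionally wipes the disk's partition data):

<code>
pveceph osd destroy 21
pveceph osd destroy 22
pveceph osd destroy 23
# Optionally: pveceph osd destroy <id> --cleanup
</code>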
 
