[SOLVED] Failure while adding new OSDs

Hi there,

Running Proxmox 6, I have a problem adding another OSD (I bought a few new disks) to the system.
The system was installed on version 5.x and upgraded to the latest release.

Adding a fresh disk (/dev/sdh, with the journal on /dev/sdb) to the system ends up with the following error:

Code:
create OSD on /dev/sdh (bluestore)
creating block.db on '/dev/sdb'
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
Use of uninitialized value $part_or_lv in concatenation (.) or string at /usr/share/perl5/PVE/API2/Ceph/OSD.pm line 439.
using '' for block.db
wipe disk/partition: /dev/sdh
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.10591 s, 190 MB/s
-->  RuntimeError: unable to use device
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 8320d1c2-aa50-48b6-99a4-f390f1df07c1
Running command: /sbin/vgcreate -s 1G --force --yes ceph-613a905c-cc64-41ce-bf0d-9d173fc3af8d /dev/sdh
 stdout: Physical volume "/dev/sdh" successfully created.
 stdout: Volume group "ceph-613a905c-cc64-41ce-bf0d-9d173fc3af8d" successfully created
Running command: /sbin/lvcreate --yes -l 100%FREE -n osd-block-8320d1c2-aa50-48b6-99a4-f390f1df07c1 ceph-613a905c-cc64-41ce-bf0d-9d173fc3af8d
 stdout: Logical volume "osd-block-8320d1c2-aa50-48b6-99a4-f390f1df07c1" created.
--> blkid could not detect a PARTUUID for device:
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.16 --yes-i-really-mean-it
 stderr: 2019-10-02 13:18:48.767 7fe437115700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2019-10-02 13:18:48.767 7fe437115700 -1 AuthRegistry(0x7fe43007f818) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: purged osd.16
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid a362929e-63ab-4361-97ca-f152656dcab1 --block.db '' --data /dev/sdh' failed: exit code 1

After that, lsblk shows up like this:

Code:
root@lxc-prox1:~# lsblk
NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0 278.5G  0 disk
├─sda1                                                                                                  8:1    0  1007K  0 part
├─sda2                                                                                                  8:2    0   512M  0 part
└─sda3                                                                                                  8:3    0   278G  0 part
  ├─pve-root                                                                                          253:0    0  69.3G  0 lvm  /
  ├─pve-swap                                                                                          253:2    0     8G  0 lvm  [SWAP]
  ├─pve-data_tmeta                                                                                    253:3    0   1.9G  0 lvm
  │ └─pve-data-tpool                                                                                  253:5    0   181G  0 lvm
  │   ├─pve-data                                                                                      253:6    0   181G  0 lvm
  │   └─pve-vm--100--disk--0                                                                          253:7    0     8G  0 lvm
  └─pve-data_tdata                                                                                    253:4    0   181G  0 lvm
    └─pve-data-tpool                                                                                  253:5    0   181G  0 lvm
      ├─pve-data                                                                                      253:6    0   181G  0 lvm
      └─pve-vm--100--disk--0                                                                          253:7    0     8G  0 lvm
sdb                                                                                                     8:16   0 745.2G  0 disk
├─sdb1                                                                                                  8:17   0     1G  0 part
├─sdb2                                                                                                  8:18   0     1G  0 part
└─sdb3                                                                                                  8:19   0 111.8G  0 part
sdc                                                                                                     8:32   0 745.2G  0 disk
├─sdc1                                                                                                  8:33   0     1G  0 part
└─sdc2                                                                                                  8:34   0     1G  0 part
sdd                                                                                                     8:48   0   1.1T  0 disk
├─sdd1                                                                                                  8:49   0   100M  0 part /var/lib/ceph/osd/ceph-0
└─sdd2                                                                                                  8:50   0   1.1T  0 part
sde                                                                                                     8:64   0   1.1T  0 disk
├─sde1                                                                                                  8:65   0   100M  0 part /var/lib/ceph/osd/ceph-1
└─sde2                                                                                                  8:66   0   1.1T  0 part
sdf                                                                                                     8:80   0   1.1T  0 disk
├─sdf1                                                                                                  8:81   0   100M  0 part /var/lib/ceph/osd/ceph-2
└─sdf2                                                                                                  8:82   0   1.1T  0 part
sdg                                                                                                     8:96   0   1.1T  0 disk
├─sdg1                                                                                                  8:97   0   100M  0 part /var/lib/ceph/osd/ceph-3
└─sdg2                                                                                                  8:98   0   1.1T  0 part
sdh                                                                                                     8:112  0   1.1T  0 disk
└─ceph--613a905c--cc64--41ce--bf0d--9d173fc3af8d-osd--block--8320d1c2--aa50--48b6--99a4--f390f1df07c1 253:1    0   1.1T  0 lvm
sdi                                                                                                     8:128  0   1.1T  0 disk
sdj                                                                                                     8:144  0   1.1T  0 disk
sdk                                                                                                     8:160  0   1.1T  0 disk
sr0                                                                                                    11:0    1  1024M  0 rom
rbd0                                                                                                  252:0    0     8G  0 disk
rbd1                                                                                                  252:16   0    20G  0 disk

Which is interesting, because the old OSD disks were partitioned into a 100 MB and a 1.1 TB partition instead of one big LVM PV (with a VG and LV beneath it), and the journal on sdb was 1 GB (the new one is 112 GB).

All actions were done via the web GUI; the only thing done on the CLI was to stop each OSD and run "ceph-bluestore-tool repair" on the disks to clear the "legacy bluestore" warning after the upgrade to PVE 6.
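
For reference, this is roughly the per-OSD sequence I used for that (a sketch only, shown for OSD 0; the repair needs the OSD stopped):

Code:
# stop the OSD, run the repair, start it again
systemctl stop ceph-osd@0
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
systemctl start ceph-osd@0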

Any suggestions on what I did wrong / how to do it properly?
 
Which is interesting, because the old OSD disks were partitioned into a 100 MB and a 1.1 TB partition instead of one big LVM PV (with a VG and LV beneath it), and the journal on sdb was 1 GB (the new one is 112 GB).
The existing OSDs were created with ceph-disk, which doesn't exist anymore in Nautilus. The new tool is called ceph-volume. The partition size is passed to ceph-volume and needs to fit either 3, 30, or 300 GB before the DB spills over to the data disk of the OSD. This is due to the way RocksDB stores its DB levels.
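
As a side note, a quick way to see whether a DB has already spilled over (a sketch; Nautilus should raise a BLUEFS_SPILLOVER health warning for it):

Code:
# spillover shows up in the health output
ceph health detail | grep -i spillover
# per-OSD metadata usage (OMAP/META columns)
ceph osd df tree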

What does pveversion -v show? It seems to me this has been fixed already.
 
Hi Alwin,

This is the output. Since it's our proof-of-concept / testbed, it's on the pve-no-subscription repo (unlike our production systems).

Code:
root@lxc-prox1-poc:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-2-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-8
pve-kernel-helper: 6.0-8
pve-kernel-4.15: 5.4-6
pve-kernel-5.0.21-2-pve: 5.0.21-6
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph: 14.2.4-pve1
ceph-fuse: 14.2.4-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.12-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2
 
Use of uninitialized value $part_or_lv in concatenation (.) or string at /usr/share/perl5/PVE/API2/Ceph/OSD.pm line 439. using '' for block.db
It should work as you would have expected. A fix was included in pve-manager 6.0-7.

Anyway, what would be the right way to add some OSDs now? Drop all OSDs and re-create them with the new tool?
As you have done already. If you create a new OSD on the CLI, try pveceph osd create /dev/sdX --db_dev /dev/sdY --db_size 32G. If you don't specify the db_size, it will be 10% of the data disk size.
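
For the disks from this thread, that would be something like the following (assuming /dev/sdh as the data disk and /dev/sdb as the DB device, as in your first post):

Code:
pveceph osd create /dev/sdh --db_dev /dev/sdb --db_size 32G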
 
Hi Alwin,
It should work as you would have expected. A fix was included in pve-manager 6.0-7.
Sorry, this doesn't seem to be the case :-/ The function still returns an empty value:
Code:
Use of uninitialized value $part_or_lv in concatenation (.) or string at /usr/share/perl5/PVE/API2/Ceph/OSD.pm line 439.
using '' for block.db
As you have done already. If you create a new OSD on the CLI, try pveceph osd create /dev/sdX --db_dev /dev/sdY --db_size 32G. If you don't specify the db_size, it will be 10% of the data disk size.
Same error here...
 
Please restart pvedaemon.service and try again; maybe the new code wasn't loaded yet.
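
For example:

Code:
# reload the PVE API daemon so the updated module is picked up
systemctl restart pvedaemon.service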
 
As 'pveceph' is a wrapper around ceph-volume, you can try something like this to create the OSD: ceph-volume lvm create --data {vg name/lv name} --journal /path/to/device. To see the options, run ceph-volume lvm create -h.
https://docs.ceph.com/docs/nautilus/man/8/ceph-volume/
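
For this thread that might look roughly like the following. This is only a sketch: the VG/LV names are the leftovers from the failed run above, /dev/sdb3 is assumed to be the unused DB partition on your SSD, and for a BlueStore OSD the DB device is passed with --block.db instead of --journal. Adjust the names to your system.

Code:
# show what ceph-volume already knows about
ceph-volume lvm list
# create the OSD from the leftover LV, placing the DB on the SSD partition
ceph-volume lvm create --bluestore \
    --data ceph-613a905c-cc64-41ce-bf0d-9d173fc3af8d/osd-block-8320d1c2-aa50-48b6-99a4-f390f1df07c1 \
    --block.db /dev/sdb3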
 
