[SOLVED] Failure while adding new OSDs

Hi there,

Running Proxmox 6, I have a problem adding another OSD (I bought a few new disks) to the system.
The system was installed on version 5.x and upgraded to the latest release.

Adding a fresh disk (/dev/sdh, with the journal on /dev/sdb) to the system ends up with the following error:

Code:
create OSD on /dev/sdh (bluestore)
creating block.db on '/dev/sdb'
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
Use of uninitialized value $part_or_lv in concatenation (.) or string at /usr/share/perl5/PVE/API2/Ceph/OSD.pm line 439.
using '' for block.db
wipe disk/partition: /dev/sdh
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.10591 s, 190 MB/s
-->  RuntimeError: unable to use device
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 8320d1c2-aa50-48b6-99a4-f390f1df07c1
Running command: /sbin/vgcreate -s 1G --force --yes ceph-613a905c-cc64-41ce-bf0d-9d173fc3af8d /dev/sdh
 stdout: Physical volume "/dev/sdh" successfully created.
 stdout: Volume group "ceph-613a905c-cc64-41ce-bf0d-9d173fc3af8d" successfully created
Running command: /sbin/lvcreate --yes -l 100%FREE -n osd-block-8320d1c2-aa50-48b6-99a4-f390f1df07c1 ceph-613a905c-cc64-41ce-bf0d-9d173fc3af8d
 stdout: Logical volume "osd-block-8320d1c2-aa50-48b6-99a4-f390f1df07c1" created.
--> blkid could not detect a PARTUUID for device:
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.16 --yes-i-really-mean-it
 stderr: 2019-10-02 13:18:48.767 7fe437115700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2019-10-02 13:18:48.767 7fe437115700 -1 AuthRegistry(0x7fe43007f818) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: purged osd.16
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid a362929e-63ab-4361-97ca-f152656dcab1 --block.db '' --data /dev/sdh' failed: exit code 1

After that, lsblk shows up like this:

Code:
root@lxc-prox1:~# lsblk
NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0 278.5G  0 disk
├─sda1                                                                                                  8:1    0  1007K  0 part
├─sda2                                                                                                  8:2    0   512M  0 part
└─sda3                                                                                                  8:3    0   278G  0 part
  ├─pve-root                                                                                          253:0    0  69.3G  0 lvm  /
  ├─pve-swap                                                                                          253:2    0     8G  0 lvm  [SWAP]
  ├─pve-data_tmeta                                                                                    253:3    0   1.9G  0 lvm
  │ └─pve-data-tpool                                                                                  253:5    0   181G  0 lvm
  │   ├─pve-data                                                                                      253:6    0   181G  0 lvm
  │   └─pve-vm--100--disk--0                                                                          253:7    0     8G  0 lvm
  └─pve-data_tdata                                                                                    253:4    0   181G  0 lvm
    └─pve-data-tpool                                                                                  253:5    0   181G  0 lvm
      ├─pve-data                                                                                      253:6    0   181G  0 lvm
      └─pve-vm--100--disk--0                                                                          253:7    0     8G  0 lvm
sdb                                                                                                     8:16   0 745.2G  0 disk
├─sdb1                                                                                                  8:17   0     1G  0 part
├─sdb2                                                                                                  8:18   0     1G  0 part
└─sdb3                                                                                                  8:19   0 111.8G  0 part
sdc                                                                                                     8:32   0 745.2G  0 disk
├─sdc1                                                                                                  8:33   0     1G  0 part
└─sdc2                                                                                                  8:34   0     1G  0 part
sdd                                                                                                     8:48   0   1.1T  0 disk
├─sdd1                                                                                                  8:49   0   100M  0 part /var/lib/ceph/osd/ceph-0
└─sdd2                                                                                                  8:50   0   1.1T  0 part
sde                                                                                                     8:64   0   1.1T  0 disk
├─sde1                                                                                                  8:65   0   100M  0 part /var/lib/ceph/osd/ceph-1
└─sde2                                                                                                  8:66   0   1.1T  0 part
sdf                                                                                                     8:80   0   1.1T  0 disk
├─sdf1                                                                                                  8:81   0   100M  0 part /var/lib/ceph/osd/ceph-2
└─sdf2                                                                                                  8:82   0   1.1T  0 part
sdg                                                                                                     8:96   0   1.1T  0 disk
├─sdg1                                                                                                  8:97   0   100M  0 part /var/lib/ceph/osd/ceph-3
└─sdg2                                                                                                  8:98   0   1.1T  0 part
sdh                                                                                                     8:112  0   1.1T  0 disk
└─ceph--613a905c--cc64--41ce--bf0d--9d173fc3af8d-osd--block--8320d1c2--aa50--48b6--99a4--f390f1df07c1 253:1    0   1.1T  0 lvm
sdi                                                                                                     8:128  0   1.1T  0 disk
sdj                                                                                                     8:144  0   1.1T  0 disk
sdk                                                                                                     8:160  0   1.1T  0 disk
sr0                                                                                                    11:0    1  1024M  0 rom
rbd0                                                                                                  252:0    0     8G  0 disk
rbd1                                                                                                  252:16   0    20G  0 disk

Which is interesting, because the old OSD disks were partitioned into a 100 MB and a 1.1 TB partition instead of one big LVM PV (with a VG and LV beneath it), and the journal on sdb was 1 GB (the new one is 112 GB).

All actions were done via the web GUI; the only thing done on the CLI was to stop each OSD and run "ceph-bluestore-tool repair" on the disks to clear the "legacy bluestore" warning after the upgrade to PVE 6.
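
For reference, this is roughly the per-OSD sequence I used for that (a sketch only, shown for OSD 0; the repair needs the OSD stopped):

Code:
# stop the OSD, run the repair, start it again
systemctl stop ceph-osd@0
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
systemctl start ceph-osd@0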

Any suggestions on what I did wrong / how to do it properly?
 
Which is interesting, because the old OSD disks were partitioned into a 100 MB and a 1.1 TB partition instead of one big LVM PV (with a VG and LV beneath it), and the journal on sdb was 1 GB (the new one is 112 GB).
The existing OSDs were created with ceph-disk, which doesn't exist anymore in Nautilus. The new tool is called ceph-volume. The partition size is passed to ceph-volume and needs to fit either 3, 30, or 300 GB before the DB spills over to the data disk of the OSD. This is due to the way RocksDB stores its DB levels.
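
As a side note, a quick way to see whether a DB has already spilled over (a sketch; Nautilus should raise a BLUEFS_SPILLOVER health warning for it):

Code:
# spillover shows up in the health output
ceph health detail | grep -i spillover
# per-OSD metadata usage (OMAP/META columns)
ceph osd df tree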

What does pveversion -v show? It seems to me this has been fixed already.
 
Hi Alwin,

This is the output. Since it's our proof-of-concept / testbed, it's on the pve-no-subscription repo (unlike our production systems).

Code:
root@lxc-prox1-poc:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-2-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-8
pve-kernel-helper: 6.0-8
pve-kernel-4.15: 5.4-6
pve-kernel-5.0.21-2-pve: 5.0.21-6
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph: 14.2.4-pve1
ceph-fuse: 14.2.4-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.12-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2
 
Use of uninitialized value $part_or_lv in concatenation (.) or string at /usr/share/perl5/PVE/API2/Ceph/OSD.pm line 439. using '' for block.db
It should work as you would have expected. A fix was included in pve-manager 6.0-7.

Anyway, what would be the right way to add some OSDs now? Drop all OSDs and re-create them with the new tool?
As you have done already. If you create a new OSD on the CLI, try pveceph osd create /dev/sdX --db_dev /dev/sdY --db_size 32G. If you don't specify the db_size, it will be 10% of the data disk size.
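
For the disks from this thread, that would be something like the following (assuming /dev/sdh as the data disk and /dev/sdb as the DB device, as in your first post):

Code:
pveceph osd create /dev/sdh --db_dev /dev/sdb --db_size 32G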
 
Hi Alwin,
It should work as you would have expected. A fix was included in pve-manager 6.0-7.
Sorry, this doesn't seem to be the case :-/ The function still returns an empty value:
Code:
Use of uninitialized value $part_or_lv in concatenation (.) or string at /usr/share/perl5/PVE/API2/Ceph/OSD.pm line 439.
using '' for block.db
As you have done already. If you create a new OSD on the CLI, try pveceph osd create /dev/sdX --db_dev /dev/sdY --db_size 32G. If you don't specify the db_size, it will be 10% of the data disk size.
Same error here...
 
Please restart pvedaemon.service and try again; maybe the new code wasn't loaded yet.
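
For example:

Code:
# reload the PVE API daemon so the updated module is picked up
systemctl restart pvedaemon.service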
 
As 'pveceph' is a wrapper around ceph-volume, you can try something like this to create the OSD: ceph-volume lvm create --data {vg name/lv name} --journal /path/to/device. To see the options, run ceph-volume lvm create -h.
https://docs.ceph.com/docs/nautilus/man/8/ceph-volume/
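
For this thread that might look roughly like the following. This is only a sketch: the VG/LV names are the leftovers from the failed run above, /dev/sdb3 is assumed to be the unused DB partition on your SSD, and for a BlueStore OSD the DB device is passed with --block.db instead of --journal. Adjust the names to your system.

Code:
# show what ceph-volume already knows about
ceph-volume lvm list
# create the OSD from the leftover LV, placing the DB on the SSD partition
ceph-volume lvm create --bluestore \
    --data ceph-613a905c-cc64-41ce-bf0d-9d173fc3af8d/osd-block-8320d1c2-aa50-48b6-99a4-f390f1df07c1 \
    --block.db /dev/sdb3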
 
