Hi,
I am trying to create an OSD on one of the nodes in our 4-node cluster and I am getting this error:
Code:
command 'ceph-volume lvm create --cluster-fsid e9f42f14-bed0-4839-894b-0ca3e598320e --block.db '' --data /dev/sdi' failed: exit code 1
System state before trying to create the OSD (via the web UI; /dev/sdi is the disk for the new OSD and /dev/nvme0n1 is where the block.db for this OSD should be placed):
Code:
root@pve4:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
......
sdh 8:112 1 7.3T 0 disk
├─sdh1 8:113 1 100M 0 part /var/lib/ceph/osd/ceph-44
└─sdh2 8:114 1 7.3T 0 part
sdi 8:128 1 7.3T 0 disk
sdj 8:144 1 7.3T 0 disk
├─sdj1 8:145 1 100M 0 part /var/lib/ceph/osd/ceph-25
└─sdj2 8:146 1 7.3T 0 part
....
nvme0n1 259:0 0 260.9G 0 disk
├─nvme0n1p1 259:1 0 20G 0 part
├─nvme0n1p3 259:2 0 20G 0 part
├─nvme0n1p4 259:3 0 20G 0 part
├─nvme0n1p6 259:4 0 20G 0 part
├─nvme0n1p7 259:5 0 20G 0 part
├─nvme0n1p8 259:6 0 20G 0 part
├─nvme0n1p9 259:7 0 20G 0 part
├─nvme0n1p10 259:8 0 20G 0 part
├─nvme0n1p11 259:9 0 20G 0 part
├─nvme0n1p12 259:10 0 20G 0 part
└─nvme0n1p13 259:11 0 20G 0 part
...
root@pve4:~# gdisk -l /dev/nvme0n1
GPT fdisk (gdisk) version 1.0.3
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/nvme0n1: 547002288 sectors, 260.8 GiB
Model: INTEL SSDPED1D280GA
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 19AFB808-D8FA-4819-B95C-DBF93CD6AECF
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 547002254
Partitions will be aligned on 2048-sector boundaries
Total free space is 79337325 sectors (37.8 GiB)
Number  Start (sector)  End (sector)  Size      Code  Name
   1              2048      41945087  20.0 GiB  FFFF  ceph block.db
   3          83888128     125831167  20.0 GiB  FFFF  ceph block.db
   4         125831168     167774207  20.0 GiB  FFFF  ceph block.db
   6         209717248     251660287  20.0 GiB  FFFF  ceph block.db
   7         251660288     293603327  20.0 GiB  FFFF  ceph block.db
   8         293603328     335546367  20.0 GiB  FFFF  ceph block.db
   9         335546368     377489407  20.0 GiB  FFFF  ceph block.db
  10         377489408     419432447  20.0 GiB  FFFF  ceph block.db
  11         419432448     461375487  20.0 GiB  FFFF  ceph block.db
  12         461375488     503318527  20.0 GiB  FFFF  ceph block.db
  13         503318528     545261567  20.0 GiB  FFFF  ceph block.db
......
Here is the full error message from the web UI (I selected /dev/sdi as the data disk and /dev/nvme0n1 for block.db, and chose 3 GB this time, since I have since learned that our initial size of 20 GB isn't well suited for RocksDB; but I got the same error when using our default size of 20 GB):
Code:
create OSD on /dev/sdi (bluestore)
creating block.db on '/dev/nvme0n1'
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
Use of uninitialized value $part_or_lv in concatenation (.) or string at /usr/share/perl5/PVE/API2/Ceph/OSD.pm line 465.
using '' for block.db
wipe disk/partition: /dev/sdi
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.954768 s, 220 MB/s
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 761d95ef-7526-4c85-b33d-759bea2da16e
Running command: /sbin/vgcreate --force --yes ceph-3823a6f7-bb18-4f78-be2e-30689655458a /dev/sdi
stdout: Physical volume "/dev/sdi" successfully created.
stdout: Volume group "ceph-3823a6f7-bb18-4f78-be2e-30689655458a" successfully created
Running command: /sbin/lvcreate --yes -l 1907721 -n osd-block-761d95ef-7526-4c85-b33d-759bea2da16e ceph-3823a6f7-bb18-4f78-be2e-30689655458a
stdout: Logical volume "osd-block-761d95ef-7526-4c85-b33d-759bea2da16e" created.
--> blkid could not detect a PARTUUID for device:
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.24 --yes-i-really-mean-it
stderr: purged osd.24
--> RuntimeError: unable to use device
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid e9f42f14-bed0-4839-894b-0ca3e598320e --block.db '' --data /dev/sdi' failed: exit code 1
The line |using '' for block.db| looks suspicious, like a null/empty value is being passed on; the Perl warning about the uninitialized value $part_or_lv in OSD.pm line 465 right before it points in the same direction.
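If I read the task log correctly, the new DB partition gets created first and its device path should then be handed to ceph-volume, but that path ends up empty here. For comparison, this is roughly the command I would have expected to be run instead of the failing one (just a sketch on my part; I am assuming the freshly created partition would be /dev/nvme0n1p14, the number gdisk shows further down):
Code:
ceph-volume lvm create --cluster-fsid e9f42f14-bed0-4839-894b-0ca3e598320e \
    --block.db /dev/nvme0n1p14 --data /dev/sdi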
System state after the failed OSD creation attempt:
Code:
root@pve4:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
....
sdh 8:112 1 7.3T 0 disk
├─sdh1 8:113 1 100M 0 part /var/lib/ceph/osd/ceph-44
└─sdh2 8:114 1 7.3T 0 part
sdi 8:128 1 7.3T 0 disk
└─ceph--3823a6f7--bb18--4f78--be2e--30689655458a-osd--block--761d95ef--7526--4c85--b33d--759bea2da16e 253:6 0 7.3T 0 lvm
sdj 8:144 1 7.3T 0 disk
├─sdj1 8:145 1 100M 0 part /var/lib/ceph/osd/ceph-25
└─sdj2 8:146 1 7.3T 0 part
...
nvme0n1 259:0 0 260.9G 0 disk
├─nvme0n1p1 259:1 0 20G 0 part
├─nvme0n1p3 259:2 0 20G 0 part
├─nvme0n1p4 259:3 0 20G 0 part
├─nvme0n1p6 259:4 0 20G 0 part
├─nvme0n1p7 259:5 0 20G 0 part
├─nvme0n1p8 259:6 0 20G 0 part
├─nvme0n1p9 259:7 0 20G 0 part
├─nvme0n1p10 259:8 0 20G 0 part
├─nvme0n1p11 259:9 0 20G 0 part
├─nvme0n1p12 259:10 0 20G 0 part
└─nvme0n1p13 259:11 0 20G 0 part
...
root@pve4:~# gdisk -l /dev/nvme0n1
GPT fdisk (gdisk) version 1.0.3
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/nvme0n1: 547002288 sectors, 260.8 GiB
Model: INTEL SSDPED1D280GA
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 19AFB808-D8FA-4819-B95C-DBF93CD6AECF
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 547002254
Partitions will be aligned on 2048-sector boundaries
Total free space is 79337325 sectors (37.8 GiB)
Number  Start (sector)  End (sector)  Size      Code  Name
   1              2048      41945087  20.0 GiB  FFFF  ceph block.db
   3          83888128     125831167  20.0 GiB  FFFF  ceph block.db
   4         125831168     167774207  20.0 GiB  FFFF  ceph block.db
   6         209717248     251660287  20.0 GiB  FFFF  ceph block.db
   7         251660288     293603327  20.0 GiB  FFFF  ceph block.db
   8         293603328     335546367  20.0 GiB  FFFF  ceph block.db
   9         335546368     377489407  20.0 GiB  FFFF  ceph block.db
  10         377489408     419432447  20.0 GiB  FFFF  ceph block.db
  11         419432448     461375487  20.0 GiB  FFFF  ceph block.db
  12         461375488     503318527  20.0 GiB  FFFF  ceph block.db
  13         503318528     545261567  20.0 GiB  FFFF  ceph block.db
  14          41945088      48236543   3.0 GiB  8300
So a new partition (#14) was created in the GPT, and there is enough free space for it (even enough for a 20 GB partition). What strikes me, though, is that lsblk still does not list nvme0n1p14, which would fit the |The kernel is still using the old partition table| warning from the task log.
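In case it is relevant, this is what I intended to check next (just a sketch; partprobe is what the warning in the task log itself suggests, and p14 is the partition number from the gdisk listing above):
Code:
# re-read the partition table without rebooting, then check whether the
# kernel now sees the new partition and whether it has a PARTUUID
partprobe /dev/nvme0n1
lsblk /dev/nvme0n1
blkid /dev/nvme0n1p14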
I updated and rebooted this system yesterday, so it should be up to date.
Code:
root@pve4:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 14.2.19-pve1
ceph-fuse: 14.2.19-pve1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.13-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-9
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
Any ideas what the reason could be?