Ceph OSD create failure - device in use error

cloudguy

Hello

I've created a new PVE 7.1 cluster (3 nodes). While configuring Ceph 16.2 and adding OSDs, one OSD creation failed with an input/output error during mkfs:

Code:
create OSD on /dev/sdd (bluestore)
wiping block device /dev/sdd
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.30635 s, 161 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new ef3e9862-3322-4a69-84bf-f63beb18eb3d
Running command: /sbin/vgcreate --force --yes ceph-e264d0e2-bbca-4f3a-8917-03f574fc8f88 /dev/sdd
 stdout: Physical volume "/dev/sdd" successfully created.
 stdout: Volume group "ceph-e264d0e2-bbca-4f3a-8917-03f574fc8f88" successfully created
Running command: /sbin/lvcreate --yes -l 286160 -n osd-block-ef3e9862-3322-4a69-84bf-f63beb18eb3d ceph-e264d0e2-bbca-4f3a-8917-03f574fc8f88
 stdout: Logical volume "osd-block-ef3e9862-3322-4a69-84bf-f63beb18eb3d" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /sbin/cryptsetup --batch-mode --key-file - luksFormat /dev/ceph-e264d0e2-bbca-4f3a-8917-03f574fc8f88/osd-block-ef3e9862-3322-4a69-84bf-f63beb18eb3d
Running command: /sbin/cryptsetup --key-file - --allow-discards luksOpen /dev/ceph-e264d0e2-bbca-4f3a-8917-03f574fc8f88/osd-block-ef3e9862-3322-4a69-84bf-f63beb18eb3d olrKyp-7EEP-1y3i-7x9x-19Tw-9Chn-49oUhr
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-12
--> Executable selinuxenabled not in PATH: /sbin:/bin:/usr/sbin:/usr/bin
Running command: /bin/chown -h ceph:ceph /dev/mapper/olrKyp-7EEP-1y3i-7x9x-19Tw-9Chn-49oUhr
Running command: /bin/chown -R ceph:ceph /dev/dm-3
Running command: /bin/ln -s /dev/mapper/olrKyp-7EEP-1y3i-7x9x-19Tw-9Chn-49oUhr /var/lib/ceph/osd/ceph-12/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-12/activate.monmap
 stderr: 2022-03-10T10:35:40.107-0500 7fa544b2a700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2022-03-10T10:35:40.107-0500 7fa544b2a700 -1 AuthRegistry(0x7fa54005b2e8) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: got monmap epoch 3
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-12/keyring --create-keyring --name osd.12 --add-key AQDCGipitcAaARAAHdsUNxg1tUaMyR66LQl5Qg==
 stdout: creating /var/lib/ceph/osd/ceph-12/keyring
added entity osd.12 auth(key=AQDCGipitcAaARAAHdsUNxg1tUaMyR66LQl5Qg==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 12 --monmap /var/lib/ceph/osd/ceph-12/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-12/ --osd-uuid ef3e9862-3322-4a69-84bf-f63beb18eb3d --setuser ceph --setgroup ceph
 stderr: 2022-03-10T10:35:40.547-0500 7f09800ebf00 -1 bluestore(/var/lib/ceph/osd/ceph-12/) _read_fsid unparsable uuid
 stderr: 2022-03-10T10:35:40.571-0500 7f09800ebf00 -1 bluefs _replay 0x0: stop: uuid 70144648-6562-4cc3-c13e-0f1e022a4795 != super.uuid 1320e201-073f-4b55-88a2-036f81185f14, block dump:
.... (see attachment) ....
 stderr: 2022-03-10T10:35:41.355-0500 7f09800ebf00 -1 rocksdb: verify_sharding unable to list column families: NotFound:
 stderr: 2022-03-10T10:35:41.355-0500 7f09800ebf00 -1 bluestore(/var/lib/ceph/osd/ceph-12/) _open_db erroring opening db:
 stderr: 2022-03-10T10:35:41.891-0500 7f09800ebf00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
 stderr: 2022-03-10T10:35:41.891-0500 7f09800ebf00 -1 [0;31m ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-12/: (5) Input/output error[0m
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.12 --yes-i-really-mean-it
 stderr: 2022-03-10T10:35:42.035-0500 7fe8fd907700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2022-03-10T10:35:42.035-0500 7fe8fd907700 -1 AuthRegistry(0x7fe8f805b2e8) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: purged osd.12
-->  RuntimeError: Command failed with exit code 250: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 12 --monmap /var/lib/ceph/osd/ceph-12/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-12/ --osd-uuid ef3e9862-3322-4a69-84bf-f63beb18eb3d --setuser ceph --setgroup ceph
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 9acff7fb-eae6-4b9d-a89f-119138f3b798 --data /dev/sdd --dmcrypt' failed: exit code 1

The OSD does not show up in the OSD list, and the disk appears unused. I noticed that the usual VG/LV had been left behind on the block device in question, so I removed them via vgremove / pvremove and ran wipefs -a for good measure (see the commands below).
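
For reference, the cleanup I ran looked roughly like this (VG name taken from the task log above; adjust to match your device):

Code:
# remove the leftover Ceph VG/PV and wipe remaining signatures
vgremove --force ceph-e264d0e2-bbca-4f3a-8917-03f574fc8f88
pvremove /dev/sdd
wipefs -a /dev/sdd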

The cleaned device is not listed in the PVE web console when I attempt to re-add it, and I'm unable to add it via the pveceph CLI either:

Code:
# pveceph osd create /dev/sdd
device '/dev/sdd' is already in use

Any idea where /dev/sdd could still be referenced? It's not showing up in the output of:
Code:
ceph device ls
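
Are there other places a stale reference could be hiding that I should check, e.g. something along these lines (just guesses on my part)?

Code:
# look for anything still holding /dev/sdd
lsblk /dev/sdd
dmsetup ls
ceph-volume lvm list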

My cluster:
Code:
# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-5-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-8
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
openvswitch-switch: 2.15.0+ds1-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
 

Attachments

  • PVE71-OSD-ERROR.txt (26.7 KB)
TL;DR question: Are there any broader, cluster-wide implications when an action fails via the pveceph / web console commands but completes successfully via the native Ceph CLI tools?

I suspect there could be an issue with pveceph when creating OSDs, with some steps not completing successfully.

A bit more info:

When I attempt to add the volume via pveceph (as described above), it's unsuccessful:

Code:
# pveceph osd create /dev/sdd
device '/dev/sdd' is already in use

However, ceph-volume inventory shows the device as available:

Code:
# ceph-volume inventory --format json-pretty

[
    {
        "available": true,
        "device_id": "HGST_HUC101212CSS600_L0GRVVJG",
        "lsm_data": {},
        "lvs": [],
        "path": "/dev/sdd",
        "rejected_reasons": [],
        "sys_api": {
            "human_readable_size": "1.09 TB",
            "locked": 0,
            "model": "HUC101212CSS600",
            "nr_requests": "256",
            "partitions": {},
            "path": "/dev/sdd",
            "removable": "0",
            "rev": "A469",
            "ro": "0",
            "rotational": "1",
            "sas_address": "0x5000cca0722993e9",
            "sas_device_handle": "0x000c",
            "scheduler_mode": "mq-deadline",
            "sectors": 0,
            "sectorsize": "512",
            "size": 1200243695616.0,
            "support_discard": "0",
            "vendor": "HGST"
        }
    },

I then ran the ceph-volume command manually to create the OSD, and it completed successfully.

Code:
# ceph-volume lvm create --cluster-fsid {redacted} --data /dev/sdd --dmcrypt

--> ceph-volume lvm activate successful for osd ID: 2
--> ceph-volume lvm create successful for: /dev/sdd
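
Assuming the manually created OSD is otherwise healthy, is checking it with something like the following (plus the OSD list in the web console) enough to confirm it is fully integrated?

Code:
# sanity checks after the manual create (am I missing anything?)
ceph osd tree
ceph osd df
ceph-volume lvm list /dev/sdd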
 
