pveceph osd create /dev/sda fails

lucentwolf

Hi all

Prologue:
  • I had an unresponsive node (let's call it #6) which I could ping; the node's OSD was up and in; however, I could not ssh into it (error: "broken pipe" directly after entering the password).
  • So I turned it off and back on. It booted; however, its OSD did not start.
  • Next I updated all nodes; however, #6 complained that ceph.list was missing a final newline.
  • cat on .../ceph.list showed it contained unreadable garbage, so I deleted it and copied the contents of ceph.list from another node onto #6 (a typical line is shown right after this list).
  • apt dist-upgrade on #6 then ran without error messages; however, after a reboot the OSD was still down and out.
  • ...destroyed the OSD on #6 -> no errors
  • created a new OSD on #6 -> failed with exit code 1
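
For reference, the Ceph repository file on a PVE node (usually /etc/apt/sources.list.d/ceph.list) is a single deb line; a typical one, assuming Ceph Pacific on Debian Bullseye, looks like this (adjust the release name to whatever your cluster runs):
Code:
deb http://download.proxmox.com/debian/ceph-pacific bullseye main
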
So I started a shell to see the full output of pveceph osd create /dev/sda.

I spotted the following line in the command's output:
stderr: Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 547: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
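
That assertion comes from the dynamic linker, so it usually points at a corrupted binary or shared library rather than at Ceph itself. The failing step can also be reproduced outside of ceph-volume (just a sketch; the binary path is taken from the output below):
Code:
# run the tool that crashed directly; a broken library can make ld.so abort right away
/bin/ceph-bluestore-tool --help
# list the shared libraries it pulls in
ldd /bin/ceph-bluestore-tool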


Total output of pveceph osd create /dev/sda
Code:
root@n06:~# pveceph osd create /dev/sda
create OSD on /dev/sda (bluestore)
wiping block device /dev/sda
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.934404 s, 224 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 1e35b7d8-afce-4e59-8ff6-0063a899b28d
Running command: vgcreate --force --yes ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714 /dev/sda
 stdout: Physical volume "/dev/sda" successfully created.
 stdout: Volume group "ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714" successfully created
Running command: lvcreate --yes -l 3814911 -n osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714
 stdout: Logical volume "osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-5
--> Executable selinuxenabled not in PATH: /sbin:/bin:/usr/sbin:/usr/bin
Running command: /bin/chown -h ceph:ceph /dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d
Running command: /bin/chown -R ceph:ceph /dev/dm-5
Running command: /bin/ln -s /dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d /var/lib/ceph/osd/ceph-5/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-5/activate.monmap
 stderr: 2022-07-30T13:02:59.414+0200 7fdc3be9c700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2022-07-30T13:02:59.414+0200 7fdc3be9c700 -1 AuthRegistry(0x7fdc3405fb20) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: got monmap epoch 4
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-5/keyring --create-keyring --name osd.5 --add-key AQDiD+ViI6QkEBAAdJYMkSCuSzI8yP1dZF9ZLg==
 stdout: creating /var/lib/ceph/osd/ceph-5/keyring
added entity osd.5 auth(key=AQDiD+ViI6QkEBAAdJYMkSCuSzI8yP1dZF9ZLg==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 5 --monmap /var/lib/ceph/osd/ceph-5/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-5/ --osd-uuid 1e35b7d8-afce-4e59-8ff6-0063a899b28d --setuser ceph --setgroup ceph
 stderr: 2022-07-30T13:02:59.710+0200 7efeca3ba240 -1 bluestore(/var/lib/ceph/osd/ceph-5/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: /dev/sda
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d --path /var/lib/ceph/osd/ceph-5 --no-mon-config
stderr: Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 547: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.5 --yes-i-really-mean-it
 stderr: 2022-07-30T13:03:15.874+0200 7fdcaeb3d700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2022-07-30T13:03:15.874+0200 7fdcaeb3d700 -1 AuthRegistry(0x7fdca805fb20) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: purged osd.5
--> Zapping: /dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d
--> Unmounting /var/lib/ceph/osd/ceph-5
Running command: /bin/umount -v /var/lib/ceph/osd/ceph-5
 stderr: umount: /var/lib/ceph/osd/ceph-5 unmounted
Running command: /bin/dd if=/dev/zero of=/dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0748171 s, 140 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714
Running command: vgremove -v -f ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714
 stderr: Removing ceph--6fd8aaa5--8d07--4d72--80ae--61730f49c714-osd--block--1e35b7d8--afce--4e59--8ff6--0063a899b28d (253:5)
 stderr: Archiving volume group "ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714" metadata (seqno 5).
 stderr: Releasing logical volume "osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714" (seqno 6).
 stdout: Logical volume "osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d" successfully removed
 stderr: Removing physical volume "/dev/sda" from volume group "ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714"
 stdout: Volume group "ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714" successfully removed
Running command: pvremove -v -f -f /dev/sda
 stdout: Labels on physical volume "/dev/sda" successfully wiped.
--> Zapping successful for OSD: 5
-->  RuntimeError: command returned non-zero exit status: 127
command 'ceph-volume lvm create --cluster-fsid 3e57a563-8498-433c-9d2a-287c9aa6e910 --data /dev/sda' failed: exit code 1

So - what are your thoughts on how to proceed -> reinstall Ceph altogether? Or can I "fix" something?

Any help & hints are highly appreciated ;-)

Kind regards
lucentwolf
 
Hi,
stderr: Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 547: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
Sounds to me like there is some form of corruption happening on that node. Please check the disks and memory. I'd also check /var/log/syslog for further errors. If the hardware seems fine, you can try apt install debsums and run debsums -s to see whether any files from the installed packages are corrupted.
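
Roughly like this (a sketch; the grep pattern is only an example of what to look for in the log):
Code:
apt install debsums
# -s (silent): only report files that differ from the packaged checksums
debsums -s
# scan the system log for disk/memory related trouble
grep -iE 'i/o error|ata[0-9]|mce' /var/log/syslog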
 
Fiona,

Thanks a lot for the debsums -s hint! Indeed, it reported quite a lot of changed files on the affected node, among them several Ceph packages as well as PVE kernel modules.

After checking syslog I also see I/O errors on the node's system mSATA SSD. Looks like the node will receive a new SSD and undergo a fresh setup...
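
For anyone hitting the same symptoms: the drive's SMART data is worth a look before ordering a replacement (a sketch; requires smartmontools, and /dev/sdb is only a placeholder for the system SSD):
Code:
apt install smartmontools
# overall health assessment, attributes and the device's error log
smartctl -a /dev/sdb
# kernel messages about I/O errors on the block layer
dmesg -T | grep -i 'i/o error'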

Kind regards
lucentwolf
 
