Hi all
Prologue:
- I had an unresponsive node (let's call it #6) which I could still ping; the node's OSD was up and in, but I could not ssh into it (error: "broken pipe" right after entering the password).
- So I turned it off and back on. It booted, but its OSD did not start.
- Next I updated all nodes; however, #6 complained that ceph.list was missing a final newline.
- cat on .../ceph.list showed it contained unreadable garbage, so I deleted it and copied the contents of ceph.list from another node onto #6 (roughly as sketched after this list).
- apt dist-upgrade on #6 then ran without errors; however, after a reboot the OSD was still down and out.
- ...destroyed the OSD on #6 -> no errors
- created a new OSD on #6 -> failed with exit code 1
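For completeness, the ceph.list repair was roughly the following; the source node name (n01) and the sources.list.d path are from memory, so treat them as assumptions rather than exact commands:
Code:
# copy the Ceph repo definition from a healthy node (n01 is just an example name)
scp root@n01:/etc/apt/sources.list.d/ceph.list /etc/apt/sources.list.d/ceph.list
# refresh the package index afterwards
apt update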
Code:
pveceph osd create /dev/sda
I spotted the following line in the command's output:
Code:
stderr: Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 547: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
Total output of pveceph osd create /dev/sda:
Code:
root@n06:~# pveceph osd create /dev/sda
create OSD on /dev/sda (bluestore)
wiping block device /dev/sda
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.934404 s, 224 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 1e35b7d8-afce-4e59-8ff6-0063a899b28d
Running command: vgcreate --force --yes ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714 /dev/sda
stdout: Physical volume "/dev/sda" successfully created.
stdout: Volume group "ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714" successfully created
Running command: lvcreate --yes -l 3814911 -n osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714
stdout: Logical volume "osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-5
--> Executable selinuxenabled not in PATH: /sbin:/bin:/usr/sbin:/usr/bin
Running command: /bin/chown -h ceph:ceph /dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d
Running command: /bin/chown -R ceph:ceph /dev/dm-5
Running command: /bin/ln -s /dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d /var/lib/ceph/osd/ceph-5/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-5/activate.monmap
stderr: 2022-07-30T13:02:59.414+0200 7fdc3be9c700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2022-07-30T13:02:59.414+0200 7fdc3be9c700 -1 AuthRegistry(0x7fdc3405fb20) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
stderr: got monmap epoch 4
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-5/keyring --create-keyring --name osd.5 --add-key AQDiD+ViI6QkEBAAdJYMkSCuSzI8yP1dZF9ZLg==
stdout: creating /var/lib/ceph/osd/ceph-5/keyring
added entity osd.5 auth(key=AQDiD+ViI6QkEBAAdJYMkSCuSzI8yP1dZF9ZLg==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 5 --monmap /var/lib/ceph/osd/ceph-5/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-5/ --osd-uuid 1e35b7d8-afce-4e59-8ff6-0063a899b28d --setuser ceph --setgroup ceph
stderr: 2022-07-30T13:02:59.710+0200 7efeca3ba240 -1 bluestore(/var/lib/ceph/osd/ceph-5/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: /dev/sda
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d --path /var/lib/ceph/osd/ceph-5 --no-mon-config
stderr: Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 547: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.5 --yes-i-really-mean-it
stderr: 2022-07-30T13:03:15.874+0200 7fdcaeb3d700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2022-07-30T13:03:15.874+0200 7fdcaeb3d700 -1 AuthRegistry(0x7fdca805fb20) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
stderr: purged osd.5
--> Zapping: /dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d
--> Unmounting /var/lib/ceph/osd/ceph-5
Running command: /bin/umount -v /var/lib/ceph/osd/ceph-5
stderr: umount: /var/lib/ceph/osd/ceph-5 unmounted
Running command: /bin/dd if=/dev/zero of=/dev/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714/osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d bs=1M count=10 conv=fsync
stderr: 10+0 records in
10+0 records out
stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0748171 s, 140 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714
Running command: vgremove -v -f ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714
stderr: Removing ceph--6fd8aaa5--8d07--4d72--80ae--61730f49c714-osd--block--1e35b7d8--afce--4e59--8ff6--0063a899b28d (253:5)
stderr: Archiving volume group "ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714" metadata (seqno 5).
stderr: Releasing logical volume "osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d"
stderr: Creating volume group backup "/etc/lvm/backup/ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714" (seqno 6).
stdout: Logical volume "osd-block-1e35b7d8-afce-4e59-8ff6-0063a899b28d" successfully removed
stderr: Removing physical volume "/dev/sda" from volume group "ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714"
stdout: Volume group "ceph-6fd8aaa5-8d07-4d72-80ae-61730f49c714" successfully removed
Running command: pvremove -v -f -f /dev/sda
stdout: Labels on physical volume "/dev/sda" successfully wiped.
--> Zapping successful for OSD: 5
--> RuntimeError: command returned non-zero exit status: 127
command 'ceph-volume lvm create --cluster-fsid 3e57a563-8498-433c-9d2a-287c9aa6e910 --data /dev/sda' failed: exit code 1
So - what are your thoughts on how to proceed: reinstall Ceph altogether, or can I "fix" something?
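One idea I had before a full reinstall: the ld.so assertion makes me suspect a corrupted binary or shared library on #6 (which would also fit the garbled ceph.list), so I was thinking of verifying the installed packages and reinstalling only what is broken. A rough sketch of what I have in mind; the package selection is just an example, not a verified fix:
Code:
# check installed files against the dpkg database; differences can expose corrupted binaries/libs
dpkg --verify ceph-osd ceph-base ceph-common
# or, if debsums is available, verify checksums of all ceph-related packages
debsums -s $(dpkg -l 'ceph*' | awk '/^ii/ {print $2}')
# reinstall whatever reports differences (the package list here is illustrative)
apt install --reinstall ceph-osd ceph-base ceph-common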
Any help & hints are highly appreciated ;-)
Kind regards
lucentwolf