[SOLVED] Could not add Ceph OSD

huky

Well-Known Member
Jul 1, 2016
I have a Ceph cluster with 6 nodes. Now I want to add an OSD, but I ran into a problem.
I have used this command successfully on another Ceph cluster, but now it fails:
# ceph-volume lvm create --bluestore --data $DEV --block.wal /dev/nvme0n1p5 --block.db /dev/nvme0n1p11
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 122647d6-ba35-4f72-9e67-82f56d86caa3
stderr: Error EEXIST: entity osd.12 exists but key does not match
--> RuntimeError: Unable to create a new OSD id

Then I tried:
# ceph-volume lvm create --bluestore --data $DEV --block.wal /dev/nvme0n1p5 --block.db /dev/nvme0n1p11 --osd-id 12
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
--> RuntimeError: The osd ID 12 is already in use or does not exist.
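I suspect osd.12 is left over from an OSD that was removed earlier, so its auth key is still registered in the cluster while the ID itself is no longer in the CRUSH tree. I have not confirmed that on this cluster, but if that is the case, something like this should show and clean up the stale entry before retrying ceph-volume (only if osd.12 really is gone):
# ceph auth get osd.12
# ceph osd crush remove osd.12
# ceph auth del osd.12
# ceph osd rm 12
After that, ceph-volume should be able to allocate a free ID on its own, without --osd-id.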

Finally, I used:
# pveceph createosd /dev/sde -bluestore -wal_dev /dev/nvme0n1p5 -journal_dev /dev/nvme0n1p11
create OSD on /dev/sde (bluestore)
using device '/dev/nvme0n1p11' for block.db
using device '/dev/nvme0n1p5' for block.wal
wipe disk/partition: /dev/sde
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.06833 s, 196 MB/s
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
prepare_device: OSD will not be hot-swappable if block.db is not the same device as the osd data
prepare_device: Block.db /dev/nvme0n1p11 was not prepared with ceph-disk. Symlinking directly.
prepare_device: OSD will not be hot-swappable if block.wal is not the same device as the osd data
prepare_device: Block.wal /dev/nvme0n1p5 was not prepared with ceph-disk. Symlinking directly.
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
meta-data=/dev/sde1 isize=2048 agcount=4, agsize=6400 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=0, rmapbt=0, reflink=0
data = bsize=4096 blocks=25600, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=864, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
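The warning about the kernel still using the old partition table looks suspicious to me: if the kernel never re-reads /dev/sde, the udev/ceph-disk activation cannot see the new partitions. I have not tried it yet, but forcing a re-read without a reboot should be possible with something like:
# partprobe /dev/sde
(or: # partx -u /dev/sde)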
and then:
root@ynode001:~# sgdisk --print /dev/sde
Disk /dev/sde: 7814037168 sectors, 3.6 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): CAED6C82-405D-4735-9FF7-ADAD91DBF844
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number Start (sector) End (sector) Size Code Name
1 2048 206847 100.0 MiB F800 ceph data
2 206848 7814037134 3.6 TiB FFFF ceph block

It looks OK, but I cannot find the new OSD ID.

It is still the same:
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 43.66315 root default
-3 14.55438 host ynode001
0 hdd 3.63860 osd.0 up 1.00000 1.00000
1 hdd 3.63860 osd.1 up 1.00000 1.00000
2 hdd 3.63860 osd.2 up 1.00000 1.00000
3 hdd 3.63860 osd.3 up 1.00000 1.00000
-5 14.55438 host ynode003
4 hdd 3.63860 osd.4 up 1.00000 1.00000
5 hdd 3.63860 osd.5 up 1.00000 1.00000
6 hdd 3.63860 osd.6 up 1.00000 1.00000
7 hdd 3.63860 osd.7 up 1.00000 1.00000
-7 14.55438 host ynode005
8 hdd 3.63860 osd.8 up 1.00000 1.00000
9 hdd 3.63860 osd.9 up 1.00000 1.00000
10 hdd 3.63860 osd.10 up 1.00000 1.00000
11 hdd 3.63860 osd.11 up 1.00000 1.00000
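To see whether the prepare step registered a new ID at all (even if the OSD never started), I would check something like this; if an ID was allocated, it should at least show up in ceph osd ls and as a directory under /var/lib/ceph/osd:
# ceph osd ls
# ceph-disk list
# ls /var/lib/ceph/osd/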

The service failed:
# systemctl status ceph-disk@dev-sde1.service
ceph-disk@dev-sde1.service - Ceph disk activation: /dev/sde1
Loaded: loaded (/lib/systemd/system/ceph-disk@.service; static; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-disk@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Sat 2020-03-21 20:31:03 CST; 12h ago
Process: 25831 ExecStart=/bin/sh -c timeout $CEPH_DISK_TIMEOUT flock /var/lock/ceph-disk-$(basename /dev/sde1) /usr/sbin/ceph-disk --verbose --log-stdout trigger --syn
Main PID: 25831 (code=exited, status=1/FAILURE)

Mar 21 20:31:03 ynode001 sh[25831]: main(sys.argv[1:])
Mar 21 20:31:03 ynode001 sh[25831]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5687, in main
Mar 21 20:31:03 ynode001 sh[25831]: args.func(args)
Mar 21 20:31:03 ynode001 sh[25831]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4890, in main_trigger
Mar 21 20:31:03 ynode001 sh[25831]: raise Error('return code ' + str(ret))
Mar 21 20:31:03 ynode001 sh[25831]: ceph_disk.main.Error: Error: return code 1
Mar 21 20:31:03 ynode001 systemd[1]: ceph-disk@dev-sde1.service: Main process exited, code=exited, status=1/FAILURE
Mar 21 20:31:03 ynode001 systemd[1]: Failed to start Ceph disk activation: /dev/sde1.
Mar 21 20:31:03 ynode001 systemd[1]: ceph-disk@dev-sde1.service: Unit entered failed state.
Mar 21 20:31:03 ynode001 systemd[1]: ceph-disk@dev-sde1.service: Failed with result 'exit-code'.
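To get the real error instead of just the exit code, the trigger command from the unit can be run by hand, roughly like this (based on the ExecStart line above, with the path assumed to be /dev/sde1):
# /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/sde1
# ceph-disk --verbose activate /dev/sde1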

and /var/log/ceph/ceph-volume.log shows:
[2020-03-21 20:25:54,738][ceph_volume.process][INFO ] stdout TAGS=:systemd:
[2020-03-21 20:25:54,738][ceph_volume.process][INFO ] stdout USEC_INITIALIZED=2975831445
[2020-03-21 20:25:54,741][ceph_volume.util.system][INFO ] /dev/sde1 was not found as mounted
[2020-03-21 20:25:54,741][ceph_volume.process][INFO ] Running command: /sbin/wipefs --all /dev/sde1
[2020-03-21 20:25:54,755][ceph_volume.process][INFO ] stdout /dev/sde1: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
[2020-03-21 20:25:54,755][ceph_volume.process][INFO ] Running command: dd if=/dev/zero of=/dev/sde1 bs=1M count=10
[2020-03-21 20:25:54,834][ceph_volume.process][INFO ] stderr 10+0 records in
I can mount /dev/sde1 manually.
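For reference, this is roughly what I did to check it by hand; on a ceph-disk bluestore OSD I would expect the small data partition to contain files like fsid, whoami and the block / block.db / block.wal symlinks (not sure my listing is complete):
# mount /dev/sde1 /mnt
# ls /mnt
# umount /mnt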
 
I restarted three nodes and then it worked:
ceph-volume lvm create --bluestore --data $DEV --block.wal /dev/nvme0n1p5 --block.db /dev/nvme0n1p11
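To double-check that the OSD really joined after the restart, something like this should now show the new ID under the host and the matching LVs (output omitted, the ID will differ per cluster):
# ceph osd tree
# ceph-volume lvm list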
 
