Ceph Cluster Reinstallation - OSDs down?

vispa

Hi All,

I've re-installed a 5-node cluster with Proxmox VE 5.1. Each of the 5 nodes has 8 drives:

/dev/sda (OS)
/dev/sdb (journal ssd)

Then six SSD disks for OSDs:
/dev/cciss/c0d0
/dev/cciss/c0d1
/dev/cciss/c0d2
/dev/cciss/c0d3
/dev/cciss/c0d4
/dev/cciss/c0d5

I've installed ceph along with the monitors and all seems to be running smoothly.

As the nodes/disks were used in a previous installation, I've zapped them as follows:

Code:
ceph-volume lvm zap /dev/cciss/c0d0
ceph-volume lvm zap /dev/cciss/c0d1
ceph-volume lvm zap /dev/cciss/c0d2
ceph-volume lvm zap /dev/cciss/c0d3
ceph-volume lvm zap /dev/cciss/c0d4
ceph-volume lvm zap /dev/cciss/c0d5
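
A quick sanity check that the zapped disks really are empty before re-creating the OSDs can look like this (just a sketch using standard tools; wipefs without -a only lists signatures, it does not erase anything):

Code:
# show the device and any partitions still on it
lsblk /dev/cciss/c0d0
# list any filesystem/partition signatures wipefs can still see (read-only)
wipefs /dev/cciss/c0d0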

I've continued to add the OSDs as follows:

Code:
pveceph createosd /dev/cciss/c0d1 -wal_dev /dev/sdb
create OSD on /dev/cciss/c0d1 (bluestore)
using device '/dev/sdb' for block.wal
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
prepare_device: OSD will not be hot-swappable if block.wal is not the same device as the osd data
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
meta-data=/dev/cciss/c0d1p1      isize=2048   agcount=4, agsize=6400 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=25600, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=864, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The operation has completed successfully.

I've done this for all of the OSD disks, however they all appear down:

Code:
root@cloud1:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME       STATUS REWEIGHT PRI-AFF
-1            0 root default                         
-3            0     host cloud1                       
-5            0     host cloud2                       
 0            0 osd.0             down        0 1.00000
 1            0 osd.1             down        0 1.00000
 2            0 osd.2             down        0 1.00000
 3            0 osd.3             down        0 1.00000
 4            0 osd.4             down        0 1.00000
 5            0 osd.5             down        0 1.00000

The OSD logs show the following error:

Code:
2018-03-21 21:51:03.690548 7f2a4b16ce00  0 set uid:gid to 64045:64045 (ceph:ceph)
2018-03-21 21:51:03.690573 7f2a4b16ce00  0 ceph version 12.2.4 (4832b6f0acade977670a37c20ff5dbe69e727416) luminous (stable), process (unknown), pid 3738
2018-03-21 21:51:03.695444 7f2a4b16ce00  1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp) mkfs path /var/lib/ceph/tmp/mnt.Pftsnp
2018-03-21 21:51:03.696343 7f2a4b16ce00  1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp) mkfs already created
2018-03-21 21:51:03.696352 7f2a4b16ce00  1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp) _fsck repair (shallow) start
2018-03-21 21:51:03.696414 7f2a4b16ce00  1 bdev create path /var/lib/ceph/tmp/mnt.Pftsnp/block type kernel
2018-03-21 21:51:03.696428 7f2a4b16ce00  1 bdev(0x557c3cc58b40 /var/lib/ceph/tmp/mnt.Pftsnp/block) open path /var/lib/ceph/tmp/mnt.Pftsnp/block
2018-03-21 21:51:03.696730 7f2a4b16ce00  1 bdev(0x557c3cc58b40 /var/lib/ceph/tmp/mnt.Pftsnp/block) open size 499968380928 (0x7468701000, 465 GB) block_size 4096 (4096 B) non-rotational
2018-03-21 21:51:03.697090 7f2a4b16ce00 -1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.Pftsnp/block fsid 99fbc909-a02f-48b4-a524-1aa8c0dfbfe4 does not match our fsid d00df8a3-02d4-4ff0-941b-96f83ab6c29e
2018-03-21 21:51:03.697103 7f2a4b16ce00  1 bdev(0x557c3cc58b40 /var/lib/ceph/tmp/mnt.Pftsnp/block) close
2018-03-21 21:51:03.983570 7f2a4b16ce00 -1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp) mkfs fsck found fatal error: (5) Input/output error
2018-03-21 21:51:03.983595 7f2a4b16ce00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
2018-03-21 21:51:03.983698 7f2a4b16ce00 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.Pftsnp: (5) Input/output error
2018-03-21 21:51:05.416583 7fafd30a4e00  0 set uid:gid to 64045:64045 (ceph:ceph)
2018-03-21 21:51:05.416606 7fafd30a4e00  0 ceph version 12.2.4 (4832b6f0acade977670a37c20ff5dbe69e727416) luminous (stable), process (unknown), pid 3803
2018-03-21 21:51:05.421547 7fafd30a4e00  1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4) mkfs path /var/lib/ceph/tmp/mnt.GUjhT4
2018-03-21 21:51:05.422452 7fafd30a4e00  1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4) mkfs already created
2018-03-21 21:51:05.422466 7fafd30a4e00  1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4) _fsck repair (shallow) start
2018-03-21 21:51:05.422522 7fafd30a4e00  1 bdev create path /var/lib/ceph/tmp/mnt.GUjhT4/block type kernel
2018-03-21 21:51:05.422537 7fafd30a4e00  1 bdev(0x558cda352b40 /var/lib/ceph/tmp/mnt.GUjhT4/block) open path /var/lib/ceph/tmp/mnt.GUjhT4/block
2018-03-21 21:51:05.422837 7fafd30a4e00  1 bdev(0x558cda352b40 /var/lib/ceph/tmp/mnt.GUjhT4/block) open size 499968380928 (0x7468701000, 465 GB) block_size 4096 (4096 B) non-rotational
2018-03-21 21:51:05.423261 7fafd30a4e00 -1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.GUjhT4/block fsid 99fbc909-a02f-48b4-a524-1aa8c0dfbfe4 does not match our fsid d00df8a3-02d4-4ff0-941b-96f83ab6c29e
2018-03-21 21:51:05.423275 7fafd30a4e00  1 bdev(0x558cda352b40 /var/lib/ceph/tmp/mnt.GUjhT4/block) close
2018-03-21 21:51:05.723492 7fafd30a4e00 -1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4) mkfs fsck found fatal error: (5) Input/output error
2018-03-21 21:51:05.723520 7fafd30a4e00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
2018-03-21 21:51:05.723623 7fafd30a4e00 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.GUjhT4: (5) Input/output error
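
For reference, the important line above is the _check_or_set_bdev_label fsid mismatch: the block device still carries a BlueStore label from the previous installation, so mkfs then fails with the I/O error. If ceph-bluestore-tool is installed, the leftover label can be inspected with something like this (the partition path is only an example, point it at the OSD's block partition):

Code:
# dump the BlueStore label (including its fsid) from the block device
ceph-bluestore-tool show-label --dev /dev/cciss/c0d1p2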

I've tried several times and can't seem to successfully add the OSDs when specifying /dev/sdb as the journal disk.

Can anyone see where I am going wrong?
 
/dev/cciss/c0d0
You are using a RAID controller, some old HP one, I guess. This can already be the culprit; if possible, set your controller to IT mode (different firmware may be required).

As a recommendation, use the available SSD as an OSD and not as a journal. With BlueStore there is no double-write penalty as there is with FileStore, so the performance gain is usually very small (or non-existent).
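
In practice that just means dropping the -wal_dev option, so each SSD becomes a self-contained OSD, roughly like this (a sketch based on the command already used above):

Code:
# data, DB and WAL all stay on the same device
pveceph createosd /dev/cciss/c0d0
# the former journal SSD can then be added as a normal OSD as well
pveceph createosd /dev/sdb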
 
Hi Alwin, thanks.

I did manage to overcome the problem by removing the OSDs, zapping and dd'ing the disks, then re-adding them. Strange that it didn't work the first time around though.

Code:
# take the OSD out and stop its daemon
ceph osd out 0
service ceph stop osd.0
# remove it from the CRUSH map, delete its auth key and drop the OSD id
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0

# wipe the partition table and zero the first sectors of the disk
ceph-disk zap /dev/cciss/c0d0
dd if=/dev/zero of=/dev/cciss/c0d0 bs=1024 count=1
# re-create the OSD with its WAL on the SSD
pveceph createosd /dev/cciss/c0d0 -wal_dev /dev/sdb
 
If they have already been used as OSDs, then zapping alone may not be enough. Also, ceph-volume and ceph-disk are two different tools.
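
To illustrate what a more thorough wipe of a previously used OSD disk can look like (just a sketch; the --destroy flag of ceph-volume lvm zap also tears down leftover LVM volumes where the installed version supports it):

Code:
# zap the device; --destroy also removes leftover LVM volumes/partitions
ceph-volume lvm zap --destroy /dev/cciss/c0d0
# clear any remaining filesystem/partition signatures wipefs recognises
wipefs -a /dev/cciss/c0d0
# zero the first 200 MB, which should also cover the old BlueStore label
# at the start of the former block partition
dd if=/dev/zero of=/dev/cciss/c0d0 bs=1M count=200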
 
