Ceph Cluster Reinstallation - OSDs down?

Discussion in 'Proxmox VE: Installation and configuration' started by vispa, Mar 22, 2018.

  1. vispa

    vispa New Member

    Hi All,

    I've re-installed a 5-node cluster with Proxmox VE 5.1. Each of the 5 nodes has 8 drives:

    /dev/sda (OS)
    /dev/sdb (journal SSD)

    Then six SSD disks for OSDs:
    /dev/cciss/c0d0
    /dev/cciss/c0d1
    /dev/cciss/c0d2
    /dev/cciss/c0d3
    /dev/cciss/c0d4
    /dev/cciss/c0d5

    I've installed ceph along with the monitors and all seems to be running smoothly.
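
    (For context, overall cluster health before adding any OSDs can be double-checked with the standard status commands; shown here just as a reference:)

    Code:
    # Overall cluster and monitor health
    ceph -s
    # Proxmox's view of the local Ceph services
    pveceph status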

    As the nodes/disks were used in a previous installation, I've zapped them as follows:

    Code:
    ceph-volume lvm zap /dev/cciss/c0d0
    ceph-volume lvm zap /dev/cciss/c0d1
    ceph-volume lvm zap /dev/cciss/c0d2
    ceph-volume lvm zap /dev/cciss/c0d3
    ceph-volume lvm zap /dev/cciss/c0d4
    ceph-volume lvm zap /dev/cciss/c0d5
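
    (As a side note, whether the zap actually left the disks clean can be verified before creating the OSDs, e.g.:)

    Code:
    # Old partitions or LVM volumes from the previous OSDs should no longer show up
    lsblk
    # List any OSD volumes ceph-volume still knows about
    ceph-volume lvm list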
    I've then continued to add the OSDs as follows:

    Code:
    pveceph createosd /dev/cciss/c0d1 -wal_dev /dev/sdb
    create OSD on /dev/cciss/c0d1 (bluestore)
    using device '/dev/sdb' for block.wal
    Creating new GPT entries.
    GPT data structures destroyed! You may now partition the disk using fdisk or
    other utilities.
    Creating new GPT entries.
    The operation has completed successfully.
    Setting name!
    partNum is 0
    REALLY setting name!
    The operation has completed successfully.
    prepare_device: OSD will not be hot-swappable if block.wal is not the same device as the osd data
    Setting name!
    partNum is 1
    REALLY setting name!
    The operation has completed successfully.
    The operation has completed successfully.
    Setting name!
    partNum is 1
    REALLY setting name!
    The operation has completed successfully.
    The operation has completed successfully.
    meta-data=/dev/cciss/c0d1p1      isize=2048   agcount=4, agsize=6400 blks
             =                       sectsz=512   attr=2, projid32bit=1
             =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
    data     =                       bsize=4096   blocks=25600, imaxpct=25
             =                       sunit=0      swidth=0 blks
    naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
    log      =internal log           bsize=4096   blocks=864, version=2
             =                       sectsz=512   sunit=0 blks, lazy-count=1
    realtime =none                   extsz=4096   blocks=0, rtextents=0
    The operation has completed successfully.
    I've done this for all of the OSD disks; however, they all appear as down:

    Code:
    root@cloud1:~# ceph osd tree
    ID CLASS WEIGHT TYPE NAME       STATUS REWEIGHT PRI-AFF
    -1            0 root default                         
    -3            0     host cloud1                       
    -5            0     host cloud2                       
     0            0 osd.0             down        0 1.00000
     1            0 osd.1             down        0 1.00000
     2            0 osd.2             down        0 1.00000
     3            0 osd.3             down        0 1.00000
     4            0 osd.4             down        0 1.00000
     5            0 osd.5             down        0 1.00000 
    The OSD logs show the following error:

    Code:
    2018-03-21 21:51:03.690548 7f2a4b16ce00  0 set uid:gid to 64045:64045 (ceph:ceph)
    2018-03-21 21:51:03.690573 7f2a4b16ce00  0 ceph version 12.2.4 (4832b6f0acade977670a37c20ff5dbe69e727416) luminous (stable), process (unknown), pid 3738
    2018-03-21 21:51:03.695444 7f2a4b16ce00  1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp) mkfs path /var/lib/ceph/tmp/mnt.Pftsnp
    2018-03-21 21:51:03.696343 7f2a4b16ce00  1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp) mkfs already created
    2018-03-21 21:51:03.696352 7f2a4b16ce00  1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp) _fsck repair (shallow) start
    2018-03-21 21:51:03.696414 7f2a4b16ce00  1 bdev create path /var/lib/ceph/tmp/mnt.Pftsnp/block type kernel
    2018-03-21 21:51:03.696428 7f2a4b16ce00  1 bdev(0x557c3cc58b40 /var/lib/ceph/tmp/mnt.Pftsnp/block) open path /var/lib/ceph/tmp/mnt.Pftsnp/block
    2018-03-21 21:51:03.696730 7f2a4b16ce00  1 bdev(0x557c3cc58b40 /var/lib/ceph/tmp/mnt.Pftsnp/block) open size 499968380928 (0x7468701000, 465 GB) block_size 4096 (4096 B) non-rotational
    2018-03-21 21:51:03.697090 7f2a4b16ce00 -1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.Pftsnp/block fsid 99fbc909-a02f-48b4-a524-1aa8c0dfbfe4 does not match our fsid d00df8a3-02d4-4ff0-941b-96f83ab6c29e
    2018-03-21 21:51:03.697103 7f2a4b16ce00  1 bdev(0x557c3cc58b40 /var/lib/ceph/tmp/mnt.Pftsnp/block) close
    2018-03-21 21:51:03.983570 7f2a4b16ce00 -1 bluestore(/var/lib/ceph/tmp/mnt.Pftsnp) mkfs fsck found fatal error: (5) Input/output error
    2018-03-21 21:51:03.983595 7f2a4b16ce00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
    2018-03-21 21:51:03.983698 7f2a4b16ce00 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.Pftsnp: (5) Input/output error
    2018-03-21 21:51:05.416583 7fafd30a4e00  0 set uid:gid to 64045:64045 (ceph:ceph)
    2018-03-21 21:51:05.416606 7fafd30a4e00  0 ceph version 12.2.4 (4832b6f0acade977670a37c20ff5dbe69e727416) luminous (stable), process (unknown), pid 3803
    2018-03-21 21:51:05.421547 7fafd30a4e00  1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4) mkfs path /var/lib/ceph/tmp/mnt.GUjhT4
    2018-03-21 21:51:05.422452 7fafd30a4e00  1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4) mkfs already created
    2018-03-21 21:51:05.422466 7fafd30a4e00  1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4) _fsck repair (shallow) start
    2018-03-21 21:51:05.422522 7fafd30a4e00  1 bdev create path /var/lib/ceph/tmp/mnt.GUjhT4/block type kernel
    2018-03-21 21:51:05.422537 7fafd30a4e00  1 bdev(0x558cda352b40 /var/lib/ceph/tmp/mnt.GUjhT4/block) open path /var/lib/ceph/tmp/mnt.GUjhT4/block
    2018-03-21 21:51:05.422837 7fafd30a4e00  1 bdev(0x558cda352b40 /var/lib/ceph/tmp/mnt.GUjhT4/block) open size 499968380928 (0x7468701000, 465 GB) block_size 4096 (4096 B) non-rotational
    2018-03-21 21:51:05.423261 7fafd30a4e00 -1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.GUjhT4/block fsid 99fbc909-a02f-48b4-a524-1aa8c0dfbfe4 does not match our fsid d00df8a3-02d4-4ff0-941b-96f83ab6c29e
    2018-03-21 21:51:05.423275 7fafd30a4e00  1 bdev(0x558cda352b40 /var/lib/ceph/tmp/mnt.GUjhT4/block) close
    2018-03-21 21:51:05.723492 7fafd30a4e00 -1 bluestore(/var/lib/ceph/tmp/mnt.GUjhT4) mkfs fsck found fatal error: (5) Input/output error
    2018-03-21 21:51:05.723520 7fafd30a4e00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
    2018-03-21 21:51:05.723623 7fafd30a4e00 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.GUjhT4: (5) Input/output error
    
    I've tried several times and can't seem to successfully add the OSDs when specifying /dev/sdb as the journal disk.
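
    Looking at the log, the "does not match our fsid" line suggests the block device still carries a BlueStore label from the previous installation. A rough way to confirm that (the partition name below is only a guess at what got created for the OSD data; adjust it to the actual block partition):

    Code:
    # Show any BlueStore label left on the block partition
    ceph-bluestore-tool show-label --dev /dev/cciss/c0d1p2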

    Can anyone see where I am going wrong?
     
  2. Alwin

    Alwin Proxmox Staff Member

    You are using a RAID controller, some old HP model, I guess. That alone can be the culprit; if possible, set your controller to IT mode (different firmware may be required).

    As a recommendation, use the available SSD as an OSD and not as a journal. With BlueStore there is no double-write penalty as there is with FileStore, so the performance gain from a separate journal/WAL device is usually very small (or non-existent).
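
    A minimal sketch of that, reusing the device names from this thread (no separate WAL/DB device, so BlueStore keeps everything on the OSD disk):

    Code:
    # Create the OSD with WAL/DB co-located on the same device
    pveceph createosd /dev/cciss/c0d0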
     
  3. vispa

    vispa New Member

    Hi Alwin, thanks.

    I did manage to overcome the problem by removing the OSDs, zapping/dd'ing, then re-adding them. Strange that it didn't work the first time around, though.

    Code:
    # Take the OSD out of the cluster and remove it completely
    ceph osd out 0
    service ceph stop osd.0
    ceph osd crush remove osd.0
    ceph auth del osd.0
    ceph osd rm 0
    
    # Wipe the old disk contents, then create the OSD again
    ceph-disk zap /dev/cciss/c0d0
    dd if=/dev/zero of=/dev/cciss/c0d0 bs=1024 count=1
    pveceph createosd /dev/cciss/c0d0 -wal_dev /dev/sdb
     
  4. Alwin

    Alwin Proxmox Staff Member

    If they have already been used as OSDs, then zapping alone may not be enough. Also note that ceph-volume and ceph-disk are two different tools.
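
    For reference, a more thorough wipe before re-creating the OSDs could look like the sketch below (the --destroy flag depends on the ceph-volume version, and the dd size is an arbitrary choice that covers the old labels):

    Code:
    # Remove LVM metadata and signatures left over from the previous OSD
    ceph-volume lvm zap --destroy /dev/cciss/c0d0
    # Fallback: clear filesystem/BlueStore signatures and the start of the disk by hand
    wipefs -a /dev/cciss/c0d0
    dd if=/dev/zero of=/dev/cciss/c0d0 bs=1M count=200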
     