[Ceph] unable to run OSDs

nicklock

New Member
Oct 22, 2018
My apologies in advance for the length of this post!

During a new hardware install, our Ceph node/server is:

Dell PowerEdge R7415:
1x AMD EPYC 7251 8-Core Processor
128GB RAM
HBA330 disk controller (LSI/Broadcom SAS3008, running FW 15.17.09.06 in IT mode)
4x Toshiba THNSF8200CCS 200GB SSD
8x SEAGATE ST8000NM0195 HDD (for OSDs)

Having once again followed the instructions for creating a Ceph cluster here ( https://pve.proxmox.com/wiki/Manage_Ceph_Services_on_Proxmox_VE_Nodes ) and run "pveceph createosd" on the 8 HDDs, we found that only 3 of the OSDs started and came online. After purging the Ceph OSD configuration, I am now unable to create and start ANY OSDs:

Code:
root@ceph1m-2:/var/log# ceph osd tree
ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
-1            0 root default                       

root@ceph1m-2:/var/log# dd if=/dev/zero of=/dev/sde bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.99159 s, 211 MB/s

root@ceph1m-2:/var/log# ceph-disk zap /dev/sde
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.

root@ceph1m-2:/var/log# date
Mon Oct 22 16:48:44 BST 2018

root@ceph1m-2:/var/log# ceph osd tree
ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
-1            0 root default                       
root@ceph1m-2:/var/log# pveceph createosd /dev/sde
create OSD on /dev/sde (bluestore)
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
meta-data=/dev/sde1              isize=2048   agcount=4, agsize=6400 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=25600, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.

root@ceph1m-2:/var/log# ceph osd tree
ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
-1            0 root default                       
 0            0 osd.0          down        0 1.00000

root@ceph1m-2:/var/log# ps axf | grep osd
  16443 pts/0    S+     0:00          \_ grep osd

The syslog shows tracebacks related to Bluestore:

Code:
root@ceph1m-2:/var/log# tail -n 180 syslog
Oct 22 16:49:10 ceph1m-2 sh[15999]: subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '-i', u'0', '--monmap', '/var/lib/ceph/tmp/mnt.aEj6r_/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.aEj6r_', '--osd-uuid', u'34bad71e-0cd5-48b7-be79-1e8d4a0cb81e', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status -6
Oct 22 16:49:10 ceph1m-2 sh[15999]: Traceback (most recent call last):
Oct 22 16:49:10 ceph1m-2 sh[15999]:   File "/usr/sbin/ceph-disk", line 11, in <module>
Oct 22 16:49:10 ceph1m-2 sh[15999]:     load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Oct 22 16:49:10 ceph1m-2 sh[15999]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5736, in run
Oct 22 16:49:10 ceph1m-2 sh[15999]:     main(sys.argv[1:])
Oct 22 16:49:10 ceph1m-2 sh[15999]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5687, in main
Oct 22 16:49:10 ceph1m-2 sh[15999]:     args.func(args)
Oct 22 16:49:10 ceph1m-2 sh[15999]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4890, in main_trigger
Oct 22 16:49:10 ceph1m-2 sh[15999]:     raise Error('return code ' + str(ret))
Oct 22 16:49:10 ceph1m-2 sh[15999]: ceph_disk.main.Error: Error: return code 1
Oct 22 16:49:10 ceph1m-2 systemd[1]: Failed to start Ceph disk activation: /dev/sde2.
Oct 22 16:49:10 ceph1m-2 systemd[1]: ceph-disk@dev-sde2.service: Unit entered failed state.
Oct 22 16:49:10 ceph1m-2 systemd[1]: ceph-disk@dev-sde2.service: Failed with result 'exit-code'.
Oct 22 16:49:10 ceph1m-2 kernel: [ 2615.978863] XFS (sde1): Mounting V5 Filesystem
Oct 22 16:49:10 ceph1m-2 kernel: [ 2616.048749] XFS (sde1): Ending clean mount
Oct 22 16:49:11 ceph1m-2 kernel: [ 2616.349806] sd 1:0:4:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 22 16:49:11 ceph1m-2 kernel: [ 2616.349818] sd 1:0:4:0: [sde] tag#0 Sense Key : Aborted Command [current]
Oct 22 16:49:11 ceph1m-2 kernel: [ 2616.349821] sd 1:0:4:0: [sde] tag#0 Add. Sense: Logical block guard check failed
Oct 22 16:49:11 ceph1m-2 kernel: [ 2616.349824] sd 1:0:4:0: [sde] tag#0 CDB: Read(32)
Oct 22 16:49:11 ceph1m-2 kernel: [ 2616.349827] sd 1:0:4:0: [sde] tag#0 CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Oct 22 16:49:11 ceph1m-2 kernel: [ 2616.349829] sd 1:0:4:0: [sde] tag#0 CDB[10]: 37 e4 1d 00 37 e4 1d 00 00 00 00 00 00 00 01 00
Oct 22 16:49:11 ceph1m-2 kernel: [ 2616.349831] print_req_error: protection error, dev sde, sector 7501572096
Oct 22 16:49:11 ceph1m-2 kernel: [ 2616.467638] XFS (sde1): Unmounting Filesystem
Oct 22 16:49:11 ceph1m-2 sh[16140]: main_trigger:
Oct 22 16:49:11 ceph1m-2 sh[16140]: main_trigger: main_activate: path = /dev/sde1
Oct 22 16:49:11 ceph1m-2 sh[16140]: get_dm_uuid: get_dm_uuid /dev/sde1 uuid path is /sys/dev/block/8:65/dm/uuid
Oct 22 16:49:11 ceph1m-2 sh[16140]: command: Running command: /sbin/blkid -o udev -p /dev/sde1
Oct 22 16:49:11 ceph1m-2 sh[16140]: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/sde1
Oct 22 16:49:11 ceph1m-2 sh[16140]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
Oct 22 16:49:11 ceph1m-2 sh[16140]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
Oct 22 16:49:11 ceph1m-2 sh[16140]: mount: Mounting /dev/sde1 on /var/lib/ceph/tmp/mnt.3jaGYg with options noatime,inode64
Oct 22 16:49:11 ceph1m-2 sh[16140]: command_check_call: Running command: /bin/mount -t xfs -o noatime,inode64 -- /dev/sde1 /var/lib/ceph/tmp/mnt.3jaGYg
Oct 22 16:49:11 ceph1m-2 sh[16140]: activate: Cluster uuid is ecf4285f-7a04-4f97-b705-d0194254d317
Oct 22 16:49:11 ceph1m-2 sh[16140]: command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
Oct 22 16:49:11 ceph1m-2 sh[16140]: activate: Cluster name is ceph
Oct 22 16:49:11 ceph1m-2 sh[16140]: activate: OSD uuid is 34bad71e-0cd5-48b7-be79-1e8d4a0cb81e
Oct 22 16:49:11 ceph1m-2 sh[16140]: activate: OSD id is 0
Oct 22 16:49:11 ceph1m-2 sh[16140]: activate: Initializing OSD...
Oct 22 16:49:11 ceph1m-2 sh[16140]: command_check_call: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/tmp/mnt.3jaGYg/activate.monmap
Oct 22 16:49:11 ceph1m-2 sh[16140]: got monmap epoch 3
Oct 22 16:49:11 ceph1m-2 sh[16140]: command_check_call: Running command: /usr/bin/ceph-osd --cluster ceph --mkfs -i 0 --monmap /var/lib/ceph/tmp/mnt.3jaGYg/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.3jaGYg --osd-uuid 34bad71e-0cd5-48b7-be79-1e8d4a0cb81e --setuser ceph --setgroup ceph
Oct 22 16:49:11 ceph1m-2 sh[16140]: /mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7f40dfdb7e00 time 2018-10-22 16:49:11.131160
Oct 22 16:49:11 ceph1m-2 sh[16140]: /mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: 976: FAILED assert(r == 0)
Oct 22 16:49:11 ceph1m-2 sh[16140]:  ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
Oct 22 16:49:11 ceph1m-2 sh[16140]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5582379d2ab2]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  2: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x558237939aba]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  3: (BlueFS::_replay(bool)+0x22d) [0x55823794134d]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  4: (BlueFS::mount()+0x1e1) [0x558237945641]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  5: (BlueStore::_open_db(bool)+0x1698) [0x5582378535a8]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  6: (BlueStore::mkfs()+0xeb5) [0x55823788da55]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  7: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x5582373bc796]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  8: (main()+0x127c) [0x5582372efe2c]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  9: (__libc_start_main()+0xf1) [0x7f40dc3732e1]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  10: (_start()+0x2a) [0x55823737c84a]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 22 16:49:11 ceph1m-2 sh[16140]: 2018-10-22 16:49:11.133961 7f40dfdb7e00 -1 /mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7f40dfdb7e00 time 2018-10-22 16:49:11.131160
Oct 22 16:49:11 ceph1m-2 sh[16140]: /mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: 976: FAILED assert(r == 0)
Oct 22 16:49:11 ceph1m-2 sh[16140]:  ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
Oct 22 16:49:11 ceph1m-2 sh[16140]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5582379d2ab2]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  2: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x558237939aba]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  3: (BlueFS::_replay(bool)+0x22d) [0x55823794134d]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  4: (BlueFS::mount()+0x1e1) [0x558237945641]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  5: (BlueStore::_open_db(bool)+0x1698) [0x5582378535a8]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  6: (BlueStore::mkfs()+0xeb5) [0x55823788da55]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  7: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x5582373bc796]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  8: (main()+0x127c) [0x5582372efe2c]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  9: (__libc_start_main()+0xf1) [0x7f40dc3732e1]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  10: (_start()+0x2a) [0x55823737c84a]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 22 16:49:11 ceph1m-2 sh[16140]:      0> 2018-10-22 16:49:11.133961 7f40dfdb7e00 -1 /mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7f40dfdb7e00 time 2018-10-22 16:49:11.131160
Oct 22 16:49:11 ceph1m-2 sh[16140]: /mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: 976: FAILED assert(r == 0)
Oct 22 16:49:11 ceph1m-2 sh[16140]:  ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
Oct 22 16:49:11 ceph1m-2 sh[16140]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5582379d2ab2]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  2: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x558237939aba]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  3: (BlueFS::_replay(bool)+0x22d) [0x55823794134d]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  4: (BlueFS::mount()+0x1e1) [0x558237945641]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  5: (BlueStore::_open_db(bool)+0x1698) [0x5582378535a8]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  6: (BlueStore::mkfs()+0xeb5) [0x55823788da55]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  7: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x5582373bc796]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  8: (main()+0x127c) [0x5582372efe2c]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  9: (__libc_start_main()+0xf1) [0x7f40dc3732e1]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  10: (_start()+0x2a) [0x55823737c84a]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 22 16:49:11 ceph1m-2 sh[16140]: *** Caught signal (Aborted) **
Oct 22 16:49:11 ceph1m-2 sh[16140]:  in thread 7f40dfdb7e00 thread_name:ceph-osd
Oct 22 16:49:11 ceph1m-2 sh[16140]:  ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
Oct 22 16:49:11 ceph1m-2 sh[16140]:  1: (()+0xa3bba4) [0x55823798aba4]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  2: (()+0x110c0) [0x7f40dd3be0c0]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  3: (gsignal()+0xcf) [0x7f40dc385fff]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  4: (abort()+0x16a) [0x7f40dc38742a]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5582379d2c3e]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x558237939aba]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  7: (BlueFS::_replay(bool)+0x22d) [0x55823794134d]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  8: (BlueFS::mount()+0x1e1) [0x558237945641]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  9: (BlueStore::_open_db(bool)+0x1698) [0x5582378535a8]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  10: (BlueStore::mkfs()+0xeb5) [0x55823788da55]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  11: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x5582373bc796]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  12: (main()+0x127c) [0x5582372efe2c]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  13: (__libc_start_main()+0xf1) [0x7f40dc3732e1]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  14: (_start()+0x2a) [0x55823737c84a]
Oct 22 16:49:11 ceph1m-2 sh[16140]: 2018-10-22 16:49:11.136967 7f40dfdb7e00 -1 *** Caught signal (Aborted) **
Oct 22 16:49:11 ceph1m-2 sh[16140]:  in thread 7f40dfdb7e00 thread_name:ceph-osd
Oct 22 16:49:11 ceph1m-2 sh[16140]:  ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
Oct 22 16:49:11 ceph1m-2 sh[16140]:  1: (()+0xa3bba4) [0x55823798aba4]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  2: (()+0x110c0) [0x7f40dd3be0c0]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  3: (gsignal()+0xcf) [0x7f40dc385fff]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  4: (abort()+0x16a) [0x7f40dc38742a]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5582379d2c3e]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x558237939aba]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  7: (BlueFS::_replay(bool)+0x22d) [0x55823794134d]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  8: (BlueFS::mount()+0x1e1) [0x558237945641]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  9: (BlueStore::_open_db(bool)+0x1698) [0x5582378535a8]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  10: (BlueStore::mkfs()+0xeb5) [0x55823788da55]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  11: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x5582373bc796]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  12: (main()+0x127c) [0x5582372efe2c]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  13: (__libc_start_main()+0xf1) [0x7f40dc3732e1]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  14: (_start()+0x2a) [0x55823737c84a]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 22 16:49:11 ceph1m-2 sh[16140]:      0> 2018-10-22 16:49:11.136967 7f40dfdb7e00 -1 *** Caught signal (Aborted) **
Oct 22 16:49:11 ceph1m-2 sh[16140]:  in thread 7f40dfdb7e00 thread_name:ceph-osd
Oct 22 16:49:11 ceph1m-2 sh[16140]:  ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
Oct 22 16:49:11 ceph1m-2 sh[16140]:  1: (()+0xa3bba4) [0x55823798aba4]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  2: (()+0x110c0) [0x7f40dd3be0c0]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  3: (gsignal()+0xcf) [0x7f40dc385fff]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  4: (abort()+0x16a) [0x7f40dc38742a]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5582379d2c3e]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x558237939aba]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  7: (BlueFS::_replay(bool)+0x22d) [0x55823794134d]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  8: (BlueFS::mount()+0x1e1) [0x558237945641]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  9: (BlueStore::_open_db(bool)+0x1698) [0x5582378535a8]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  10: (BlueStore::mkfs()+0xeb5) [0x55823788da55]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  11: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x5582373bc796]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  12: (main()+0x127c) [0x5582372efe2c]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  13: (__libc_start_main()+0xf1) [0x7f40dc3732e1]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  14: (_start()+0x2a) [0x55823737c84a]
Oct 22 16:49:11 ceph1m-2 sh[16140]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 22 16:49:11 ceph1m-2 sh[16140]: mount_activate: Failed to activate
Oct 22 16:49:11 ceph1m-2 sh[16140]: unmount: Unmounting /var/lib/ceph/tmp/mnt.3jaGYg
Oct 22 16:49:11 ceph1m-2 sh[16140]: command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.3jaGYg
Oct 22 16:49:11 ceph1m-2 sh[16140]: Traceback (most recent call last):
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/sbin/ceph-disk", line 11, in <module>
Oct 22 16:49:11 ceph1m-2 sh[16140]:     load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5736, in run
Oct 22 16:49:11 ceph1m-2 sh[16140]:     main(sys.argv[1:])
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5687, in main
Oct 22 16:49:11 ceph1m-2 sh[16140]:     args.func(args)
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3777, in main_activate
Oct 22 16:49:11 ceph1m-2 sh[16140]:     reactivate=args.reactivate,
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3540, in mount_activate
Oct 22 16:49:11 ceph1m-2 sh[16140]:     (osd_id, cluster) = activate(path, activate_key_template, init)
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3717, in activate
Oct 22 16:49:11 ceph1m-2 sh[16140]:     keyring=keyring,
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3169, in mkfs
Oct 22 16:49:11 ceph1m-2 sh[16140]:     '--setgroup', get_ceph_group(),
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 566, in command_check_call
Oct 22 16:49:11 ceph1m-2 sh[16140]:     return subprocess.check_call(arguments)
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/subprocess.py", line 186, in check_call
Oct 22 16:49:11 ceph1m-2 sh[16140]:     raise CalledProcessError(retcode, cmd)
Oct 22 16:49:11 ceph1m-2 sh[16140]: subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '-i', u'0', '--monmap', '/var/lib/ceph/tmp/mnt.3jaGYg/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.3jaGYg', '--osd-uuid', u'34bad71e-0cd5-48b7-be79-1e8d4a0cb81e', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status -6
Oct 22 16:49:11 ceph1m-2 sh[16140]: Traceback (most recent call last):
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/sbin/ceph-disk", line 11, in <module>
Oct 22 16:49:11 ceph1m-2 sh[16140]:     load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5736, in run
Oct 22 16:49:11 ceph1m-2 sh[16140]:     main(sys.argv[1:])
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5687, in main
Oct 22 16:49:11 ceph1m-2 sh[16140]:     args.func(args)
Oct 22 16:49:11 ceph1m-2 sh[16140]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4890, in main_trigger
Oct 22 16:49:11 ceph1m-2 sh[16140]:     raise Error('return code ' + str(ret))
Oct 22 16:49:11 ceph1m-2 sh[16140]: ceph_disk.main.Error: Error: return code 1
Oct 22 16:49:11 ceph1m-2 systemd[1]: ceph-disk@dev-sde1.service: Main process exited, code=exited, status=1/FAILURE
Oct 22 16:49:11 ceph1m-2 systemd[1]: Failed to start Ceph disk activation: /dev/sde1.
Oct 22 16:49:11 ceph1m-2 systemd[1]: ceph-disk@dev-sde1.service: Unit entered failed state.
Oct 22 16:49:11 ceph1m-2 systemd[1]: ceph-disk@dev-sde1.service: Failed with result 'exit-code'.
Oct 22 16:49:17 ceph1m-2 corosync[2699]: notice  [TOTEM ] Retransmit List: 3623
Oct 22 16:49:17 ceph1m-2 corosync[2699]:  [TOTEM ] Retransmit List: 3623
Oct 22 16:49:47 ceph1m-2 corosync[2699]: notice  [TOTEM ] Retransmit List: 36b3
Oct 22 16:49:47 ceph1m-2 corosync[2699]:  [TOTEM ] Retransmit List: 36b3
Oct 22 16:49:54 ceph1m-2 corosync[2699]: notice  [TOTEM ] Retransmit List: 36d3
Oct 22 16:49:54 ceph1m-2 corosync[2699]:  [TOTEM ] Retransmit List: 36d3

Stopping/starting the ceph-mon service produces the same results, with the addition of these lines in the syslog:

Code:
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140167] sd 1:0:4:0: [sde] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140170] sd 1:0:4:0: [sde] tag#1 Sense Key : Aborted Command [current]
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140172] sd 1:0:4:0: [sde] tag#1 Add. Sense: Logical block guard check failed
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140174] sd 1:0:4:0: [sde] tag#1 CDB: Read(32)
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140177] sd 1:0:4:0: [sde] tag#1 CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140178] sd 1:0:4:0: [sde] tag#1 CDB[10]: 37 e4 1d 80 37 e4 1d 80 00 00 00 00 00 00 00 80
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140180] print_req_error: protection error, dev sde, sector 7501573120
Oct 22 17:14:47 ceph1m-2 kernel: [ 4152.235159] sd 1:0:4:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 22 17:14:47 ceph1m-2 kernel: [ 4152.235165] sd 1:0:4:0: [sde] tag#0 Sense Key : Aborted Command [current]
Oct 22 17:14:47 ceph1m-2 kernel: [ 4152.235168] sd 1:0:4:0: [sde] tag#0 Add. Sense: Logical block guard check failed
Oct 22 17:14:47 ceph1m-2 kernel: [ 4152.235170] sd 1:0:4:0: [sde] tag#0 CDB: Read(32)
Oct 22 17:14:47 ceph1m-2 kernel: [ 4152.235173] sd 1:0:4:0: [sde] tag#0 CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Oct 22 17:14:47 ceph1m-2 kernel: [ 4152.235175] sd 1:0:4:0: [sde] tag#0 CDB[10]: 37 e4 1d 00 37 e4 1d 00 00 00 00 00 00 00 00 80

Stopping/starting the overall ceph service simply fails, with similar output.

Software versions:

Code:
root@ceph1m-2:/var/log# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-10
pve-kernel-4.15.18-7-pve: 4.15.18-27
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-28
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-36
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1

It's all a little confusing, to be honest - we've not had total failure like this in the last 3 pveceph clusters we've assembled, and I'd have thought that the newest hardware and software would be... super reliable? :)

Could anyone please shed some light on what might be going wrong?

Edit: Creating the OSD in Filestore mode works immediately!

Code:
root@ceph1m-2:/var/log# ceph osd tree
ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
-1            0 root default                        

root@ceph1m-2:/var/log# ceph-disk zap /dev/sde
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.

root@ceph1m-2:/var/log# pveceph createosd /dev/sde -bluestore 0
create OSD on /dev/sde (xfs)
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sde1              isize=2048   agcount=8, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=1952195665, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.

root@ceph1m-2:/var/log# ceph osd tree
ID CLASS WEIGHT  TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       7.27060 root default                            
-3       7.27060     host ceph1m-2                        
 0   hdd 7.27060         osd.0         up  1.00000 1.00000

...
 
Hello Alwin,

While I appreciate that you probably see a lot of these sorts of posts, unfortunately the forum thread that you linked to does not give me any more information than I already have.

As you will see in line 5 of the first CODE block in my post, I am already using dd to wipe the OSD devices (as per the official documentation: https://pve.proxmox.com/wiki/Manage_Ceph_Services_on_Proxmox_VE_Nodes#pve_ceph_osds ) .

Additionally, although I did not make this clear before, the OSD creation process failed on approximately 15 out of 24 *FACTORY NEW* HDDs (across 3 servers) during the first attempt. Unless Dell are also supplying used 8TB HDDs in their new servers, I am forced to believe that something else is the problem?

I have retried the process again just now, using the additional "conv=fdatasync" switch on dd as suggested in your linked thread, and there is no difference. After this failure once again purging/dd/creating the OSD with '-bluestore 0' works immediately and returns "HEALTH_OK" from 'ceph status'.

Regards,
Nick.
 
Thanks, was not reading carefully enough then. ;)

Can you please try to use urandom instead? Maybe those drives use same-write-pattern detection; the dd should write at least twice their cache size, as disks may ignore the sync.
 
Hello Alwin,

I've tried again after blanking the device with urandom:
Code:
root@ceph1m-2:~# dd if=/dev/urandom of=/dev/sde bs=1M count=500 conv=fdatasync oflag=direct
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 10.6133 s, 49.4 MB/s

but the results are the same - no running OSD and tracebacks in the syslog.

Could there be some incompatibility with the HBA in use? I ask this as 'hdparm -I' on the device being tested here returns the error "SG_IO: bad/missing sense data" ('sdparm -I' returns correct information), whereas on one of our existing PVE/Ceph clusters the same 'hdparm -I' returns data... The 'new' HBA is a "LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)", and the 'old' HBA is a "LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)".
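(For what it's worth, 'hdparm -I' sends ATA IDENTIFY commands, which pure SAS drives generally don't answer, so the SG_IO error on its own may be a red herring; a SCSI-level query is probably the fairer comparison. A rough sketch, assuming sg3_utils and smartmontools are installed:)

Code:
# INQUIRY at the SCSI level - a SAS drive should answer this even when ATA commands fail
sg_inq /dev/sde

# SMART/health data via the SCSI path rather than the ATA path
smartctl -a -d scsi /dev/sde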


Regards,
Nick.
 
That depends on the controller and whether the requested sense code is implemented. But a broken HBA or a firmware bug can't be ruled out. Is the most recent firmware on the controller?
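(One way to double-check which firmware the OS actually sees - a rough sketch; the sysfs attribute name is an assumption based on the mpt3sas driver used by the SAS3008:)

Code:
# firmware version as reported by the mpt3sas driver at probe time
dmesg | grep -i fwversion

# the driver also exposes it via sysfs on each SAS host (attribute name assumed)
cat /sys/class/scsi_host/host*/version_fw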
 
Hello Alwin,

Well, we have 3 servers exhibiting this behaviour so if all 3 have broken HBAs I would consider myself to be quite unlucky ;)
A firmware bug is possible, but there is no updated firmware available. Dell call the controller "Dell HBA330 Mini (Embedded)", and we are currently using FW version 15.17.09.06 - Dell's firmware page is here ( https://www.dell.com/support/home/u...F6CM&osCode=WST14&productCode=poweredge-r7415 ) and indicates that this is the latest version.

I have no immediate way of testing another controller in these servers, and I also do not currently know if fitting another controller is even possible as I have no knowledge of the connectors on the storage backplane(!).

Edit: as the servers are remote to our office, I am having someone take photos tonight of the internal connections, in order to plan a way of testing this theory.

Regards,
Nick.
 
4x Toshiba THNSF8200CCS 200GB SSD
8x SEAGATE ST8000NM0195 HDD (for OSDs)
Too bad that these are Dell-branded disks; there is not much information available online. But did you try the SSDs for OSDs? Just to rule out the 8TB disks as a problem. Interestingly though, OSDs with filestore work.

Maybe there are also some BIOS (or iDRAC) settings for the controller.
 
%@£(%*^*, I didn't think of trying that! :)

Bluestore OSD on an SSD works fine... and the SSDs respond correctly to 'hdparm -I'.

We're going to look very closely at the HDDs and controller. My apologies for possibly having wasted your time today; I will come back to this ticket once we have had a chance to try some hardware changes, and I will post an update on what we find.


Best Regards,
Nick.
 
Bluestore OSD on an SSD works fine... and the SSDs respond correctly to 'hdparm -I'.
I didn't mention it, but I just hope they are on the same HBA. o_O
 
I didn't mention it, but I just hope they are on the same HBA. o_O
Allegedly, the SSDs and HDDs are on the same controller. This will be confirmed when we physically look inside the server :)


Maybe you have to wipe even more than 500MB on your HDDs.
Take a look at this: https://tracker.ceph.com/issues/22354

Cheers Knuuut
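(For reference, a fuller wipe than a small dd at the start of the disk - a rough sketch, assuming wipefs, sgdisk and blockdev are available; the tracker issue above is about old BlueStore metadata surviving further into the disk, and the backup GPT lives at the very end:)

Code:
# remove known filesystem/RAID signatures
wipefs --all /dev/sde

# destroy both the primary and the backup GPT
sgdisk --zap-all /dev/sde

# zero the last 10 MiB of the disk as well (blockdev reports size in 512-byte sectors)
dd if=/dev/zero of=/dev/sde bs=1M count=10 oflag=direct \
   seek=$(( $(blockdev --getsz /dev/sde) / 2048 - 10 ))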
Well, I gave it a go:

Code:
root@ceph1m-2:/opt/MegaRAID/perccli# dd if=/dev/zero of=/dev/sde bs=1M count=2048 oflag=direct
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 26.1457 s, 82.1 MB/s

root@ceph1m-2:/# pveceph createosd /dev/sde
create OSD on /dev/sde (bluestore)
Creating new GPT entries.

...

The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.

root@ceph1m-2:/# tail -n 5 /var/log/syslog
Oct 23 16:15:20 ceph1m-2 sh[23394]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5687, in main
Oct 23 16:15:20 ceph1m-2 sh[23394]:     args.func(args)
Oct 23 16:15:20 ceph1m-2 sh[23394]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4890, in main_trigger
Oct 23 16:15:20 ceph1m-2 sh[23394]:     raise Error('return code ' + str(ret))
Oct 23 16:15:20 ceph1m-2 sh[23394]: ceph_disk.main.Error: Error: return code 1

No joy.
 
My guess is that the cause of the problem is linked to the disks/controllers:
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140167] sd 1:0:4:0: [sde] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140170] sd 1:0:4:0: [sde] tag#1 Sense Key : Aborted Command [current]
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140172] sd 1:0:4:0: [sde] tag#1 Add. Sense: Logical block guard check failed
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140174] sd 1:0:4:0: [sde] tag#1 CDB: Read(32)
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140177] sd 1:0:4:0: [sde] tag#1 CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140178] sd 1:0:4:0: [sde] tag#1 CDB[10]: 37 e4 1d 80 37 e4 1d 80 00 00 00 00 00 00 00 80
Oct 22 17:14:46 ceph1m-2 kernel: [ 4152.140180] print_req_error: protection error, dev sde, sector 7501573120
Oct 22 17:14:47 ceph1m-2 kernel: [ 4152.235159] sd 1:0:4:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 22 17:14:47 ceph1m-2 kernel: [ 4152.235165] sd 1:0:4:0: [sde] tag#0 Sense Key : Aborted Command [current]
Oct 22 17:14:47 ceph1m-2 kernel: [ 4152.235168] sd 1:0:4:0: [sde] tag#0 Add. Sense: Logical block guard check failed
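(The "Logical block guard check failed" additional sense is the T10 Protection Information (DIF) guard-tag check, so one thing that may be worth checking is whether these 8TB SAS drives shipped formatted with protection information enabled, while the SSDs presumably did not. A rough sketch with sg3_utils; the reformat line is only a possibility, is destructive, and takes many hours on an 8TB drive:)

Code:
# READ CAPACITY(16): "prot_en=1" means the drive is formatted with protection information
sg_readcap --long /dev/sde

# if PI turns out to be the problem, the drive could be reformatted without it
# (DESTRUCTIVE and very slow - only after confirming this is really the cause)
# sg_format --format --fmtpinfo=0 /dev/sde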
 
Results of physical investigation:

The HDDs and SSDs are all connected to the same storage backplane in the server, which is linked back to the onboard controller (the HBA330 Mini) via two cables with "unusual" connectors on each end. Therefore the HDDs and SSDs are all connected to the same controller.
We are unable to connect the HDDs or the server backplane to another controller, as we have no other SAS-capable controllers (and we've never seen that particular connector style before!).

One of our team has had a look at the traceback, and is of the opinion that this is a bug in Ceph - which is not gracefully handling an error situation. On running the ceph-disk command by hand, we can duplicate the crash, which I will append here as it's a little easier to read than the syslog version:

Code:
root@ceph1m-2:/dev# ceph-disk --verbose activate-block /dev/sde2
get_dm_uuid: get_dm_uuid /dev/sde2 uuid path is /sys/dev/block/8:66/dm/uuid
command: Running command: /sbin/blkid -o udev -p /dev/sde2
command: Running command: /usr/bin/ceph-osd --get-device-fsid /dev/sde2
get_space_osd_uuid: Block /dev/sde2 has OSD UUID 7df1b0b6-b513-4e17-b7b0-dde0e28ea7c3
command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/disk/by-partuuid/7df1b0b6-b513-4e17-b7b0-dde0e28ea7c3
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
mount: Mounting /dev/disk/by-partuuid/7df1b0b6-b513-4e17-b7b0-dde0e28ea7c3 on /var/lib/ceph/tmp/mnt.7IVFGu with options noatime,inode64
command_check_call: Running command: /bin/mount -t xfs -o noatime,inode64 -- /dev/disk/by-partuuid/7df1b0b6-b513-4e17-b7b0-dde0e28ea7c3 /var/lib/ceph/tmp/mnt.7IVFGu
activate: Cluster uuid is ecf4285f-7a04-4f97-b705-d0194254d317
command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
activate: Cluster name is ceph
activate: OSD uuid is 7df1b0b6-b513-4e17-b7b0-dde0e28ea7c3
activate: OSD id is 0
activate: Initializing OSD...
command_check_call: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/tmp/mnt.7IVFGu/activate.monmap
got monmap epoch 3
command_check_call: Running command: /usr/bin/ceph-osd --cluster ceph --mkfs -i 0 --monmap /var/lib/ceph/tmp/mnt.7IVFGu/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.7IVFGu --osd-uuid 7df1b0b6-b513-4e17-b7b0-dde0e28ea7c3 --setuser ceph --setgroup ceph
/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7ffaf7389e00 time 2018-10-24 09:30:16.412201
/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: 976: FAILED assert(r == 0)
 ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x562fe14beab2]
 2: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x562fe1425aba]
 3: (BlueFS::_replay(bool)+0x22d) [0x562fe142d34d]
 4: (BlueFS::mount()+0x1e1) [0x562fe1431641]
 5: (BlueStore::_open_db(bool)+0x1698) [0x562fe133f5a8]
 6: (BlueStore::mkfs()+0xeb5) [0x562fe1379a55]
 7: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x562fe0ea8796]
 8: (main()+0x127c) [0x562fe0ddbe2c]
 9: (__libc_start_main()+0xf1) [0x7ffaf39452e1]
 10: (_start()+0x2a) [0x562fe0e6884a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2018-10-24 09:30:16.415218 7ffaf7389e00 -1 /mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7ffaf7389e00 time 2018-10-24 09:30:16.412201
/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: 976: FAILED assert(r == 0)

 ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x562fe14beab2]
 2: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x562fe1425aba]
 3: (BlueFS::_replay(bool)+0x22d) [0x562fe142d34d]
 4: (BlueFS::mount()+0x1e1) [0x562fe1431641]
 5: (BlueStore::_open_db(bool)+0x1698) [0x562fe133f5a8]
 6: (BlueStore::mkfs()+0xeb5) [0x562fe1379a55]
 7: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x562fe0ea8796]
 8: (main()+0x127c) [0x562fe0ddbe2c]
 9: (__libc_start_main()+0xf1) [0x7ffaf39452e1]
 10: (_start()+0x2a) [0x562fe0e6884a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2018-10-24 09:30:16.415218 7ffaf7389e00 -1 /mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7ffaf7389e00 time 2018-10-24 09:30:16.412201
/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: 976: FAILED assert(r == 0)

 ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x562fe14beab2]
 2: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x562fe1425aba]
 3: (BlueFS::_replay(bool)+0x22d) [0x562fe142d34d]
 4: (BlueFS::mount()+0x1e1) [0x562fe1431641]
 5: (BlueStore::_open_db(bool)+0x1698) [0x562fe133f5a8]
 6: (BlueStore::mkfs()+0xeb5) [0x562fe1379a55]
 7: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x562fe0ea8796]
 8: (main()+0x127c) [0x562fe0ddbe2c]
 9: (__libc_start_main()+0xf1) [0x7ffaf39452e1]
 10: (_start()+0x2a) [0x562fe0e6884a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

*** Caught signal (Aborted) **
 in thread 7ffaf7389e00 thread_name:ceph-osd
 ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
 1: (()+0xa3bba4) [0x562fe1476ba4]
 2: (()+0x110c0) [0x7ffaf49900c0]
 3: (gsignal()+0xcf) [0x7ffaf3957fff]
 4: (abort()+0x16a) [0x7ffaf395942a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x562fe14bec3e]
 6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x562fe1425aba]
 7: (BlueFS::_replay(bool)+0x22d) [0x562fe142d34d]
 8: (BlueFS::mount()+0x1e1) [0x562fe1431641]
 9: (BlueStore::_open_db(bool)+0x1698) [0x562fe133f5a8]
 10: (BlueStore::mkfs()+0xeb5) [0x562fe1379a55]
 11: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x562fe0ea8796]
 12: (main()+0x127c) [0x562fe0ddbe2c]
 13: (__libc_start_main()+0xf1) [0x7ffaf39452e1]
 14: (_start()+0x2a) [0x562fe0e6884a]
2018-10-24 09:30:16.418342 7ffaf7389e00 -1 *** Caught signal (Aborted) **
 in thread 7ffaf7389e00 thread_name:ceph-osd

 ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
 1: (()+0xa3bba4) [0x562fe1476ba4]
 2: (()+0x110c0) [0x7ffaf49900c0]
 3: (gsignal()+0xcf) [0x7ffaf3957fff]
 4: (abort()+0x16a) [0x7ffaf395942a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x562fe14bec3e]
 6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x562fe1425aba]
 7: (BlueFS::_replay(bool)+0x22d) [0x562fe142d34d]
 8: (BlueFS::mount()+0x1e1) [0x562fe1431641]
 9: (BlueStore::_open_db(bool)+0x1698) [0x562fe133f5a8]
 10: (BlueStore::mkfs()+0xeb5) [0x562fe1379a55]
 11: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x562fe0ea8796]
 12: (main()+0x127c) [0x562fe0ddbe2c]
 13: (__libc_start_main()+0xf1) [0x7ffaf39452e1]
 14: (_start()+0x2a) [0x562fe0e6884a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2018-10-24 09:30:16.418342 7ffaf7389e00 -1 *** Caught signal (Aborted) **
 in thread 7ffaf7389e00 thread_name:ceph-osd

 ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
 1: (()+0xa3bba4) [0x562fe1476ba4]
 2: (()+0x110c0) [0x7ffaf49900c0]
 3: (gsignal()+0xcf) [0x7ffaf3957fff]
 4: (abort()+0x16a) [0x7ffaf395942a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x562fe14bec3e]
 6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xf7a) [0x562fe1425aba]
 7: (BlueFS::_replay(bool)+0x22d) [0x562fe142d34d]
 8: (BlueFS::mount()+0x1e1) [0x562fe1431641]
 9: (BlueStore::_open_db(bool)+0x1698) [0x562fe133f5a8]
 10: (BlueStore::mkfs()+0xeb5) [0x562fe1379a55]
 11: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x346) [0x562fe0ea8796]
 12: (main()+0x127c) [0x562fe0ddbe2c]
 13: (__libc_start_main()+0xf1) [0x7ffaf39452e1]
 14: (_start()+0x2a) [0x562fe0e6884a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

mount_activate: Failed to activate
unmount: Unmounting /var/lib/ceph/tmp/mnt.7IVFGu
command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.7IVFGu
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 11, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5736, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5687, in main
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5437, in <lambda>
    func=lambda args: main_activate_space(name, args),
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4178, in main_activate_space
    reactivate=args.reactivate,
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3540, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3717, in activate
    keyring=keyring,
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3169, in mkfs
    '--setgroup', get_ceph_group(),
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 566, in command_check_call
    return subprocess.check_call(arguments)
  File "/usr/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '-i', u'0', '--monmap', '/var/lib/ceph/tmp/mnt.7IVFGu/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.7IVFGu', '--osd-uuid', u'7df1b0b6-b513-4e17-b7b0-dde0e28ea7c3', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status -6

So the question can now be narrowed to "why does ceph-osd crash at this point?":

Code:
command_check_call: Running command: /usr/bin/ceph-osd --cluster ceph --mkfs -i 0 --monmap /var/lib/ceph/tmp/mnt.7IVFGu/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.7IVFGu --osd-uuid 7df1b0b6-b513-4e17-b7b0-dde0e28ea7c3 --setuser ceph --setgroup ceph
/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7ffaf7389e00 time 2018-10-24 09:30:16.412201
/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: 976: FAILED assert(r == 0)
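(To get more detail than the bare assert, the failing mkfs command from the log could be re-run with BlueFS/BlueStore/bdev debug logging turned up - a rough sketch; the tmp mount path changes on every activation attempt, and the debug switches are standard Ceph config options passed on the command line:)

Code:
# same command as in the log, plus verbose BlueFS/BlueStore/bdev output
/usr/bin/ceph-osd --cluster ceph --mkfs -i 0 \
    --monmap /var/lib/ceph/tmp/mnt.7IVFGu/activate.monmap \
    --osd-data /var/lib/ceph/tmp/mnt.7IVFGu \
    --osd-uuid 7df1b0b6-b513-4e17-b7b0-dde0e28ea7c3 \
    --setuser ceph --setgroup ceph \
    --debug-bluefs 20 --debug-bluestore 20 --debug-bdev 20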
 
We've stepped through the commands shown in the logs by hand, and managed to generate an strace of the ceph-osd command that fails (attached). This bit looks interesting:

Code:
...
madvise(0x55aaa41de000, 1990656, MADV_FREE) = 0
pread64(20, 0x55aaa41de000, 1048576, 3840699006976) = -1 EILSEQ (Invalid or incomplete multibyte or wide character)   <-----*******
madvise(0x55aaa41de000, 1990656, MADV_FREE) = 0
clock_gettime(CLOCK_REALTIME, {tv_sec=1540373661, tv_nsec=278991476}) = 0
write(2, "/mnt/npool/a.antreich/ceph/ceph-"..., 337/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7f22f2483e00 time 2018-10-24 10:34:21.278991
/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/os/bluestore/BlueFS.cc: 976: FAILED assert(r == 0)
...
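(That failing pread64 looks like a plain 1MiB read on the underlying block device, so it may be reproducible without Ceph at all. A rough sketch, re-reading the region around the sector reported in the earlier "protection error" kernel messages; skip is converted to 4KiB blocks in case the drives are 4Kn. If this also fails with the same "guard check failed" message, ceph-osd is just the messenger:)

Code:
# sector 7501572096 (512-byte units) = 4KiB block 937696512; read 1 MiB from there
dd if=/dev/sde of=/dev/null bs=4096 skip=937696512 count=256 iflag=direct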
 

Attachments

  • strace.txt.zip
    11.7 KB · Views: 3
So we know the following combinations:
  • HBA330 mini -> SSD -> OSD - OK
  • HBA330 mini -> HDD -> OSD - NOT OK
To rule out the HDDs, is it possible to use them on a system where you know that the HBA -> HDD combination works? Because then it is narrowed down to the HBA or the HDDs.
 
Hello Alwin,

While this is a logical step, it would be difficult for us to attempt this due to the drives being SAS. All previous drives we have used for Ceph have been SATA. (it's only because Dell did not offer SATA 8TB drives in this platform that we have ended up with SAS here!)

Trying to use the drives in our existing Ceph clusters, which are SuperMicro chassis/motherboard platforms, would be "experimental" at best - and as the storage servers are already fully populated with SATA HDDs, it would mean deliberately degrading a production Ceph cluster to plug one in!

I am currently finding out if I can borrow hardware from somewhere (outside our business) to use for testing :)

Regards,
Nick.
 
Hello Alwin,

I have 4 of the drives back at our office, but am unable to connect them to any of our spare HBA cards as the drives appear to use SFF-8482 connectors. We have therefore ordered the correct cable set, and I will be attempting to test the drives when this arrives next week.

Regards,
Nick.
 
Hello Alwin,

Unfortunately we are unable to get the drives to function at all in the office. I am assuming that the HBA/cables we have are unable to correctly initialise the drives, and they do not spin up... (does SAS require a signal saying "ok, you can fully power on now"??)

So we are stuck, and cannot progress with testing. We are unlikely to find the reason for the incompatibility in the near future - so we are going to go back to installing Ceph via Filestore OSDs. As a result of this, we will probably be unable to do further (destructive) investigation into the issue with Bluestore OSDs as the hardware will be in production... sorry :( .

Thank you for your assistance so far - if we can think of anything else that is relevant to the problem, or come across a solution ourselves, I will update you.

Best Regards,
Nick.
 
Unfortunately we are unable to get the drives to function at all in the office. I am assuming that the HBA/cables we have are unable to correctly initialise the drives, and they do not spin up... (does SAS require a signal saying "ok, you can fully power on now"??)
The HBA may not deliver enough power to the drive.
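(Another possibility, since these are SAS drives on a non-Dell HBA/cable set: some setups expect staggered spin-up, where the drive powers its electronics but waits for an explicit START STOP UNIT before spinning the platters. A rough sketch, assuming sg3_utils and lsscsi are installed and the drive at least enumerates; /dev/sdX is a placeholder:)

Code:
# does the drive show up on the SCSI bus at all?
lsscsi

# explicitly ask the drive to spin up
sg_start --start /dev/sdX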

So we are stuck, and cannot progress with testing. We are unlikely to find the reason for the incompatibility in the near future - so we are going to go back to installing Ceph via Filestore OSDs. As a result of this, we will probably be unable to do further (destructive) investigation into the issue with Bluestore OSDs as the hardware will be in production... sorry :( .
Understandable. If you create the OSDs by hand, you could try ceph-volume (the default in Mimic) instead of ceph-disk. ceph-volume uses LVM and does not need an extra partition for its metadata.
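(A rough sketch of what that would look like by hand - ceph-volume ships with the Luminous packages; note that pveceph at this version still drives ceph-disk, so this is a manual workaround rather than the pveceph way:)

Code:
# create a BlueStore OSD backed by an LVM volume on the whole device
ceph-volume lvm create --bluestore --data /dev/sde

# show what ceph-volume knows about
ceph-volume lvm list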
 
