can't create journal device for CEPH

athompso

Sep 13, 2013
I've got two 10GB LUNs and a 1.1TB LUN exposed to my server as /dev/sda, /dev/sdb and /dev/sdc.
PVE 3.4 was installed to /dev/sda.
I want to create a CEPH OSD on /dev/sdc with /dev/sdb as the journal, but I'm unable to do so.

Firstly, the GUI only offers to let me use /dev/sda as a journal device (!). This isn't what I want - that would completely screw things up!

The command-line pveceph tool has a different problem. I see this:

root@pvetemp:~# pveceph createosd /dev/sdc -journal_dev /dev/sdb
create OSD on /dev/sdc (xfs)
using device '/dev/sdb' for journal
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
The operation has completed successfully.
Error: /dev/sdb: unrecognised disk label
ceph-disk: Error: weird parted units:
command 'ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid 57ee93f9-e7bc-4aba-a103-a1fdf35db5ac --journal-dev /dev/sdc /dev/sdb' failed: exit code 1
What on earth? Why do I need a disk label on /dev/sdb, when I'm about to use the raw device as a journal?

OK, so I use gdisk to zap the MBR and GPT labels, then parted to mklabel a new, empty GPT label on /dev/sdb.
Then pveceph gets happier, but with some scary warnings:

root@pvetemp:~# gdisk /dev/sdb
GPT fdisk (gdisk) version 0.8.5


Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: not present


Creating new GPT entries.


Command (? for help): x


Expert command (? for help): z
About to wipe out GPT on /dev/sdb. Proceed? (Y/N): y
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Blank out MBR? (Y/N): y
root@pvetemp:~# parted /dev/sdb
GNU Parted 3.2
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted)
Information: You may need to update /etc/fstab.


root@pvetemp:~# pveceph createosd /dev/sdc -journal_dev /dev/sdb
create OSD on /dev/sdc (xfs)
using device '/dev/sdb' for journal
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.


****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
The operation has completed successfully.
WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
Information: Moved requested sector from 34 to 2048 in
order to align on 2048-sector boundaries.
The operation has completed successfully.
Information: Moved requested sector from 34 to 2048 in
order to align on 2048-sector boundaries.
The operation has completed successfully.


meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=74792895 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=299171579, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=146079, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The operation has completed successfully.
Um. In the Disks tab of the PVE GUI, sdc shows as "osd.0", but sdb shows as "partitions", not "journal" or anything like that.
Have I successfully got a journaled OSD now? "ceph osd tree" doesn't show the journal, but I'm not sure if it should or not.

Thanks,
-Adam Thompson
athompso@athompso.net
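The interactive gdisk/parted steps above (expert menu `x`, then `z` to zap, then parted's `mklabel gpt`) can also be done non-interactively with sgdisk. This is a minimal sketch, assuming sgdisk (from the gdisk package) is installed; `reset_gpt_label` and the `DRY_RUN` switch are hypothetical helpers, not part of pveceph. Because these commands destroy the partition table of the named disk, the sketch only prints them unless `DRY_RUN=0` is set.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only when you are sure the device name is the right one.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Wipe any old MBR/GPT on a disk, then write a fresh, empty GPT label.
# Roughly the non-interactive equivalent of gdisk "x" + "z" followed by
# parted "mklabel gpt". sgdisk exits right after --zap-all, so the
# new label needs a second invocation.
reset_gpt_label() {
    dev=$1
    run sgdisk --zap-all "$dev"   # destroy GPT and MBR data structures
    run sgdisk --clear "$dev"     # create a new, empty GPT
}
```

`reset_gpt_label /dev/sdb` just prints the two commands; `DRY_RUN=0 reset_gpt_label /dev/sdb` would actually run them.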
 
The PVE interface normally shows "partitions" for a disk that is being used as a journal, so that is fine. I can't say why it didn't work before, but it appears to be working now as it should (maybe it needed a base partition table? I don't remember having to do that on my wiped disks, but maybe I did and just don't remember).

I hope the LUN you have for the journal is fast. If it's slow, you probably won't be happy with performance, since every write goes to the journal before it is written to the OSD itself.
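On the question of whether the OSD is really journaled on sdb: `ceph osd tree` does not list journals, but the OSD data directory does record where the journal is. This is a minimal sketch, assuming the ceph-disk/FileStore layout of this era, where `/var/lib/ceph/osd/ceph-<id>/journal` is a symlink to the journal partition (usually via `/dev/disk/by-partuuid/...`) or a plain file inside the data filesystem. The function name `osd_journal_target` and the default path `ceph-0` are assumptions for illustration.

```shell
# Print where an OSD's journal lives.
# Assumed layout (ceph-disk/FileStore): <osd-dir>/journal is either a
# symlink to the journal partition or a regular file in the data filesystem.
osd_journal_target() {
    osd_dir=${1:-/var/lib/ceph/osd/ceph-0}   # default OSD path is an assumption
    if [ -L "$osd_dir/journal" ]; then
        # follow the by-partuuid link down to the real device node
        readlink -f "$osd_dir/journal"
    elif [ -f "$osd_dir/journal" ]; then
        echo "file journal inside $osd_dir"
    else
        echo "no journal found in $osd_dir" >&2
        return 1
    fi
}
```

On the node above, `osd_journal_target /var/lib/ceph/osd/ceph-0` would be expected to print a partition on /dev/sdb (such as /dev/sdb1) if the journal went there; a partition on /dev/sdc would mean the journal is co-located with the data.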
 
Hi,
creating a partition table only makes sense if you don't overwrite it in the second step!

Simply create partitions (sdb1, sdb2, ...) and use them as journals.
Code:
pveceph createosd /dev/sdc -journal_dev /dev/sdb1
Normally you use one fast SSD as the journal device for several OSDs (approx. 4-6). The journal also doesn't need to be very big (I use 15GB for each partition).
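With those numbers, 4-6 OSDs at 15GB each need 60-90GB on the journal SSD. The sketch below carves one journal partition per OSD out of the SSD and then hands each partition to pveceph, as suggested above. It assumes sgdisk is installed, the journal SSD already has a GPT label, and sdX-style device names (/dev/sdb + 1 = /dev/sdb1; NVMe names would need a `p` before the number). `plan_journals` and `DRY_RUN` are hypothetical helpers, and the next reply in this thread reports trouble when pveceph is given pre-made partitions, so treat this as untested.

```shell
# Dry-run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: plan_journals JOURNAL_DEV SIZE_GB DATA_DEV...
# Step 1: one fixed-size journal partition per OSD data disk.
# Step 2: one OSD per data disk, each pointed at its own journal partition.
plan_journals() {
    journal_dev=$1
    size_gb=$2
    shift 2
    part=1
    for data_dev in "$@"; do
        # partition number $part, default start sector, fixed size
        run sgdisk --new="$part:0:+${size_gb}G" "$journal_dev"
        part=$((part + 1))
    done
    part=1
    for data_dev in "$@"; do
        # assumes sdX naming: /dev/sdb + 1 -> /dev/sdb1
        run pveceph createosd "$data_dev" -journal_dev "$journal_dev$part"
        part=$((part + 1))
    done
}
```

For example, `plan_journals /dev/sdb 15 /dev/sdc /dev/sdd` prints two sgdisk commands followed by two `pveceph createosd` commands; creating all partitions first keeps the journal disk's layout settled before any OSD is prepared.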

Udo
 

Actually, the pveceph command doesn't seem to like being passed an actual partition (i.e. it will work, but it seems to create issues): it adds a new partition to the drive specified for the journal. When I first did it via the command line, I had created partitions manually, but the whole thing got screwed up. By default the command creates a journal partition of a fixed size (5 GB, which is kinda small). If you set the desired journal size in ceph.conf, it will create larger partitions based on that.
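The ceph.conf setting in question is `osd journal size`, given in megabytes. A minimal fragment, assuming the cluster config is the PVE-managed /etc/pve/ceph.conf and that 15 GB (the figure from the earlier reply) is the size you want. If your generated file already has an `osd journal size` line (for example under [global]), change that value rather than adding a second one. The setting only affects journal partitions created after the change; existing journals keep their size.

```ini
[osd]
     # size of each newly created journal partition, in MB (15360 MB = 15 GB)
     osd journal size = 15360
```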