Move Ceph journal to SSD?

Waschbüsch
Dec 15, 2014
Hi all,

I was thinking about adding a (server class) SSD each to my three ceph nodes.
Currently, each node has two OSDs and the journal for each is on the drive itself.
Now, I have a few questions about concepts:

- Can the two journals reside on the same partition on the SSD?
- Or do I have to provide one partition per OSD?
- Is there a way to migrate the journal or do I have to re-create the OSD?
- Are there any other ceph-related options / configuration items that would need to be changed to accommodate the new setup?

Finally, I was leaning towards an Intel S3700 Series SSD for durability. Does anyone have experience with those, or other recommendations?
How about size? Can I make do with the 100G model or is more space necessary?

Thanks,

Martin
 
Re: Move Ceph journal to SSD?

Hi Martin,
you can use multiple (file-based) journal files on one SSD partition - e.g. a mounted filesystem. But this is not really the best approach (partition-based journals on an SSD are much faster).

I started with "normal" SSDs and file-based journals and now use one 200GB Intel DC S3700 for 6 HDDs. In my case I use an 11GB partition for each OSD - the extra space I use for an SSD-OSD as cache pool (EC).
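To make the difference concrete, here is a minimal sketch of the two variants in ceph.conf (the file path is made up for this example; the partition label matches the scheme used below):
Code:
# file-based journal: a plain file on a mounted filesystem (slower)
[osd.2]
    osd_journal = /mnt/ssd-journals/osd-2/journal
    osd_journal_size = 10240    # in MB; needed when the journal is a file

# partition-based journal: a raw partition on the SSD (faster)
[osd.2]
    osd_journal = /dev/disk/by-partlabel/journal-2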

You can switch the journal during normal operation - like this (for instance for osd.2 with a 15GB journal partition on an SSD):
Code:
# set noout - so that no data migration starts
ceph osd set noout

/etc/init.d/ceph stop osd.2

# wait a little bit for the stop to finish, then flush the journal - IMPORTANT!!
ceph-osd -i 2 --flush-journal

# umount as a test only, to make sure no process is still using the osd
umount /var/lib/ceph/osd/ceph-2

# create the journal partitions, e.g. like this
parted /dev/SSD
mkpart journal-2 1 15G
mkpart journal-3 15G 30G
mkpart journal-4 30G 45G
...

# change ceph.conf so that the new journal-path is used
[osd.2]
host = ceph-01
public_addr = 192.168.2.11
cluster_addr = 192.168.3.11
osd_journal = /dev/disk/by-partlabel/journal-2

# init the journal
ceph-osd -i 2 --mkjournal

# start osd (you may have to mount ceph-2 first)
/etc/init.d/ceph start osd.2
Once you have switched all journals, you can also set this entry globally in ceph.conf:
Code:
osd_journal = /dev/disk/by-partlabel/journal-$id
I have done this many times without trouble.

Udo
 
Re: Move Ceph journal to SSD?

Hi,
after that you should unset noout:
Code:
ceph osd unset noout
BTW, you can also do some magic with sgdisk to set the right type code and partition GUID for the journal disks, but this is not really necessary (it also works for me without that).
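A minimal sketch of that optional sgdisk step, in case someone wants it (the GUID below is the standard Ceph journal type code, which also comes up later in this thread; device and partition number are placeholders):
Code:
# mark partition 1 on the SSD as a Ceph journal partition and give it a fresh partition GUID
sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdX
sgdisk -u 1:$(uuidgen) /dev/sdX
# re-read the partition table
partprobe /dev/sdX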

Udo
 
Re: Move Ceph journal to SSD?

Thank you, Udo, for the great and detailed reply.
I have ordered my S3700 SSDs and will implement this as you suggested. I'll give feedback once that's done.
It might take a bit because I will have to wait for the right trays so I can put the SSDs in my Supermicro servers instead of the DVD-drive.

Stay tuned. :)

Martin
 
Re: Move Ceph journal to SSD?

Here's an additional thought:
The drives are all attached to an Adaptec 8805 SAS Controller capable of using SSDs for caching.
Any idea on how that would compare to putting only the journal on the SSD?

Martin
 
Re: Move Ceph journal to SSD?

Hi Martin,
in this case the cache SSD (Adaptec) speeds up both writes (first the journal, then the OSD data), which is not necessary - only speeding up the journal write matters, so the cache SSD ends up absorbing double writes.
And the controller SSD is also/mainly used for read caching?! Whether you benefit from that depends strongly on your workload (multiple reads of the same VM disk areas whose content is not already in RAM (page cache)).

Which scenario is faster in real life, you will have to try out.
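If you want to compare both setups before committing, a common approach is to measure small synchronous writes on the journal device, since that is roughly the I/O pattern the Ceph journal produces. A minimal sketch with fio (device path and runtime are placeholders):
Code:
# WARNING: this writes to the raw device and destroys any data on it
fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based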

Udo
 
Re: Move Ceph journal to SSD?

Hi Udo,

This worked just as advertised. ;-)
Thanks again for your help!

Martin
 
Hello,
I have Proxmox 4.4 and I would like to add a partitioned SSD to use as a journal.
I wanted to follow this guide, but I cannot find how to change the OSD configuration in /etc/pve/ceph.conf

Code:
.....

# change ceph.conf so that the new journal-path is used
[osd.2]
host = ceph-01
public_addr = 192.168.2.11
cluster_addr = 192.168.3.11
osd_journal = /dev/disk/by-partlabel/journal-2

.....

Donatello
 
Hi,
with vi?

Udo
 
My /etc/pve/ceph.conf:

Code:
[global]
    auth client required = cephx
    auth cluster required = cephx
    auth service required = cephx
    cluster network = 10.10.10.0/24
    filestore xattr use omap = true
    fsid = 08f99da6-2cc6-496c-a204-d15177cf69a8
    keyring = /etc/pve/priv/$cluster.$name.keyring
    osd journal size = 5120
    osd pool default min size = 1
    public network = 10.10.10.0/24

[osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.0]
    host = prx1
    mon addr = 10.10.10.101:6789
[mon.1]
    host = prx2
    mon addr = 10.10.10.102:6789
[mon.2]
    host = prx3
    mon addr = 10.10.10.103:6789

I cannot find: [osd.2]

Sorry for my English - I am using Google Translate.
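For what it's worth, that [osd.2] section is not generated automatically - following the guide above, you add it to /etc/pve/ceph.conf yourself with an editor. A minimal sketch using the values from the config above (host, IP and journal label must of course match the OSD in question):
Code:
[osd.2]
    host = prx1
    public_addr = 10.10.10.101
    cluster_addr = 10.10.10.101
    osd_journal = /dev/disk/by-partlabel/journal-2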
 
Re: Move Ceph journal to SSD?


Hi,

I have tried your way, but it fails as follows:


root@srv1:~# /etc/init.d/ceph stop osd.0
[ ok ] Stopping ceph (via systemctl): ceph.service.
root@srv1:~# ceph-osd -i 0 --flush-journal
2017-11-07 06:33:16.536487 7f690bdfce00 -1 asok(0x564e41f8b2c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.0.asok': (17) File exists
2017-11-07 06:33:16.536626 7f690bdfce00 -1 bluestore(/var/lib/ceph/osd/ceph-0) _lock_fsid failed to lock /var/lib/ceph/osd/ceph-0/fsid (is another ceph-osd still running?)(11) Resource temporarily unavailable
2017-11-07 06:33:16.536652 7f690bdfce00 -1 ** ERROR: error flushing journal /var/lib/ceph/osd/ceph-0/journal for object store /var/lib/ceph/osd/ceph-0: (11) Resource temporarily unavailable

So I manually stopped the OSD in the Proxmox interface and could flush it:

root@srv1:~# ceph-osd -i 0 --flush-journal
2017-11-07 06:33:37.402772 7fae688c5e00 -1 flushed journal /var/lib/ceph/osd/ceph-0/journal for object store /var/lib/ceph/osd/ceph-0

Then later on I got the following error:

root@srv1:~# ceph-osd -i 0 --mkjournal
2017-11-07 06:37:45.234444 7f7c156c8e00 -1 asok(0x5639ad5ab2c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.0.asok': (17) File exists
2017-11-07 06:37:45.234473 7f7c156c8e00 -1 created new journal /dev/disk/by-partlabel/journal-2 for object store /var/lib/ceph/osd/ceph-0

So I manually deleted "/var/run/ceph/ceph-osd.0.asok" and tried again. Not sure if I should have?

root@srv1:~# rm /var/run/ceph/ceph-osd.0.asok
root@srv1:~# ceph-osd -i 0 --mkjournal
2017-11-07 06:38:43.892215 7f1415991e00 -1 created new journal /dev/disk/by-partlabel/journal-2 for object store /var/lib/ceph/osd/ceph-0


But at the end of the day I'm still sitting with a 1GB OSD Journal:

root@srv1:~# lsblk
NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                  8:0    0 149.1G  0 disk
├─sda1               8:1    0   100M  0 part /var/lib/ceph/osd/ceph-0
├─sda2               8:2    0   149G  0 part
└─sda3               8:3    0   512B  0 part
sdb                  8:16   0 149.1G  0 disk
└─sdb1               8:17   0     1G  0 part
sdc                  8:32   1  29.3G  0 disk
├─sdc1               8:33   1     1M  0 part
├─sdc2               8:34   1   256M  0 part
└─sdc3               8:35   1  29.1G  0 part
  ├─pve-swap       253:0    0   3.6G  0 lvm  [SWAP]
  ├─pve-root       253:1    0   7.3G  0 lvm  /
  ├─pve-data_tmeta 253:2    0    16M  0 lvm
  │ └─pve-data     253:4    0  14.6G  0 lvm
  └─pve-data_tdata 253:3    0  14.6G  0 lvm
    └─pve-data     253:4    0  14.6G  0 lvm
sr0                 11:0    1 432.3M  0 rom


I basically want to increase the journal size to, say, 100GB.


P.S. This is a test machine, so the journal is on a SATA drive as well. The actual server I use has an SSD for the journal. My problem is that the journal is only 1GB.
 
Hi,

I have tried your way, but it fails as follows:
Hi,
it doesn't look like you really did the same thing.
So I manually stopped the OSD in the Proxmox interface and could flush it:
you can only flush a journal if the OSD is stopped!
Over time (with newer Ceph versions), the way you stop an OSD has changed...
So I manually deleted "/var/run/ceph/ceph-osd.0.asok" and tried again. Not sure if I should have?
A really bad and dangerous idea! An existing asok means the OSD process is still running!! So I would not expect the commands to work!
I basically want to increase the journal size to, say, 100GB.
A 100GB journal doesn't make sense!!
10GB should be fair enough - perhaps a little bit more (like 12GB), but not much more. The system must be able to sync the journal to the OSD in a normal amount of time (and it does, unless you use crazy values in your ceph.conf).
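The Ceph documentation's rule of thumb points the same way: the journal only has to absorb a few seconds of writes before filestore syncs them to the OSD, roughly osd journal size = 2 * (expected throughput * filestore max sync interval). A minimal sketch of the arithmetic (the throughput figure is an assumption):
Code:
# e.g. ~500 MB/s towards the journal and the default filestore max sync interval of 5 s:
#   2 * 500 MB/s * 5 s = 5000 MB  ->  a 10-12 GB partition leaves comfortable headroom
osd journal size = 10240    # ceph.conf value, in MB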

Udo
 
Hi,
it doesn't look like you really did the same thing.

Did you see the commands I posted? I used your exact commands, but you just confirmed that due to the version change, the commands might have changed.
you can only flush a journal if the OSD is stopped!
Over time (with newer Ceph versions), the way you stop an OSD has changed...
One of the problems I am facing is that you can't stop the OSD from the command line, so I had to stop it from the Proxmox interface:
Code:
root@virt1:~# ceph osd set noout
noout is set
root@virt1:~# /etc/init.d/ceph stop osd.0
[ ok ] Stopping ceph (via systemctl): ceph.service.
root@virt1:~# ps ax | grep osd
 2402 ?        Ssl   22:04 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
 2525 ?        Ssl   22:23 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
 2837 ?        Ssl   24:49 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
14887 pts/0    S+     0:00 grep osd
A really bad and dangerous idea! An existing asok means the OSD process is still running!! So I would not expect the commands to work!
Yes. But even if the OSD is stopped, that file still exists. So, what now?

A 100GB journal doesn't make sense!!
10GB should be fair enough - perhaps a little bit more (like 12GB), but not much more. The system must be able to sync the journal to the OSD in a normal amount of time (and it does, unless you use crazy values in your ceph.conf).

Udo
Why is 100GB such a bad idea? I have a 400GB SSD to work with and want to optimize the hardware I have as much as possible.

The setup is as follows:

3x Supermicro servers, each with:
128GB RAM
3x 8TB SATA HDD
2x SSD drives (intel_ssdsc2ba400g4 - 400GB DC S3710)
2x 12-core CPUs (Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz)
Quad-port 10GbE Intel NIC
2x 10GbE Cisco switches (to isolate the storage network from the LAN)

Code:
root@virt1:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       65.50461 root default
-3       21.83487     host virt1
 0   hdd  7.27829         osd.0      up  1.00000 1.00000
 1   hdd  7.27829         osd.1      up  1.00000 1.00000
 2   hdd  7.27829         osd.2      up  1.00000 1.00000
-5       21.83487     host virt2
 3   hdd  7.27829         osd.3      up  1.00000 1.00000
 4   hdd  7.27829         osd.4      up  1.00000 1.00000
 5   hdd  7.27829         osd.5      up  1.00000 1.00000
-7       21.83487     host virt3
 6   hdd  7.27829         osd.6      up  1.00000 1.00000
 7   hdd  7.27829         osd.7      up  1.00000 1.00000
 8   hdd  7.27829         osd.8      up  1.00000 1.00000
 
I have a cluster with 8 servers and 88 OSDs.
Everything went well, and I decided to add an Intel 3700 PCIe SSD to each server for journaling, plus a dedicated SSD pool using the rest of the space. The partitions of the flash cards on all servers look like this:

Code:
root@srv4332:~# parted /dev/nvme0n1
GNU Parted 3.2
Using /dev/nvme0n1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: Unknown (unknown)
Disk /dev/nvme0n1: 800GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name        Flags
 1      1049kB  15.0GB  15.0GB               journal-55
 2      15.0GB  30.0GB  15.0GB               journal-56
 3      30.0GB  45.0GB  15.0GB               journal-57
 4      45.0GB  60.0GB  15.0GB               journal-58
 5      60.0GB  75.0GB  15.0GB               journal-59
 6      75.0GB  90.0GB  15.0GB               journal-60
 7      90.0GB  105GB   15.0GB               journal-61
 8      105GB   120GB   15.0GB               journal-62
 9      120GB   135GB   15.0GB               journal-63
10      135GB   150GB   15.0GB               journal-64
11      150GB   165GB   15.0GB               journal-65
12      165GB   800GB   635GB                SSDPool

Because of the 88 disks, I wrote a little script which does the changes STEP-by-STEP:

Code:
#! /bin/sh
ID=$1
if [ "x$ID" = "x" ]; then
        echo usage: fj.sh ID
        exit
fi

FILE=/etc/pve/ceph.conf

echo Adding Journal $ID config to $FILE
HOST=`hostname`
IP=`hostname --ip-address`
echo "[osd.$ID]" >> $FILE
echo "  host = $HOST" >> $FILE
echo "  public_addr = $IP" >> $FILE
echo "  cluster_addr = $IP" >> $FILE
echo "  osd_journal = /dev/disk/by-partlabel/journal-$ID" >> $FILE


echo Moving Journal to SSD for DEVICE $ID
DEV=`findmnt -n -o SOURCE --target /var/lib/ceph/osd/ceph-$ID`
echo DeviceName: $DEV
echo stop ceph-osd@$ID
systemctl stop ceph-osd@$ID
echo flushing journal for ceph-osd@$ID
ceph-osd -i $ID --flush-journal
echo ReMounting Device $DEV
umount /var/lib/ceph/osd/ceph-$ID
mount $DEV /var/lib/ceph/osd/ceph-$ID
echo making the journal for $ID
ceph-osd -i $ID --mkjournal
echo finally starting osd $ID again
systemctl start ceph-osd@$ID
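For reference, the script is run once per OSD id, one at a time (ids taken from the journal partition labels above), e.g.:
Code:
# move one journal, check "ceph -s" is healthy again, then continue with the next
./fj.sh 55
./fj.sh 56
...
./fj.sh 65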

Everything went fine... even with the script it took about 2 hours for those 88 disks.

But if I run

ceph-disk list

to check whether it is working, I get this:

Code:
/dev/nvme0n1 :
 /dev/nvme0n1p1 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p10 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p11 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p12 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p2 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p3 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p4 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p5 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p6 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p7 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p8 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
 /dev/nvme0n1p9 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
/dev/sda :
 /dev/sda1 other, 21686148-6449-6e6f-744e-656564454649
 /dev/sda2 other, vfat, mounted on /boot/efi
 /dev/sda3 other, LVM2_member
/dev/sdb :
 /dev/sdb1 ceph data, active, cluster ceph, osd.55, block /dev/sdb2
 /dev/sdb2 ceph block, for /dev/sdb1
/dev/sdc :
 /dev/sdc1 ceph data, active, cluster ceph, osd.56, block /dev/sdc2
 /dev/sdc2 ceph block, for /dev/sdc1
/dev/sdd :
 /dev/sdd1 ceph data, active, cluster ceph, osd.57, block /dev/sdd2
 /dev/sdd2 ceph block, for /dev/sdd1
/dev/sde :
 /dev/sde1 ceph data, active, cluster ceph, osd.58, block /dev/sde2
 /dev/sde2 ceph block, for /dev/sde1
/dev/sdf :
 /dev/sdf1 ceph data, active, cluster ceph, osd.59, block /dev/sdf2
 /dev/sdf2 ceph block, for /dev/sdf1
/dev/sdg :
 /dev/sdg1 ceph data, active, cluster ceph, osd.60, block /dev/sdg2
 /dev/sdg2 ceph block, for /dev/sdg1
/dev/sdh :
 /dev/sdh1 ceph data, active, cluster ceph, osd.61, block /dev/sdh2
 /dev/sdh2 ceph block, for /dev/sdh1
/dev/sdi :
 /dev/sdi1 ceph data, active, cluster ceph, osd.62, block /dev/sdi2
 /dev/sdi2 ceph block, for /dev/sdi1
/dev/sdj :
 /dev/sdj1 ceph data, active, cluster ceph, osd.63, block /dev/sdj2
 /dev/sdj2 ceph block, for /dev/sdj1
/dev/sdk :
 /dev/sdk1 ceph data, active, cluster ceph, osd.64, block /dev/sdk2
 /dev/sdk2 ceph block, for /dev/sdk1
/dev/sdl :
 /dev/sdl1 ceph data, active, cluster ceph, osd.65, block /dev/sdl2
 /dev/sdl2 ceph block, for /dev/sdl1

I cannot see the journals anywhere and I don't really know if they are working. The cluster speed when installing a new VM still seems disk-limited somehow...

Secondly, I would also like to know how to add the LAST partition (SSDPool) to a separate SSD-only pool. ceph-deploy is not installed on Proxmox - I read it is not supported. Do you know how to achieve this?

Help would be highly appreciated
 
Hi,
the type code of the journal partitions is not the type code of a Ceph journal!
Code:
sgdisk -t $PARTNUMBER:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/nvme0n1
should set the right type code.

With "iostat -dm 5 /dev/nvme0n1" you can check whether the disk is actually being written to.
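Applied to the layout above, that is one sgdisk call per journal partition (partitions 1-11 carry journal-55 ... journal-65; partition 12 is the SSDPool and keeps its type), for example:
Code:
for p in $(seq 1 11); do
    sgdisk -t $p:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/nvme0n1
done
# re-read the partition table and re-check how ceph-disk classifies the partitions
partprobe /dev/nvme0n1
ceph-disk list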

Udo
 
Hello Udo,

is $PARTNUMBER the serial number of the SSD?

lspci -vvv -d 8086:0953 | grep "Device Serial Number"

lspci -vvv -d 8086:0953
Code:
lspci -vvv -d 8086:0953
07:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01) (prog-if 02 [NVM Express])
    Subsystem: Intel Corporation DC P3700 SSD
    Physical Slot: 2
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 26
    NUMA node: 0
    Region 0: Memory at f7ff0000 (64-bit, non-prefetchable) [size=16K]
    [virtual] Expansion ROM at f7f00000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI-X: Enable+ Count=32 Masked-
        Vector table: BAR=0 offset=00002000
        PBA: BAR=0 offset=00003000
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <4us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
        DevCtl:    Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 4096 bytes
        DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <4us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 8GT/s, Width x4, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
             EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt:    DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        AERCap:    First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [150 v1] Virtual Channel
        Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:    ArbSelect=Fixed
        Status:    InProgress-
        VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status:    NegoPending- InProgress-
    Capabilities: [180 v1] Power Budgeting <?>
    Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap:    MFVC- ACS-, Next Function: 0
        ARICtl:    MFVC- ACS-, Function Group: 0
    Capabilities: [270 v1] Device Serial Number 55-cd-2e-41-4d-4d-4a-5d
    Capabilities: [2a0 v1] #19
    Kernel driver in use: nvme

I found out that this way of moving the journal to NVMe/SSD applies to filestore, not bluestore, OSDs.
It didn't do anything for them.
I guess I have to destroy / re-create those OSDs, like I did when I moved from filestore to bluestore in the first place.

Also, I would like to know how big those partitions should be... for block.wal and block.db. I didn't find the right answer to this question anywhere.

Also, could someone explain how I can manage to create an SSD pool from the remaining space?
 
Hello Udo,

is $PARTNUMBER the serial number of the SSD?
No, it means the partition number (1, 2, 3, 4, ...).
I found out that this way of moving the journal to NVMe/SSD applies to filestore, not bluestore, OSDs.
It didn't do anything for them.
Yes - if you use bluestore! AFAIK it's not possible to move a bluestore journal (you must recreate the whole OSD).

Udo
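For anyone in the same situation: recreating one of those OSDs as bluestore with its DB on the NVMe might look roughly like this with ceph-volume (devices are examples from this thread; the OSD must already be out, stopped and purged, and on Proxmox you may prefer to do the destroy/create via pveceph or the GUI):
Code:
# assumption: osd.55 lived on /dev/sdb and has been removed from the cluster
ceph-volume lvm zap /dev/sdb --destroy
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1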
 
