Ceph OSD with file journal

gcakici

Renowned Member
Sep 26, 2009
I have an Intel DC P3700 NVMe drive that I want to use as the journal for 3 SATA OSDs. I partitioned the device, but I cannot use the partitions as block-device journals, so I want to use them as file journals.

I couldn't find a way to do this in the Proxmox interface, and while ceph-disk prepares the OSD, I cannot see it in the Proxmox interface either. This is a freshly installed platform with the enterprise repo, and Jewel is the latest version.

How can I create file-journaled OSDs that can be seen and managed from the Proxmox interface?

Thanks
Gokalp

ceph-disk prepare --fs-type xfs --cluster ceph --journal-file /dev/sda /tmp/journal

prepare_file: OSD will not be hot-swappable if journal is not the same device as the osd data
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sda1              isize=2048   agcount=4, agsize=244188597 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=976754385, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=476930, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The operation has completed successfully.


pveversion -v

proxmox-ve: 4.4-86 (running kernel: 4.4.49-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.49-1-pve: 4.4.49-86
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-49
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-97
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
ceph: 10.2.7-1~bpo80+1
 
Hi,

here is the command for using a journal the way you want.
Assume /dev/sda is the OSD disk and /dev/nvme0n1p1 is the NVMe partition:

pveceph createosd /dev/sda -journal_dev /dev/nvme0n1p1
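
For the three SATA OSDs sharing the one NVMe drive, the same pattern just repeats per disk. A sketch, assuming the data disks are /dev/sda, /dev/sdb and /dev/sdc and that three journal partitions nvme0n1p1 to nvme0n1p3 already exist (each at least as big as osd_journal_size, which defaults to 5 GB):

Code:
# one NVMe partition per OSD journal
pveceph createosd /dev/sda -journal_dev /dev/nvme0n1p1
pveceph createosd /dev/sdb -journal_dev /dev/nvme0n1p2
pveceph createosd /dev/sdc -journal_dev /dev/nvme0n1p3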
 
Thank you for the reply. This is exactly what I did for my installation. Before executing the command:

#ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0 root default
-2 0 host prox-ceph-5-1-51

#ceph-disk zap /dev/sda
The operation has completed successfully.

Then I executed the same command as yours.

#pveceph createosd /dev/sda -journal_dev /dev/nvme0n1p1
create OSD on /dev/sda (xfs)
using device '/dev/nvme0n1p1' for journal
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
prepare_device: Journal /dev/nvme0n1p1 was not prepared with ceph-disk. Symlinking directly.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sda1              isize=2048   agcount=4, agsize=244188597 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=976754385, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=476930, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The operation has completed successfully.

I can see that it has been prepared and is available to Ceph.

#ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0 root default
-2 0 host prox-ceph-5-1-51
0 0 osd.0 down 1.00000 1.00000

But I can not see it in the Proxmox interface.

The OSD log says:

2017-04-13 10:11:09.713989 7f55ec4e2800 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
2017-04-13 10:11:09.714089 7f55ec4e2800 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.y11kCT: (13) Permission denied
2017-04-13 10:11:10.287998 7fbd65627800 0 set uid:gid to 64045:64045 (ceph:ceph)
2017-04-13 10:11:10.288009 7fbd65627800 0 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185), process ceph-osd, pid 24422
2017-04-13 10:11:10.290130 7fbd65627800 1 filestore(/var/lib/ceph/tmp/mnt.V_F6HH) mkfs in /var/lib/ceph/tmp/mnt.V_F6HH
2017-04-13 10:11:10.290148 7fbd65627800 1 filestore(/var/lib/ceph/tmp/mnt.V_F6HH) mkfs fsid is already set to 995fa999-f78a-4f9a-834e-994f4d49b430
2017-04-13 10:11:10.290152 7fbd65627800 1 filestore(/var/lib/ceph/tmp/mnt.V_F6HH) write_version_stamp 4
2017-04-13 10:11:10.292551 7fbd65627800 0 filestore(/var/lib/ceph/tmp/mnt.V_F6HH) backend xfs (magic 0x58465342)
2017-04-13 10:11:10.422327 7fbd65627800 1 filestore(/var/lib/ceph/tmp/mnt.V_F6HH) leveldb db exists/created
2017-04-13 10:11:10.422432 7fbd65627800 -1 filestore(/var/lib/ceph/tmp/mnt.V_F6HH) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.V_F6HH/journal: (13) Permission denied
2017-04-13 10:11:10.422468 7fbd65627800 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
2017-04-13 10:11:10.422521 7fbd65627800 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.V_F6HH: (13) Permission denied

# ls -hal /var/lib/ceph/
total 20K
drwxr-x--- 8 ceph ceph 8 Apr 11 23:15 .
drwxr-xr-x 45 root root 46 Apr 11 23:15 ..
drwxr-xr-x 2 ceph ceph 3 Apr 12 04:42 bootstrap-mds
drwxr-xr-x 2 ceph ceph 3 Apr 12 04:42 bootstrap-osd
drwxr-xr-x 2 ceph ceph 3 Apr 12 04:42 bootstrap-rgw
drwxr-xr-x 3 ceph ceph 3 Apr 12 04:42 mon
drwxr-xr-x 3 ceph ceph 3 Apr 12 11:10 osd
drwxr-xr-x 2 ceph ceph 4 Apr 13 10:11 tmp

# ls -hal /var/lib/ceph/tmp/
total 10K
drwxr-xr-x 2 ceph ceph 4 Apr 13 10:11 .
drwxr-x--- 8 ceph ceph 8 Apr 11 23:15 ..
-rwxr-xr-x 1 root root 0 Apr 12 12:13 ceph-disk.activate.lock
-rwxr-xr-x 1 root root 0 Apr 12 12:12 ceph-disk.prepare.lock

#ls /dev/nvm* -hal

drwxr-xr-x 21 root root 4.6K Apr 13 10:11 .
drwxr-xr-x 22 root root 22 Apr 9 18:28 ..
crw------- 1 root root 248, 0 Apr 12 14:55 /dev/nvme0
brw-rw---- 1 root disk 259, 0 Apr 12 14:55 /dev/nvme0n1
brw-rw---- 1 root disk 259, 1 Apr 12 14:55 /dev/nvme0n1p1
 
I have tested this setup and it works,
so I would wipe everything and start over.

pveceph stop
pveceph purge
umount /var/lib/ceph/..
rm -r /var/lib/ceph
rm -r /etc/ceph

Use parted to re-initialize the disks:
parted /dev/.. mklabel gpt
parted /dev/nvme.. mkpart

Now you can start from the beginning.
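
Spelled out with example device names (only an illustration; here /dev/sda is one OSD disk, /dev/nvme0n1 is the journal NVMe, and the journal partitions are assumed to be 20 GB each; adjust to your layout):

Code:
# fresh GPT labels on the OSD disk and the NVMe
parted -s /dev/sda mklabel gpt
parted -s /dev/nvme0n1 mklabel gpt
# one journal partition per OSD on the NVMe
parted -s -a optimal /dev/nvme0n1 mkpart journal-0 0% 20GB
parted -s -a optimal /dev/nvme0n1 mkpart journal-1 20GB 40GB
parted -s -a optimal /dev/nvme0n1 mkpart journal-2 40GB 60GB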
 
...
2017-04-13 10:11:10.422432 7fbd65627800 -1 filestore(/var/lib/ceph/tmp/mnt.V_F6HH) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.V_F6HH/journal: (13) Permission denied
2017-04-13 10:11:10.422468 7fbd65627800 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
2017-04-13 10:11:10.422521 7fbd65627800 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.V_F6HH: (13) Permission denied

#ls /dev/nvm* -hal
brw-rw---- 1 root disk 259, 1 Apr 12 14:55 /dev/nvme0n1p1
Hi,
perhaps:
Code:
chown ceph /dev/nvme0n1p1
Udo
 
Hi,
perhaps:
Code:
chown ceph /dev/nvme0n1p1
Udo
Yes, it worked that way. But after a reboot the ownership reverts, the OSD does not start, and Ceph does not come up. I think that's the udev issue I've seen in your older posts, but I can't fix that either.
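
One way to make the ownership stick across reboots is a local udev rule that re-applies it at boot. This is only a sketch and assumes the journal partition keeps the name nvme0n1p1; the rule file name is made up:

Code:
# /etc/udev/rules.d/90-ceph-journal.rules
KERNEL=="nvme0n1p1", SUBSYSTEM=="block", OWNER="ceph", GROUP="ceph", MODE="0660"

Reload with udevadm control --reload-rules && udevadm trigger, or reboot to test it.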
 
I have tested this setup and it works,
so I would wipe everything and start over.

pveceph stop
pveceph purge
umount /var/lib/ceph/..
rm -r /var/lib/ceph
rm -r /etc/ceph

Use parted to re-initialize the disks:
parted /dev/.. mklabel gpt
parted /dev/nvme.. mkpart

Now you can start from the beginning.
Didn't help. I've also tried the next recommendation from Udo, and it went as I replied below.
 
Yes, it worked that way. But after a reboot the ownership reverts, the OSD does not start, and Ceph does not come up. I think that's the udev issue I've seen in your older posts, but I can't fix that either.
Hi,
you can try the following (which worked for me on a Hammer cluster):

1. Name your journal partition with sgdisk as journal-0 (for osd.0), so that /dev/disk/by-partlabel/journal-0 appears as a link to your partition (/dev/nvme0n1p1); a sketch with sgdisk follows below.
2. Put the following in ceph.conf:
Code:
[osd]
osd_journal = /dev/disk/by-partlabel/journal-$id
Perhaps autostart works then? But I'm not sure that the ceph permissions are set...

Udo
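
For step 1, a sketch with sgdisk (assuming the journal for osd.0 is partition 1 on /dev/nvme0n1):

Code:
# set the GPT partition name so /dev/disk/by-partlabel/journal-0 shows up
sgdisk --change-name=1:journal-0 /dev/nvme0n1
# re-read the partition table so the by-partlabel symlink gets created
partprobe /dev/nvme0n1
udevadm trigger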