Partition NVMe to host both journal and OSD?

AlexLup

Well-Known Member
Mar 19, 2018
I finally got my fast enterprise NVMe drives (Samsung PM983), added a few journals to them for smaller spinners, and would like to use the remaining disk space as an OSD / monitor DB / maybe swap. Are there any guides on how to achieve this?

Tried to peek at the Proxmox tooling behind the GUI to find the underlying commands, but to no avail:

GUI -> Add OSD
create OSD on /dev/sdb (bluestore)
creating block.db on '/dev/nvme0n1'
Physical volume "/dev/nvme0n1" successfully created.
Volume group "ceph-a457b2c6-0545-44f6-bb23-9f0cf1dc53c2" successfully created
Logical volume "osd-db-329e91fd-b019-4027-b137-e0b3f76204d8" created.

# lsblk /dev/nvme0n1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 894.3G 0 disk
├─ceph--a457b2c6--0545--44f6--bb23--9f0cf1dc53c2-osd--db--329e91fd--b019--4027--b137--e0b3f76204d8 253:8 0 4G 0 lvm
└─ceph--a457b2c6--0545--44f6--bb23--9f0cf1dc53c2-osd--db--eb696692--f66f--4678--88b5--905ec5c7c62e 253:18 0 4G 0 lvm

# parted /dev/nvme0n1 -a optimal mkpart primary 8GB 28GB
Error: /dev/nvme0n1: unrecognised disk label

# ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1
--> All devices are already used by ceph. No OSDs will be created.
 
added a few journals to it for smaller spinners
A journal was used by the old Filestore backend. The BlueStore backend instead uses a RocksDB database, mainly to hold its metadata.

would like to use the remaining disk space as an OSD/Monitor db/maybe swap.
This greatly depends on how much IO capacity is still left, but I don't recommend using it for anything else. The database for the OSDs sees lots of small reads / writes. Depending on the size of the DB, the partitions / LVs need to be somewhere around 3, 30, or 300 GB. Otherwise the DB will spill over to the data disk once it has reached the maximum available size.
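For reference, whether a DB has already outgrown its fast device can be checked on a running cluster. A minimal sketch, assuming a recent Ceph release and with osd.0 standing in for one of your OSD IDs:

Code:
# spillover to the data disk is reported as a cluster health warning
ceph health detail | grep -i spillover
# per-OSD view: the bluefs counters show DB usage on the fast device
ceph daemon osd.0 perf dump bluefs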
 
Alwin, thank you for the reply.

I was looking for the actual commands to partition up my NVMe after I have added the RocksDB.

The limitations of the disk are understood, and looking at my current graphs there is a LONG way to go before I hit the 200k IOPS ceiling of these disks.
 
You will need to work with LVM and create separate LVs (e.g. with lvcreate).
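A minimal sketch of that, using the VG name from your lsblk output; the LV name and the size are placeholders, adjust both:

Code:
# see how much free space the ceph VG still has (VFree column)
vgs
# carve a new LV for an extra OSD out of the free space
lvcreate -n osd-data-0 -L 100G ceph-a457b2c6-0545-44f6-bb23-9f0cf1dc53c2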
 
I tried that, but I fear that if I get it wrong, it might overwrite the whole disk. Hence I was digging into the Perl code to see what the actual command behind "creating block.db on '/dev/nvme0n1'" is, so that I could add one more partition for OSD usage after the block.db partitions have already been created.
 
I tried that, but I fear that if I create it wrong, it might overwrite the whole disk.
Well, that can always happen. But AFAIS, there are only OSD LVs on them. So if something happens, they can be re-created very easily. The data will be recovered from other OSDs in the cluster (hence the 3x replica). ;)

Code:
# lsblk /dev/nvme0n1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 894.3G 0 disk
├─ceph--a457b2c6--0545--44f6--bb23--9f0cf1dc53c2-osd--db--329e91fd--b019--4027--b137--e0b3f76204d8 253:8 0 4G 0 lvm
└─ceph--a457b2c6--0545--44f6--bb23--9f0cf1dc53c2-osd--db--eb696692--f66f--4678--88b5--905ec5c7c62e 253:18 0 4G 0 lvm
These are LVs. They can be added or removed seamlessly (if not in use). After creating an LV you get a virtual block device that you can use like any other block device.
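Such an LV can then be handed to ceph-volume directly as the data device; for example (VG and LV names are the placeholders from the sketch above):

Code:
# create a new OSD on the hand-made LV; ceph-volume accepts VG/LV paths
ceph-volume lvm create --data ceph-a457b2c6-0545-44f6-bb23-9f0cf1dc53c2/osd-data-0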
 
Tried with lvcreate, but it says the disk is busy, so I am back to finding out exactly which command is run, by patching /usr/share/perl5/PVE/API2/Ceph/OSD.pm:

Code:
print "creating block.$type on '$d->{dev}'\n"; -> print "creating block.$type on '$d->{dev}' with $cmd\n";

...
 
Check vgs and lvs.
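I.e. something along these lines, to see whether the VG still has free extents and what has already been carved out of it:

Code:
# volume groups with their total and free size
vgs -o vg_name,vg_size,vg_free
# logical volumes with their VG and size
lvs -o lv_name,vg_name,lv_size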
 
