Recommended way of creating multiple OSDs per NVMe disk?

victorhooi

Well-Known Member
Apr 3, 2018
I'm attempting to set up a new 3-node Proxmox/Ceph cluster.

Each node has an Intel Optane 900p (480GB) NVMe drive.

I've read that the recommendation is to set up 4 OSDs per NVMe drive.

Through the Proxmox GUI, I can create a single OSD per disk. However, there's no way to create multiple OSDs - I assume I need to do the partitioning outside of Proxmox?

Is there a recommended way of partitioning it optimally for Ceph? Or an example of how it should be done?
 
I can confirm that the ceph-volume command from spirit works =), and the OSDs appear in the Proxmox GUI afterwards.

Initially, ceph-volume complained about GPT headers on the disk:

Code:
ceph-volume lvm batch: error: GPT headers found, they must be removed on: /dev/nvme0n1

Based on this post, I used wipefs to clear the disk:
Code:
root@vwnode1:~# wipefs -a /dev/nvme0n1
/dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/nvme0n1: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 41 52 54
/dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/nvme0n1: calling ioctl to re-read partition table: Success
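
Since wipefs -a is destructive, it can help to preview what would be erased first. The following is only a sketch: the run helper and the DRY_RUN variable are hypothetical names made up for this example, and with the default DRY_RUN=1 nothing is executed, the commands are only printed. The --no-act option asks wipefs to report signatures without writing to the disk.

```shell
#!/bin/sh
# Sketch only: run() and DRY_RUN are hypothetical helpers, not part of wipefs or Ceph.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        # Default: show the command instead of executing it.
        echo "would run: $*"
    else
        "$@"
    fi
}

wipe_disk() {
    dev=$1
    # Report the signatures that would be erased, without touching the disk.
    run wipefs --no-act --all "$dev"
    # Erase all filesystem, RAID and partition-table signatures.
    run wipefs --all "$dev"
}

wipe_disk /dev/nvme0n1
```

Setting DRY_RUN=0 would run the real commands, so double-check the device path before doing that.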
Then I called ceph-volume:
Code:
root@vwnode1:~# ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1

Total OSDs: 4

  Type            Path                                                    LV Size         % of device
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme0n1                                            111.78 GB       25%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme0n1                                            111.78 GB       25%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme0n1                                            111.78 GB       25%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme0n1                                            111.78 GB       25%
--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no) yes
Running command: vgcreate --force --yes ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818 /dev/nvme0n1
 stdout: Physical volume "/dev/nvme0n1" successfully created.
 stdout: Volume group "ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818" successfully created
Running command: lvcreate --yes -l 28424 -n osd-data-f87aa288-a1a4-4faa-9e30-ecf4d9e06937 ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818
 stdout: Logical volume "osd-data-f87aa288-a1a4-4faa-9e30-ecf4d9e06937" created.
Running command: lvcreate --yes -l 28424 -n osd-data-e432bf7d-f1d2-484c-987d-df264a6c7155 ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818
 stdout: Logical volume "osd-data-e432bf7d-f1d2-484c-987d-df264a6c7155" created.
Running command: lvcreate --yes -l 28424 -n osd-data-00a60cac-d170-4977-be85-f1990a682e6b ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818
 stdout: Logical volume "osd-data-00a60cac-d170-4977-be85-f1990a682e6b" created.
Running command: lvcreate --yes -l 28424 -n osd-data-22bfd674-c677-48d8-80c9-19b81c5e1ad2 ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818
 stdout: Logical volume "osd-data-22bfd674-c677-48d8-80c9-19b81c5e1ad2" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 0ac6029b-fd00-4e37-bebe-515b494f1674
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: chown -h ceph:ceph /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-f87aa288-a1a4-4faa-9e30-ecf4d9e06937
Running command: chown -R ceph:ceph /dev/dm-12
Running command: ln -s /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-f87aa288-a1a4-4faa-9e30-ecf4d9e06937 /var/lib/ceph/osd/ceph-0/block
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
 stderr: got monmap epoch 3
Running command: ceph-authtool /var/lib/ceph/osd/ceph-0/keyring --create-keyring --name osd.0 --add-key AQCEKoJcr6LCFRAASCO2ccc1qhgNT/HBT2wt3Q==
 stdout: creating /var/lib/ceph/osd/ceph-0/keyring
added entity osd.0 auth auth(auid = 18446744073709551615 key=AQCEKoJcr6LCFRAASCO2ccc1qhgNT/HBT2wt3Q== with 0 caps)
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 0ac6029b-fd00-4e37-bebe-515b494f1674 --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-f87aa288-a1a4-4faa-9e30-ecf4d9e06937
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-f87aa288-a1a4-4faa-9e30-ecf4d9e06937 --path /var/lib/ceph/osd/ceph-0
Running command: ln -snf /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-f87aa288-a1a4-4faa-9e30-ecf4d9e06937 /var/lib/ceph/osd/ceph-0/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
Running command: chown -R ceph:ceph /dev/dm-12
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: systemctl enable ceph-volume@lvm-0-0ac6029b-fd00-4e37-bebe-515b494f1674
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-0ac6029b-fd00-4e37-bebe-515b494f1674.service → /lib/systemd/system/ceph-volume@.service.
Running command: systemctl enable --runtime ceph-osd@0
Running command: systemctl start ceph-osd@0
--> ceph-volume lvm activate successful for osd ID: 0
--> ceph-volume lvm create successful for: ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-f87aa288-a1a4-4faa-9e30-ecf4d9e06937
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 20a5a6e5-a3c9-4c06-8814-727fcae823c8
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: chown -h ceph:ceph /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-e432bf7d-f1d2-484c-987d-df264a6c7155
Running command: chown -R ceph:ceph /dev/dm-13
Running command: ln -s /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-e432bf7d-f1d2-484c-987d-df264a6c7155 /var/lib/ceph/osd/ceph-1/block
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-1/activate.monmap
 stderr: got monmap epoch 3
Running command: ceph-authtool /var/lib/ceph/osd/ceph-1/keyring --create-keyring --name osd.1 --add-key AQCHKoJcU7DODRAAa6UgifJC1hCH8/ZtxEAR8g==
 stdout: creating /var/lib/ceph/osd/ceph-1/keyring
added entity osd.1 auth auth(auid = 18446744073709551615 key=AQCHKoJcU7DODRAAa6UgifJC1hCH8/ZtxEAR8g== with 0 caps)
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/keyring
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 1 --monmap /var/lib/ceph/osd/ceph-1/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-1/ --osd-uuid 20a5a6e5-a3c9-4c06-8814-727fcae823c8 --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-e432bf7d-f1d2-484c-987d-df264a6c7155
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-e432bf7d-f1d2-484c-987d-df264a6c7155 --path /var/lib/ceph/osd/ceph-1
Running command: ln -snf /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-e432bf7d-f1d2-484c-987d-df264a6c7155 /var/lib/ceph/osd/ceph-1/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
Running command: chown -R ceph:ceph /dev/dm-13
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Running command: systemctl enable ceph-volume@lvm-1-20a5a6e5-a3c9-4c06-8814-727fcae823c8
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-1-20a5a6e5-a3c9-4c06-8814-727fcae823c8.service → /lib/systemd/system/ceph-volume@.service.
Running command: systemctl enable --runtime ceph-osd@1
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@1.service → /lib/systemd/system/ceph-osd@.service.
Running command: systemctl start ceph-osd@1
--> ceph-volume lvm activate successful for osd ID: 1
--> ceph-volume lvm create successful for: ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-e432bf7d-f1d2-484c-987d-df264a6c7155
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 35a3e0b4-90ba-4e82-8602-589acdf10759
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: chown -h ceph:ceph /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-00a60cac-d170-4977-be85-f1990a682e6b
Running command: chown -R ceph:ceph /dev/dm-14
Running command: ln -s /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-00a60cac-d170-4977-be85-f1990a682e6b /var/lib/ceph/osd/ceph-2/block
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-2/activate.monmap
 stderr: got monmap epoch 3
Running command: ceph-authtool /var/lib/ceph/osd/ceph-2/keyring --create-keyring --name osd.2 --add-key AQCKKoJclID+BBAA5/nCkfGAotqRoD8XMJnGKg==
 stdout: creating /var/lib/ceph/osd/ceph-2/keyring
added entity osd.2 auth auth(auid = 18446744073709551615 key=AQCKKoJclID+BBAA5/nCkfGAotqRoD8XMJnGKg== with 0 caps)
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/keyring
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid 35a3e0b4-90ba-4e82-8602-589acdf10759 --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-00a60cac-d170-4977-be85-f1990a682e6b
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-00a60cac-d170-4977-be85-f1990a682e6b --path /var/lib/ceph/osd/ceph-2
Running command: ln -snf /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-00a60cac-d170-4977-be85-f1990a682e6b /var/lib/ceph/osd/ceph-2/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: chown -R ceph:ceph /dev/dm-14
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: systemctl enable ceph-volume@lvm-2-35a3e0b4-90ba-4e82-8602-589acdf10759
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-35a3e0b4-90ba-4e82-8602-589acdf10759.service → /lib/systemd/system/ceph-volume@.service.
Running command: systemctl enable --runtime ceph-osd@2
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service → /lib/systemd/system/ceph-osd@.service.
Running command: systemctl start ceph-osd@2
--> ceph-volume lvm activate successful for osd ID: 2
--> ceph-volume lvm create successful for: ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-00a60cac-d170-4977-be85-f1990a682e6b
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new f470d08e-e055-4ffa-9fa4-0f7ad2c9146e
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-3
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: chown -h ceph:ceph /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-22bfd674-c677-48d8-80c9-19b81c5e1ad2
Running command: chown -R ceph:ceph /dev/dm-15
Running command: ln -s /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-22bfd674-c677-48d8-80c9-19b81c5e1ad2 /var/lib/ceph/osd/ceph-3/block
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-3/activate.monmap
 stderr: got monmap epoch 3
Running command: ceph-authtool /var/lib/ceph/osd/ceph-3/keyring --create-keyring --name osd.3 --add-key AQCMKoJc7DjXNRAAUEf+OfgRob+ifXshJpQAeQ==
 stdout: creating /var/lib/ceph/osd/ceph-3/keyring
added entity osd.3 auth auth(auid = 18446744073709551615 key=AQCMKoJc7DjXNRAAUEf+OfgRob+ifXshJpQAeQ== with 0 caps)
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-3/keyring
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-3/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 3 --monmap /var/lib/ceph/osd/ceph-3/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-3/ --osd-uuid f470d08e-e055-4ffa-9fa4-0f7ad2c9146e --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-22bfd674-c677-48d8-80c9-19b81c5e1ad2
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-3
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-22bfd674-c677-48d8-80c9-19b81c5e1ad2 --path /var/lib/ceph/osd/ceph-3
Running command: ln -snf /dev/ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-22bfd674-c677-48d8-80c9-19b81c5e1ad2 /var/lib/ceph/osd/ceph-3/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-3/block
Running command: chown -R ceph:ceph /dev/dm-15
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-3
Running command: systemctl enable ceph-volume@lvm-3-f470d08e-e055-4ffa-9fa4-0f7ad2c9146e
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-3-f470d08e-e055-4ffa-9fa4-0f7ad2c9146e.service → /lib/systemd/system/ceph-volume@.service.
Running command: systemctl enable --runtime ceph-osd@3
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@3.service → /lib/systemd/system/ceph-osd@.service.
Running command: systemctl start ceph-osd@3
--> ceph-volume lvm activate successful for osd ID: 3
--> ceph-volume lvm create successful for: ceph-508cf3d6-ca6a-4f6a-a8cf-b030d20ef818/osd-data-22bfd674-c677-48d8-80c9-19b81c5e1ad2
root@vwnode1:~#
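
The sizes in the output above can be cross-checked with a little arithmetic. This is a sketch under two assumptions that are not stated in the thread: the disk uses 512-byte logical sectors, and the backup GPT header that wipefs erased sits in the very last sector of the disk. Under those assumptions, the backup header offset plus one sector gives the device size, and a quarter of that matches the 111.78 GB per OSD that ceph-volume reported (the figure works out in GiB).

```shell
#!/bin/sh
# Offset of the backup GPT header, copied from the wipefs output above.
backup_gpt_offset=$((0x6fc86d5e00))
# Assumption: 512-byte logical sectors.
sector=512

device_bytes=$((backup_gpt_offset + sector))
per_osd_bytes=$((device_bytes / 4))   # --osds-per-device 4

echo "device:  $device_bytes bytes"    # 480103981056
echo "per OSD: $per_osd_bytes bytes"   # 120025995264
# Integer shell arithmetic cannot show fractions, so use awk for the GiB value.
awk -v b="$per_osd_bytes" 'BEGIN { printf "per OSD: %.2f GiB\n", b / (1024 * 1024 * 1024) }'   # 111.78
```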
 
Finally - I'm at the stage to create a Ceph pool.

In the Proxmox GUI, there's an option to "Add storage":

[Image: OiQvUuH.png]

Even reading the help it's unclear what this actually does:

If you would also like to automatically get a storage definition for your pool, activate the checkbox "Add storages" in the GUI or use the command line option --add_storages on pool creation.

Can somebody provide some more context or background as to what "Add storage" is, and when you should use it?
 
You can also wipe the disk with:

# ceph-volume lvm zap --destroy /dev/nvme0n1


(I have seen improvements in BlueStore's internal memory buffers, fragmentation, and latency with 2 OSDs per NVMe. But if you have enough memory, you can try 4 OSDs too.)
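
To get a rough feel for the "enough memory" caveat, the per-node memory budget can be estimated. This is only a sketch and assumes each BlueStore OSD aims for about 4 GiB, which is the usual default of osd_memory_target in recent Ceph releases; that figure is not from this thread, and older releases size their caches differently, so check your own configuration. The function name osd_memory_gib is made up for this example.

```shell
#!/bin/sh
# Rough per-node memory budget for OSDs on NVMe devices.
# Assumption: about 4 GiB per OSD (typical osd_memory_target default), not stated in the thread.
osd_memory_gib() {
    osds_per_device=$1
    devices=$2
    target_gib=${3:-4}
    echo $((osds_per_device * devices * target_gib))
}

echo "2 OSDs per NVMe: $(osd_memory_gib 2 1) GiB"   # 8 GiB
echo "4 OSDs per NVMe: $(osd_memory_gib 4 1) GiB"   # 16 GiB
```

On a hyper-converged Proxmox node this comes on top of whatever the VMs need, which is why 2 OSDs per device can be the safer starting point.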
 
Edit:

Sorry, I just saw what you mean. I imagine this just automatically adds the new pool as a storage within Proxmox, which saves you the step of manually adding the RBD storage location in Proxmox before you can use it.


Pool creation is Ceph configuration; "Add storage" adds the Ceph pool to Proxmox as a storage.
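
As a rough illustration of that difference, the sketch below prints the two ways of ending up with a usable pool. It only echoes the commands and runs nothing. The pool name vm-pool is hypothetical, the pveceph pool create spelling is the one used by newer Proxmox releases (older releases spelled the subcommand differently), and the pvesm add rbd options are assumptions to check against the pvesm man page for your version.

```shell
#!/bin/sh
# Sketch only: prints commands instead of running them.
# "vm-pool" is a hypothetical pool name, not one from this thread.
pool=${1:-vm-pool}

pool_with_storage() {
    # One step: create the Ceph pool and its Proxmox storage definition together.
    echo "pveceph pool create $1 --add_storages"
}

pool_then_storage() {
    # Two steps: create the pool in Ceph, then register it in Proxmox as RBD storage.
    echo "pveceph pool create $1"
    echo "pvesm add rbd $1 --pool $1 --content images,rootdir"
}

pool_with_storage "$pool"
pool_then_storage "$pool"
```

Either way the result should be a storage entry in Proxmox that points at the pool, so VM disks can be placed on it.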

 
Extending the above - is there a way to create multiple OSDs, as well as a separate disk for WAL/DB?

Previously, I used the ceph-volume command to create multiple OSDs per disk:

Code:
ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1

Alternatively I have used the pveceph command to create an OSD, with a WAL/DB on a different disk (Optane in our case):
Code:
pveceph osd create /dev/sda -db_dev /dev/nvme1n1 -db_size 145

However, is there a way to create multiple OSDs per disk, and have the WAL/DB on a different disk?
 
According to the official Ceph documentation, the DB/WAL for each OSD should be its own logical volume in a volume group, rather than all OSDs sharing one raw physical device.
So if you create logical volumes for the data and DB/WAL parts yourself, you should be able to solve the problem.
https://docs.ceph.com/en/quincy/rados/configuration/bluestore-config-ref/#block-and-block-db
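
A minimal sketch of that logical-volume approach follows. It only prints the commands for review and runs nothing. The volume group names ceph-data and ceph-db, the LV names, and the equal percentage split are all hypothetical choices for this example; /dev/sda and /dev/nvme1n1 are the data and DB disks from the post above.

```shell
#!/bin/sh
# Sketch only: prints the commands for review instead of running them.
# Assumptions: VG names ceph-data/ceph-db and the LV names are hypothetical.
plan_osds_with_db() {
    data_dev=$1
    db_dev=$2
    count=$3
    pct=$((100 / count))   # equal share of each volume group per OSD

    echo "vgcreate ceph-data $data_dev"
    echo "vgcreate ceph-db $db_dev"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "lvcreate -l ${pct}%VG -n osd-data-$i ceph-data"
        echo "lvcreate -l ${pct}%VG -n osd-db-$i ceph-db"
        # With only --block.db given, BlueStore keeps the WAL on the DB volume.
        echo "ceph-volume lvm create --bluestore --data ceph-data/osd-data-$i --block.db ceph-db/osd-db-$i"
        i=$((i + 1))
    done
}

plan_osds_with_db /dev/sda /dev/nvme1n1 4
```

Newer ceph-volume releases also accept a --db-devices option on lvm batch. Whether it combines with --osds-per-device on a given version is something to verify first, for example with the --report option, which previews the layout without creating anything.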
 
