Ceph OSD on LVM logical volume.

kwinz
Apr 18, 2020
Hi,

I know it's not recommended for performance reasons, but I want to create a Ceph OSD on a node that has just a single NVMe SSD.
So I kept some free space on the SSD during install.
And then created a new logical volume with `lvcreate -L 10G -n vz pve`.

However, that volume does not show up when trying to create a new OSD via the GUI (screenshot: ceph-osd-no-disk.PNG).


[edit]
pveceph osd create /dev/mapper/pve-vz
results in:
unable to get device info for '/dev/dm-2'

[edit2]:
I will try ceph-disk according to https://forum.proxmox.com/threads/pveceph-unable-to-get-device-info.44927/#post-238545

How do I add a new OSD without having a dedicated disk for it?
 
So here's my little guide for everyone who wants to do this:

1. During install, set maxvz to 0 so that no local "data" storage is created and the free space is kept for Ceph on the OS drive. [GUIDE, 2.3.1 Advanced LVM Configuration Options]
2. Set up Proxmox as usual and create a cluster.
3. Install the Ceph packages and do the initial setup (network interfaces etc.) via the GUI; also create Managers and Monitors.
4. To create the OSDs, open a shell on each node and:

4.a. Fetch the bootstrap-osd keyring [4]:
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring

4.b. Create a new logical volume from the remaining free space:
lvcreate -l 100%FREE -n vz pve

4.c. Create (= prepare and activate) the OSD on the logical volume [2] [3] (a short verification sketch follows after the references below):
ceph-volume lvm create --data pve/vz

5. That's it. Now you can keep using the GUI to:
  • create Metadata Servers,
  • create a CephFS by clicking on a node in the cluster under "Ceph", then add it in "Datacenter-Storage" for ISO images and backups. It will be mounted at /mnt/pve/cephfs/,
  • and in "Datacenter-Storage" add an "RBD" storage for the virtual machines' disks.

[GUIDE] https://pve.proxmox.com/pve-docs/pve-admin-guide.pdf
[2] https://docs.ceph.com/docs/master/ceph-volume/lvm/create/#ceph-volume-lvm-create
[3] https://docs.ceph.com/docs/master/ceph-volume/
[4] https://forum.proxmox.com/threads/p...ble-to-create-a-new-osd-id.55730/#post-257533
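
To verify that the new OSD actually came up, the standard Ceph tooling is enough (nothing specific to this LVM setup):

ceph-volume lvm list   # shows the LV now tagged as an OSD
ceph osd tree          # the new OSD should be listed and "up"
ceph -s                # overall cluster health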
 
Followed these steps with Proxmox VE 8.2.5 and Ceph 18.2.2.
My cluster setup is 6 Proxmox nodes; nodes 1+2+3 are Ceph monitors and managers.

When trying to add an OSD on nodes 4, 5 and 6, an error occurs. These nodes will provide OSDs, but are not monitors and/or managers.
Bash:
root@proxmox05:# ceph-volume lvm create --data pve/vz
Running command: /usr/bin/ceph-authtool --gen-print-key
-->  RuntimeError: No valid ceph configuration file was loaded.

The cause of this issue is that the /etc/ceph/ceph.conf symlink is missing. It can be added manually (between steps 3 and 4 of the procedure):
Bash:
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf
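
A repeat-safe variant of the same fix (plain shell, nothing version-specific assumed):
Bash:
# create the symlink only if /etc/ceph/ceph.conf is not there yet
if [ ! -e /etc/ceph/ceph.conf ]; then
    ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf
fi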
 
Thanks! I followed this guide and it works well. I am using Proxmox 8.3.5.

Just want to add that for the last point: in Proxmox 8.3.5 there is no "RDS" entry under the "Add" menu (the Ceph block device storage type is listed as "RBD").

What I actually did was to:
1. Click one of the nodes
2. Ceph - Pools
3. Create: Cephpool

Then I have a storage pool to put the VM disks on (a rough CLI equivalent is sketched at the end of this post).

Hope this helps guys who want to implement this in 2025
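
For reference, roughly the same thing from a shell on one of the nodes - a sketch that assumes your pveceph version supports the --add_storages flag (check pveceph pool create --help):
Bash:
# create a replicated pool and register it as Proxmox storage in one step
pveceph pool create cephpool --add_storages
# verify that the pool exists and the storage entry shows up
ceph osd pool ls detail
pvesm status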
 
Hey - I'm currently migrating my VMs from an old PowerEdge R610 to new mini PCs (3 of them). I have performed this migration and the VMs are now on the mini PCs, with the PowerEdge running the last little bits of infrastructure.

When installing Proxmox on the new mini PCs, I did not set aside a separate disk at the start, so there is no dedicated disk for Ceph. However, I was able to get around the Proxmox error by creating a logical volume on one of the nodes. I really only want to have 300G on each node available to Ceph while I try out this implementation.

Are there downsides to going with a logical volume vs. a partitioned disk here? For example:

Bash:
root@drude:~# lsblk
NAME                           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
nvme0n1                        259:0    0 931.5G  0 disk
├─nvme0n1p1                    259:1    0  1007K  0 part
├─nvme0n1p2                    259:2    0     1G  0 part /boot/efi
└─nvme0n1p3                    259:3    0 930.5G  0 part
  ├─pve-swap                   252:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                   252:1    0    96G  0 lvm  /
  ├─pve-data_tmeta             252:2    0     5G  0 lvm
  │ └─pve-data-tpool           252:4    0   490G  0 lvm
  │   ├─pve-data               252:5    0   490G  1 lvm
  │   ├─pve-vm--172--cloudinit 252:7    0     4M  0 lvm
  │   └─pve-vm--172--disk--0   252:8    0   200G  0 lvm
  ├─pve-data_tdata             252:3    0   490G  0 lvm
  │ └─pve-data-tpool           252:4    0   490G  0 lvm
  │   ├─pve-data               252:5    0   490G  1 lvm
  │   ├─pve-vm--172--cloudinit 252:7    0     4M  0 lvm
  │   └─pve-vm--172--disk--0   252:8    0   200G  0 lvm
  └─pve-vz                     252:6    0   300G  0 lvm <-----
root@drude:~# ceph -s
  cluster:
    id:     a4cb6166-bf4b/...
    health: HEALTH_WARN
            OSD count 1 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum drude (age 6d)
    mgr: drude(active, since 6d)
    osd: 1 osds: 1 up (since 5d), 1 in (since 5d)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   26 MiB used, 300 GiB / 300 GiB avail
    pgs:

Appreciate the help and guidance with this!
 
I don't see a real downside with this approach. Ceph says it wants a disk, but in reality it wants a block device. That can be a disk, a partition, or even a logical volume.
Performance-wise there is no real penalty in using an LV.
Theoretically, if the LVM metadata gets corrupted, you will lose your Ceph OSD too. But what are the odds ...
I guess you want to have at least 3 OSDs for Ceph, for redundancy and in case of maintenance.
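
To get from 1 OSD to 3, you would repeat the two commands from the guide above on each remaining node (plus the bootstrap-osd keyring step from 4.a if a node does not have it yet); the VG/LV names and the 300G size are just the ones used earlier in this thread:
Code:
# on each remaining node: carve out the LV and turn it into an OSD
lvcreate -L 300G -n vz pve
ceph-volume lvm create --data pve/vz
# afterwards the "OSD count 1 < osd_pool_default_size 3" warning should clear
ceph -s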

I use Ceph@proxmox for 2 use cases: one pool as a datastore for Proxmox and two pools for Kubernetes to create/access volumes for pods.

Code:
root@proxmox05:~# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.98
pool 2 'data-ceph' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 2446 lfor 0/608/606 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 1.97
pool 4 'k8stest' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1598 lfor 0/1235/1233 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 2.19
pool 5 'k8sprod' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1601 lfor 0/1248/1246 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 1.53
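
For anyone building a similar pool layout by hand, the plain Ceph commands look roughly like this (the pool name and PG count are just examples matching the listing above):
Code:
# create a replicated pool with 32 placement groups
ceph osd pool create k8stest 32 32 replicated
# mark it for RBD use and initialize it
ceph osd pool application enable k8stest rbd
rbd pool init k8stest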
 
Thank you for the perspective. I appreciate the help with all this.

I think my next steps here are to shrink the pve/data volume currently allocated on each of the nodes so that I can create a new logical volume for Ceph. I am planning out the steps, and I think they would include this:

Bash:
# stop all running VMs first (qm needs the pve services to still be running)
qm list
qm stop <vmid>

# stop pve services
systemctl stop pvedaemon pveproxy pvestatd pve-cluster

lvchange -an pve/data   # deactivate the logical volume
lvs                     # check that the LV is inactive

lvreduce -L 494G pve/data # 494 GB (260 GB currently used + some wiggle room) - I think I will run this with -t to test first

lvchange -ay pve/data   # activate the LV again

systemctl start pve-cluster pvedaemon pveproxy pvestatd # start the services again

lvcreate -L 300G -n vz pve # create the LV for Ceph now
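
For the test run mentioned in the comment, something like this (sizes are the ones from the plan above):
Bash:
# dry run: -t/--test reports what lvreduce would do without changing anything
lvreduce -t -L 494G pve/data
# check names, sizes and thin-pool usage before and after
lvs -o lv_name,lv_size,data_percent pve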
 