Ceph OSD on zvol or dir?

Rob Loan

Mar 25, 2017
Why can't a zvol be an OSD disk?

root@nas:/etc/pve/rob# pveceph createosd /dev/zvol/z/ceph
unable to get device info for 'zd16'
root@nas:/etc/pve/rob# mkfs.xfs /dev/zvol/z/ceph
specified blocksize 4096 is less than device physical sector size 8192
switching to logical sector size 512
meta-data=/dev/zvol/z/ceph isize=512 agcount=4, agsize=67108864 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=0, rmapbt=0, reflink=0
data = bsize=4096 blocks=268435456, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=131072, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
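As an aside on the mkfs.xfs warning: the 8192-byte "physical sector size" is just the zvol's default 8K volblocksize being reported to XFS. A minimal sketch, assuming you only want to silence that warning (it won't make pveceph accept the zvol); "ceph4k" and the 1T size are example values:

Code:
# recreate the volume with a 4K volblocksize so it reports 4K physical sectors
# ("ceph4k" and the 1T size are example values)
zfs create -V 1T -o volblocksize=4k z/ceph4k
mkfs.xfs /dev/zvol/z/ceph4k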

Heck, what would be even cooler is if we could put the first OSD in /var/lib/vz/osd.

root@nas:/etc/pve/rob# pveversion -v
proxmox-ve: 5.0-7 (running kernel: 4.10.8-1-pve)
pve-manager: 5.0-10 (running version: 5.0-10/0d270679)
pve-kernel-4.10.8-1-pve: 4.10.8-7
libpve-http-server-perl: 2.0-4
lvm2: 2.02.168-pve2
corosync: 2.4.2-pve2
libqb0: 1.0.1-1
pve-cluster: 5.0-7
qemu-server: 5.0-4
pve-firmware: 2.0-2
libpve-common-perl: 5.0-11
libpve-guest-common-perl: 2.0-1
libpve-access-control: 5.0-4
libpve-storage-perl: 5.0-3
pve-libspice-server1: 0.12.8-3
vncterm: 1.4-1
pve-docs: 5.0-1
pve-qemu-kvm: 2.9.0-1
pve-container: 2.0-6
pve-firewall: 3.0-1
pve-ha-manager: 2.0-1
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-1
lxcfs: 2.0.7-pve1
criu: 2.11.1-1~bpo90
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90
ceph: 12.0.2-pve1
 
The idea is that it must be easy to replace a whole OSD disk, so we only support whole-disk OSDs.
 
Well, that doesn't work so well when all the OSDs' journals are on a common SSD :)
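For context, that shared-journal layout comes from creating the OSDs with a separate journal device, roughly like this (device names are placeholders, and the --journal_dev option name is how I remember it from the PVE 5 era docs, so double-check before use):

Code:
# several OSDs pointing their filestore journals at one SSD (placeholder devices)
pveceph createosd /dev/sdb --journal_dev /dev/sdf
pveceph createosd /dev/sdc --journal_dev /dev/sdf
pveceph createosd /dev/sdd --journal_dev /dev/sdf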

Thanks for listening.
 
Any update on this? Any workaround or plan?
I would also prefer a zvol as the OSD disk instead of the whole disk.

Please do not try to convince me to use a whole disk; the question is how to use a zvol instead of a physical disk, not why to use a whole disk....
Thanks!
 
I just got it working with a zvol.

BIG FAT WARNING: DO NOT follow (copy-paste) these steps without knowledge of ZFS, Ceph, and how to do your own research with Google; otherwise you could easily destroy your existing cluster/data.

Steps:
- create snapshots of all the data in your ZFS pools, just in case; make backups, etc.
- read this: http://www.kernelpanik.net/running-ceph-on-zfs/
- install Ceph on your nodes using only the Proxmox CLI tool (https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster)
- create new ZFS filesystems dedicated to osdfs and monfs as described in the kernelpanik article, and set the attributes on all your Ceph nodes (see the sketch below)
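A minimal sketch of what those datasets could look like; the dataset names, mountpoints and the xattr=sa setting here are my assumptions, so follow the linked article for the exact attributes it uses:

Code:
# example only -- names, mountpoints and attributes are assumptions,
# see the kernelpanik article for the settings it actually uses
zfs create -o mountpoint=/var/lib/ceph/osd storage/osdfs
zfs create -o mountpoint=/var/lib/ceph/mon storage/monfs
zfs set xattr=sa storage/osdfs storage/monfs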
- create the Ceph mon and mgr on all your Ceph nodes according to the Proxmox wiki linked above
- create a new zvol dedicated to the OSD disk on all your Ceph nodes (example on a 6-disk raidz2 pool: zfs create -s -V 500G -b 16k storage/ceph-disks/cephdisk1)
- modify /etc/lvm/lvm.conf on all your Ceph nodes to remove the filter that blocks the use of zvols (remove "r|/dev/zd.*|", from global_filter, like this):
Code:
# global_filter = [ "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "r|/dev/mapper/.*-(vm|base)--[0-9]+--disk--[0-9]+|" ]
global_filter = [ "r|/dev/mapper/pve-.*|", "r|/dev/mapper/.*-(vm|base)--[0-9]+--disk--[0-9]+|" ]
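Optionally (my addition, not part of the original steps), you can double-check which global_filter LVM actually ends up using:

Bash:
# prints the effective devices/global_filter as LVM parses it
lvmconfig devices/global_filter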
- now figure out which device node backs your zvol; in my case, on one node, it is zd32 (you MUST check this on all your nodes, because it will be different!!!):
Bash:
# ls -al /dev/zvol/storage/ceph-disks/cephdisk1
lrwxrwxrwx 1 root root 13 Dec 30 11:55 /dev/zvol/storage/ceph-disks/cephdisk1 -> ../../../zd32
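Alternatively (my addition), you can resolve the symlink directly instead of reading it off the ls output:

Bash:
# prints the real device node, e.g. /dev/zd32
readlink -f /dev/zvol/storage/ceph-disks/cephdisk1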
- get the bootstrap keyring on all your Ceph nodes; it is needed for creating the OSD with ceph-volume on the CLI:
Code:
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
- now create a new OSD like this on all your Ceph nodes, BUT CHECK which zvol device name your node uses (zd32 is only an example!!!):
Code:
# ceph-volume lvm create --data /dev/zd32
- check your web GUI (or the CLI, see below)
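If you prefer the CLI for that check, these two commands (my addition, standard Ceph/ceph-volume tooling) should list the new OSD:

Code:
# the new OSD should show up in the CRUSH tree and in ceph-volume's inventory
ceph osd tree
ceph-volume lvm list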
- add a Ceph metadata server to at least one of your nodes: # pveceph mds create
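And a final sanity check from the CLI (again my addition): the cluster status and the MDS map should look healthy before you start using it:

Code:
ceph -s
ceph mds stat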

My result is:
(screenshot attachment: Kijelölés_580.png)

In case of problems: google, read the docs, check the logs, pray, restore from backup :)
 
