Ceph for boot, workspace, archive?

ndoggac

Renowned Member
Jul 11, 2012
I have a 4-node PVE cluster with bonded 10Gb NICs. Ceph storage is currently set up with four 1TB SSD OSDs on each node.

I plan on adding at least four platter drives (10TB HDDs) per node for larger storage. My plan was to edit the CRUSH rules so that the boot-drive pools (i.e. ceph-lxc and ceph-vm) would only use the SSDs (by device class), and then create a pool called ceph-workspace that only uses the HDDs (again by class).
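Roughly what I have in mind, based on the device-class docs (rule names and PG counts are just placeholders):

# one replicated rule per device class
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd crush rule create-replicated replicated-hdd default host hdd

# point the existing boot pools at the SSD-only rule
ceph osd pool set ceph-vm crush_rule replicated-ssd
ceph osd pool set ceph-lxc crush_rule replicated-ssd

# new HDD-backed pool for the bulk data
ceph osd pool create ceph-workspace 128 128 replicated replicated-hdd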

For every container created I want a bind-mounted /workspace directory with quotas around 250GB and maximums around 1TB. All data written to the /workspace mount would go to the platter drives. A lot of the containers will be dealing with "large" data sets, and I don't want that data included in the container backups, ballooning them to an unmanageable size. However, I do want this bind-mounted workspace directory to be available to highly available containers.
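Concretely, I was picturing something like this per container (paths and container ID are just examples):

# one directory per container on the shared HDD-backed storage
mkdir -p /mnt/pve/cephfs-workspace/ct101

# bind mount it into container 101 as /workspace
pct set 101 -mp0 /mnt/pve/cephfs-workspace/ct101,mp=/workspace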

Is creating a CephFS on the ceph-workspace pool (with the proper underlying CRUSH rule, of course) the appropriate method for this, or is there a better approach? I read that a Ceph pool will only perform as fast as its slowest disk... will the CRUSH rule alleviate this limitation, so that the boot drives are only limited by the slowest SSD, and the /workspace mounts by the slowest HDD? How do I properly implement the quota for this bind mount? I don't want to add it as a disk under Resources, because then it would be included in the container's backup... correct?

I was also contemplating creating an /archive bind mount in each container that would point to an NFS mount (from an external NFS server) created in the Proxmox cluster. The bind mount would expose only the appropriate ZFS fileset (with quotas approaching 20-30TB) from the external NFS server. Wondering if there's a better method than NFS for this? I was also thinking the external NFS server would provide Samba shares for Windows VMs, and NFS mounts for fully virtualized *nix machines.
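For the archive piece I was picturing roughly this on each node (server, export and container ID are made up):

# mount the container's ZFS fileset from the external NFS server
mkdir -p /mnt/archive-ct101
mount -t nfs nfs-server.example.com:/tank/archive/ct101 /mnt/archive-ct101

# bind mount it into the container as /archive
pct set 101 -mp1 /mnt/archive-ct101,mp=/archive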

Any best practice inputs would be appreciated.
 
I read that a Ceph pool will only perform as fast as its slowest disk... will the CRUSH rule alleviate this limitation, so that the boot drives are only limited by the slowest SSD, and the /workspace mounts by the slowest HDD?
This calls for device-class-based rules; see our docs for it.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_device_classes

How do I properly implement the quota for this bind mount?
CephFS quotas are per directory, not per user/group. And depending on your performance needs, CephFS might not scale as you would wish. Also, we do not support VM/CT on CephFS.
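If you do use CephFS for this, the per-directory quota is set as an extended attribute on the directory, roughly like this (path is just an example; keep in mind that CephFS quotas are enforced by the client):

# limit the directory to 250 GiB (value is in bytes)
setfattr -n ceph.quota.max_bytes -v 268435456000 /mnt/pve/cephfs/workspace/ct101

# verify
getfattr -n ceph.quota.max_bytes /mnt/pve/cephfs/workspace/ct101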

I was also contemplating creating an /archive bind mount in each container that would point to an NFS mount (from an external NFS server) created in the Proxmox cluster. The bind mount would expose only the appropriate ZFS fileset (with quotas approaching 20-30TB) from the external NFS server. Wondering if there's a better method than NFS for this? I was also thinking the external NFS server would provide Samba shares for Windows VMs, and NFS mounts for fully virtualized *nix machines.
That is a possible way to go. It would add the benefit of having a share for OSes that do not support CephFS (e.g. Windows). Samba/NFS-Ganesha can also talk directly to CephFS (as a client), but this is out of scope for Proxmox VE's use of CephFS.
 
Thanks for your answers...

This calls for device-class-based rules; see our docs for it.
I realize this, but my question was about the performance implications. To word it differently, once the CRUSH/class rules are implemented, is a pool using the SSD class rule still limited by the HDD OSDs being in the Ceph cluster? I would assume not, since they are isolated by the class rule.

CephFS quotas are per directory, not per user/group. And depending on your performance needs, CephFS might not scale as you would wish. Also, we do not support VM/CT on CephFS.
I would create the CephFS mounts on the hypervisor in each physical node in the cluster, and then bind mount a specific directory from that CephFS mount into the container. This still wouldn't scale?
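i.e. either letting PVE mount it as a storage or mounting it manually on each node, along these lines (storage ID, monitor addresses and secret file path are just examples):

# added as a PVE storage, the CephFS gets mounted at /mnt/pve/<storage-id> on every node
pvesm add cephfs cephfs-workspace

# or mount it manually with the kernel client
mount -t ceph 10.0.0.1:6789,10.0.0.2:6789:/ /mnt/cephfs-workspace \
      -o name=admin,secretfile=/etc/ceph/admin.secret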


How else could I approach the problem of writing large data sets from a container on a Proxmox cluster without ballooning the container backup size?
 
I would assume not, since they are isolated by the class rule.
Correct.
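If you want to double-check, the shadow tree and the rule dump show exactly which OSDs a rule can select (rule name is just an example):

ceph osd crush tree --show-shadow      # per-class shadow hierarchies (default~ssd, default~hdd)
ceph osd crush rule dump replicated-hdd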

I would create the CephFS mounts on the hypervisor in each physical node in the cluster, and then bind mount a specific directory from that CephFS mount into the container. This still wouldn't scale?
It is about small reads/writes from/to CephFS, which is exactly what you get with bind mounts. You can test it for yourself and compare RBD to CephFS performance. I'm not saying it isn't possible, but the performance is just not on par with RBD.
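For a rough comparison you could run something like this (pool name and CephFS mount path are examples):

# object-level write/read performance of the SSD-backed pool
rados bench -p ceph-vm 60 write --no-cleanup
rados bench -p ceph-vm 60 rand
rados -p ceph-vm cleanup

# small random writes against the mounted CephFS
fio --name=cephfs-smallio --directory=/mnt/pve/cephfs --ioengine=libaio \
    --rw=randwrite --bs=4k --size=1G --numjobs=4 --iodepth=16 --group_reporting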

How else could I approach the problem of writing large data sets from a container on a Proxmox cluster without ballooning the container backup size?
With vzdump, additional mount points aren't included by default:
https://pve.proxmox.com/pve-docs/chapter-pct.html#_backup_of_containers_mount_points
 
You can test it for yourself and compare RBD to CephFS performance. I'm not saying it isn't possible, but the performance is just not on par with RBD.

Thanks! Still confused though.....can I mount an RBD pool onto the hypervisor in order to bind mount it into the container? What is the file path?
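What I'm imagining, if it's even possible, is something like the following (names made up), though I guess a mapped RBD image can only be mounted on one node at a time, which might break the HA piece?

# create and map an image from the HDD-backed pool on the node
rbd create ceph-workspace/ct101-workspace --size 1T
rbd map ceph-workspace/ct101-workspace        # e.g. appears as /dev/rbd0
mkfs.ext4 /dev/rbd0
mkdir -p /mnt/rbd-workspace-ct101
mount /dev/rbd0 /mnt/rbd-workspace-ct101

# then bind mount that path into the container
pct set 101 -mp0 /mnt/rbd-workspace-ct101,mp=/workspace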
 
