Ceph for boot, workspace, archive?

ndoggac

Renowned Member
I have a 4-node PVE cluster with bonded 10Gb NICs. Ceph storage is currently set up with four 1TB SSD OSDs on each node.

I plan on adding at least four platter drives (10TB HDDs) per node for larger storage. My plan was to edit the CRUSH rules so that the boot drive pools (i.e. ceph-lxc and ceph-vm) would only utilize the SSDs (by class), and then create a pool called ceph-workspace that only utilizes the HDDs (again by class).

For all containers created, I want to have a bind-mounted /workspace directory with quotas around 250GB and maximums around 1TB. All data written to the /workspace mount would go to the platter drives. A lot of the containers will be dealing with "large" data sets, and I don't want that data included in the container backups, ballooning them to an unmanageable size. However, I want this bind-mounted workspace directory to be available for highly available containers.

Is creating a CephFS on the ceph-workspace pool (with the proper underlying CRUSH rule, of course) the appropriate method for this, or is there a better approach? I read that a Ceph pool will only perform as fast as its slowest disk. Will the CRUSH rule alleviate this limitation, so that the boot drives are only limited by the slowest SSD and the /workspace mounts by the slowest HDD? How do I properly implement the quota for this bind mount? I don't want to add this as a disk under resources, because then it would be included in the backup of the container... correct?

I was also contemplating creating an /archive bind mount in each container that would point to an NFS mount (from an external NFS server) created in the Proxmox cluster. The bind mount would expose only the appropriate ZFS fileset (with quotas approaching 20-30TB) from the external NFS server. I'm wondering if there's a better method than NFS for this. I was also thinking the external NFS server would also provide Samba shares for Windows VMs and NFS mounts for fully virtualized *nix machines.

Any best practice inputs would be appreciated.
 
I read that a Ceph pool will only perform as fast as its slowest disk. Will the CRUSH rule alleviate this limitation, so that the boot drives are only limited by the slowest SSD and the /workspace mounts by the slowest HDD?
This calls for device-class-based rules; see our docs for it.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_device_classes
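For reference, a minimal sketch of such rules, assuming the pool names from the first post (the rule names themselves are arbitrary):

Code:
# one replicated rule per device class
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd crush rule create-replicated replicated-hdd default host hdd
# pin the SSD pools and the new HDD pool to their respective rules
ceph osd pool set ceph-lxc crush_rule replicated-ssd
ceph osd pool set ceph-vm crush_rule replicated-ssd
ceph osd pool set ceph-workspace crush_rule replicated-hdd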

How do I properly implement the quota for this bind mount?
CephFS quotas are per directory, not per user/group. And depending on performance needs, CephFS might not scale as wished. Also, we do not support VM/CT disks on CephFS.
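For illustration, a per-directory quota of roughly 250GB could be set like this, assuming the CephFS is mounted at /mnt/pve/cephfs and a per-container directory layout (both placeholders); note that quotas are only enforced by quota-aware clients (recent kernel or FUSE clients):

Code:
# limit the ct101 workspace directory to 250 GiB (value is in bytes)
setfattr -n ceph.quota.max_bytes -v 268435456000 /mnt/pve/cephfs/workspace/ct101
# read the quota back to verify
getfattr -n ceph.quota.max_bytes /mnt/pve/cephfs/workspace/ct101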

I was also contemplating creating an /archive bind mount in each container that would point to an NFS mount (from an external NFS server) created in the Proxmox cluster. The bind mount would expose only the appropriate ZFS fileset (with quotas approaching 20-30TB) from the external NFS server. I'm wondering if there's a better method than NFS for this. I was also thinking the external NFS server would also provide Samba shares for Windows VMs and NFS mounts for fully virtualized *nix machines.
That is a possible way to go. It would add the benefit of having a share for OSes that do not support CephFS (e.g. Windows). Samba/NFS-Ganesha can also talk directly to CephFS (as a client), but this is out of scope for Proxmox VE's use of CephFS.
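If you go the NFS route, a minimal sketch of the /archive idea could look like this (server name, export path, and CT ID are placeholders):

Code:
# on each PVE node: mount the exported ZFS fileset
echo 'nfs.example.com:/tank/archive  /mnt/archive  nfs  defaults,_netdev  0 0' >> /etc/fstab
mount /mnt/archive
# bind mount the container-specific subdirectory into CT 101
pct set 101 -mp2 /mnt/archive/ct101,mp=/archive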
 
Thanks for your answers...

This calls for device-class-based rules; see our docs for it.
I realize this, but my question was about performance implications. To word it differently: once the CRUSH/class rules are implemented, is a pool using the SSD class rule still limited by HDD OSDs being in the Ceph cluster? I would assume not, since they are isolated by the class rule.

CephFS quotas are per directory, not per user/group. And depending on performance needs, CephFS might not scale as wished. Also, we do not support VM/CT disks on CephFS.
I would create the CephFS mounts on the hypervisor in each physical node in the cluster, and then bind mount a specific directory from that CephFS mount into the container. This still wouldn't scale?
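For concreteness, something like this is what I mean, assuming the CephFS is mounted at /mnt/pve/cephfs and CT 101 as an example:

Code:
# per-container directory on the CephFS backed by the workspace pool
mkdir -p /mnt/pve/cephfs/workspace/ct101
# bind mount it into the container as /workspace
pct set 101 -mp0 /mnt/pve/cephfs/workspace/ct101,mp=/workspace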


How else could I approach the problem of writing large data sets from a container on a Proxmox cluster without ballooning the container backup size?
 
I would assume not, since they are isolated by the class rule.
Correct.
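One way to double-check the isolation once the class rules are in place (rule and pool names as in the sketch above):

Code:
# shadow trees per device class and the rule definition
ceph osd crush tree --show-shadow
ceph osd crush rule dump replicated-ssd
# the up/acting sets for the SSD pool should only list SSD OSDs
ceph pg ls-by-pool ceph-vm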

I would create the CephFS mounts on the hypervisor in each physical node in the cluster, and then bind mount a specific directory from that CephFS mount into the container. This still wouldn't scale?
It is about small reads/writes from/to CephFS, and with bind mounts those will be the case. You can test it for yourself and compare RBD to CephFS performance. I'm not saying it isn't possible, but the performance is just not on par with RBD.
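As a rough comparison, something along these lines could be used; the pool name, path, and fio parameters are only illustrative:

Code:
# raw throughput of the pool backing the RBD images
rados bench -p ceph-vm 60 write -b 4M -t 16
# small random writes on CephFS, closer to what a bind-mounted workload produces
mkdir -p /mnt/pve/cephfs/benchtest
fio --name=cephfs-smallio --directory=/mnt/pve/cephfs/benchtest \
    --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --size=1G \
    --numjobs=4 --iodepth=16 --group_reporting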

How else could I approach the problem of writing large data sets from a container on a Proxmox cluster without ballooning the container backup size?
With vzdump, additional mount points aren't included in backups by default.
https://pve.proxmox.com/pve-docs/chapter-pct.html#_backup_of_containers_mount_points
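For illustration, with CT 101 from above: the bind mount carries no backup flag and vzdump skips it, while a storage-backed mount point is only included if backup=1 is set explicitly:

Code:
# inspect the mount point entries of the container
pct config 101 | grep ^mp
# a volume mount point on the HDD pool, explicitly included in backups
pct set 101 -mp1 ceph-workspace:100,mp=/scratch,backup=1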
 
You can test it for yourself and compare RBD to CephFS performance. I'm not saying it isn't possible, but the performance is just not on par with RBD.

Thanks! Still confused though... can I mount an RBD pool onto the hypervisor in order to bind mount it into the container? What is the file path?
 
