How and When to use CephFS

The PVE 5.3 release notes say: "The distributed file system CephFS eliminates the need for external file storage such as NFS or Samba and thus helps reducing hardware cost and simplifies management."

1. I have never used this, and as I am about to set up new PVE servers I would like to get some ideas or links to good tutorials on how to use CephFS and what to use it for.

2. I have trouble understanding how this would replace NFS, how it is distributed, and how it fits into an environment of 3 PVE clusters.

Thank you.
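
For context on questions 1 and 2: once CephFS is added as a storage in PVE, it behaves like an ordinary shared filesystem mounted on every node, which is what lets it stand in for an NFS/Samba share for ISOs, templates and backups. A minimal, hedged sketch of what that looks like in practice; the storage id 'cephfs' and the /mnt/pve/<storage-id> mount path are assumptions:

```python
# Hedged sketch: CephFS as a cluster-wide shared directory (like NFS would be).
# Assumes a PVE CephFS storage with id 'cephfs' mounted at /mnt/pve/cephfs.
from pathlib import Path

share = Path('/mnt/pve/cephfs')             # assumed mount point
iso_dir = share / 'template' / 'iso'        # PVE's usual ISO sub-directory

iso_dir.mkdir(parents=True, exist_ok=True)  # created once, visible on all nodes
(share / 'note.txt').write_text('written on node1, readable on node2 and node3\n')
print(sorted(p.name for p in share.iterdir()))
```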
 
Thank you, Chris.

But if this replaces the NAS, wouldn't Ceph running on all 3 nodes create major overhead on the servers and network bandwidth?
3. Any known benchmark showing how much additional memory, CPU and network bandwidth Ceph takes to keep all storage in sync?

4. Also, isn't Ceph storage better for just backups/ISO/templates and not for running live VMs?
If I want to incrementally replicate the live VMs or the live VM storage pool entirely, what would be the best way to do that, other than a NAS?

5. Is it better to replicate live VMs or the entire VM storage pool?

6. Is it better to run Ceph servers separately from the Proxmox nodes for bandwidth and hardware resource optimization?
 
3. Any known benchmark showing how much additional memory, CPU and network bandwidth Ceph takes to keep all storage in sync?
You can find our Ceph Benchmark paper in this thread, plus there are others sharing their results.
https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/

4. Also, isn't Ceph storage better for just backups/ISO/templates and not for running live VMs?
If I want to incrementally replicate the live VMs or the live VM storage pool entirely, what would be the best way to do that, other than a NAS?
For the first question, yes. But since the Ceph services build on top of RADOS, an object store, Ceph comes with a service called RBD (RADOS block device). This is used as the storage for VM/CT images (disks).
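
To make the RADOS/RBD relationship a bit more concrete, here is a minimal sketch using the Ceph Python bindings (python3-rados / python3-rbd). The pool name 'vm-pool', the image name and the 4 GiB size are made up for illustration; in practice PVE's RBD storage plugin creates and manages these images for you.

```python
# Hedged sketch: creating and listing an RBD image directly on RADOS.
# Pool/image names and size are assumptions, not PVE defaults.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('vm-pool')                  # an existing RBD pool
    try:
        rbd.RBD().create(ioctx, 'demo-disk', 4 * 1024**3)  # 4 GiB block device image
        print(rbd.RBD().list(ioctx))                       # e.g. ['demo-disk', ...]
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```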

5. Is it better to replicate live VMs or the entire VM storage pool?
With RBD, you will have three copies by default, one copy on each node.
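
If you want to see that replication factor, it is just a pool property. A small sketch reading it with the ceph CLI from any node; the pool name 'vm-pool' is again an assumption:

```python
# Hedged sketch: reading a pool's replication settings via the ceph CLI.
import json
import subprocess

def pool_setting(pool: str, var: str) -> int:
    out = subprocess.check_output(
        ['ceph', 'osd', 'pool', 'get', pool, var, '--format', 'json'])
    return json.loads(out)[var]

print(pool_setting('vm-pool', 'size'))      # number of replicas, e.g. 3
print(pool_setting('vm-pool', 'min_size'))  # writes stop below this many, e.g. 2
```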

6. Is it better to run Ceph servers separately from the Proxmox nodes for bandwidth and hardware resource optimization?
See our benchmark paper. Whether to separate depends on many factors: do you have enough resources for hyper-convergence, or do your IO needs exceed them? Do you want/need a separation of concerns? And so on.
 
Thank you for your reply.

Q7: Can Ceph automatically grow the pool as you add more server nodes?

Q8: Also, do I understand it correctly that if I plan to use Ceph, then I should not install Proxmox with/on ZFS RAID10?

Q9: If I have 10 drive bays, what storage config would be recommended?
- Proxmox on 2 x HDD ZFS RAID1, and leave the remaining 8 drives untouched for when I set up Ceph, at which point I would allocate them to Ceph?
- Or can Proxmox be installed completely on Ceph storage as system storage? (I would think not, because you have to install Proxmox before you install Ceph, right?)
 
Q7:
Theoretically, yes, but in reality it is much more complex.
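
Part of the "more complex" is placement groups: capacity grows as soon as new OSDs join, but pg_num has traditionally been sized by hand (see https://ceph.com/pgcalc/) and may need revisiting as the cluster grows. A rough sketch of that rule of thumb, assuming the common target of ~100 PGs per OSD and 3 replicas; newer Ceph releases can also autoscale this:

```python
# Rough sketch of the pgcalc rule of thumb: ~100 PGs per OSD, divided by the
# replica count, rounded up to the next power of two. Numbers are illustrative.
def suggested_pg_num(num_osds: int, replicas: int = 3, target_per_osd: int = 100) -> int:
    raw = num_osds * target_per_osd / replicas
    pg = 1
    while pg < raw:
        pg *= 2
    return pg

print(suggested_pg_num(9))    # 3 nodes x 3 OSDs  -> 512
print(suggested_pg_num(30))   # 3 nodes x 10 OSDs -> 1024
```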

Q8:
It depends on how you want to set up your storage. If your server has a lot of disks, like 24-48, you can group them in whatever formats you like.

Q9:
I would suggest 2 disks in RAID1 for the system drive. For the remaining 8 disks, make each one a single-disk RAID0; now you have 9 virtual disks: 1 x RAID1 and 8 x RAID0.

Install Proxmox on the RAID1 and leave the RAID0s alone.

When done, log in to the Proxmox web console and install Ceph, then add the RAID0 disks one by one as OSDs.
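
If you prefer the CLI over the web console for that last step, a hedged sketch (device names are examples only; the sub-command is `pveceph createosd` on PVE 5.x and `pveceph osd create` on newer releases, and creating an OSD wipes the disk):

```python
# Hedged sketch: turning the single-disk RAID0 volumes into Ceph OSDs from the CLI.
# /dev/sdb.. are placeholders - double-check device names before wiping anything.
import subprocess

osd_devices = ['/dev/sdb', '/dev/sdc', '/dev/sdd']   # ... up to your 8 RAID0 volumes

for dev in osd_devices:
    subprocess.run(['pveceph', 'createosd', dev], check=True)   # PVE 5.x syntax

subprocess.run(['ceph', 'osd', 'tree'], check=True)             # verify the new OSDs are up/in
```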

I hope this helps.
 
Q9:
I would suggest 2 disks in RAID1 for the system drive. For the remaining 8 disks, make each one a single-disk RAID0; now you have 9 virtual disks: 1 x RAID1 and 8 x RAID0.

This doesn’t make sense. RAID0 is striping and requires a minimum of 2 drives.

The point of my question was to understand whether the additional drives should be left untouched for Ceph to manage or whether they should be RAIDed somehow.
 
This doesn’t make sense. RAID0 is striping and requires a minimum of 2 drives.
No, RAID0 with a single disk is just that: 1 disk, no RAID, no striping.
Old RAID controllers don't have an IT mode, so you have to use single-disk RAID0 volumes to expose one disk at a time to Ceph or ZFS.
 
Just DO NOT use any form of RAID for Ceph; more often than not, RAID controllers lead to unexpected issues that you do not want to have in a production environment. Secondly, HBAs are rather cheap, in comparison to RAID controllers.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html
 
No, RAID0 with a single disk is just that: 1 disk, no RAID, no striping.

It's true one can make a standalone drive a RAID0 volume, but the drive will still contain the RAID metadata, so it is still treated as hardware RAID (but crippled) instead of true IT/passthrough mode. This means getting drive SMART stats is less straightforward than for a drive attached to an HBA or a SATA port on a commodity motherboard.
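
As an illustration of the SMART point, this is roughly the difference in how you have to query the drives; the controller slot numbers and device node are assumptions for the example:

```python
# Hedged sketch: SMART queries on an HBA vs. behind a MegaRAID-style controller.
import subprocess

# HBA / onboard SATA: the drive answers directly.
subprocess.run(['smartctl', '-a', '/dev/sda'])

# Single-disk RAID0 on a MegaRAID controller: address drives by controller slot.
for slot in range(8):                      # slot numbers are an assumption
    subprocess.run(['smartctl', '-a', '-d', f'megaraid,{slot}', '/dev/sda'])
```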

Secondly, HBAs are rather cheap, in comparison to RAID controllers.

True, but I guess the main problem is that many people have older servers that were originally designed for / shipped with a hardware RAID card, and getting an HBA to work on these servers is not as easy as one might think. It may not be as simple as getting a true HBA, chucking it into the server, and having everything work like a treat. It's really hit and miss.

For example, in my experience, I was trying to get an HBA (similar to this) to work on multiple Dell rack servers, e.g. R720, R720xd, R710, R510, that originally shipped with hardware RAID. Apart from the hassle of getting and running the SAS cable inside the chassis, I only got true passthrough working on the R510, i.e. the drives could be detected by the HBA. I think the main compatibility problem has something to do with the drive backplane's firmware (a SAS backplane in my case). Even the R720 actually has different drive backplane variants, so maybe some could work with a generic HBA. I'm not sure whether getting an official HBA from Dell would actually work either, as I think the firmware is identical to the official one from LSI.

I was a bit surprised to hear that VMware vSAN could actually support marking each drive as RAID0 to work around the problem; otherwise it would be a big cost for prospective users to change to newer hardware, which would then become a roadblock to adopting the platform. Not sure whether this is still the case though. At least I know Windows Storage Spaces requires true passthrough.

So if one intends to deploy Ceph in a production environment, maybe avoid cutting corners and just get the right hardware, which does cost money...
 
We have 3 R720s in Ceph; I used this SAS3 card https://www.supermicro.com/products/accessories/addon/AOC-S3008L-L8e.cfm
and 2 cables per server that adapt the SAS2 backplane to the SAS3 card.
It was really cheap and works perfectly; I removed the H710.
Proxmox boots on RAID1 ZFS, and there are 4 SAS Samsung Enterprise PM1643 drives in each.
The only minor inconvenience is that the disks are 12 Gbit while the Dell backplane is only 6 Gbit, but I can move these disks to a future R740 or R750.

Our R740 + H740 card could be switched to IT mode via the Dell GUI, so no problems there.
 
I have 3 R720xd servers, each with 12 x 300 GB disks: 2 disks in RAID1 for the OS and 10 disks in RAID0 for OSDs, so I have 30 OSDs.
I added a pool and added it as storage with pg_num 1024, following this page https://ceph.com/pgcalc/
But the storage only has 2.58 TB; I don't understand why that happened, since I have 8.17 TB.
Can I increase the capacity of this storage?
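
Not an authoritative answer, but the gap between 8.17 TB raw and ~2.58 TB usable looks a lot like the default 3-replica pool plus Ceph's full ratio at work; a quick worked sketch under those assumptions:

```python
# Hedged sketch: rough capacity arithmetic for 30 x 300 GB OSDs with 3 replicas.
# The exact number PVE shows also depends on Ceph's full/near-full ratios and
# per-OSD overhead, so treat these figures as approximate.
raw_bytes = 30 * 300 * 10**9                 # 30 OSDs x 300 GB
raw_tib = raw_bytes / 2**40
print(f'raw:    {raw_tib:.2f} TiB')          # ~8.19 TiB (close to the 8.17 reported)

replicas = 3
usable_tib = raw_tib / replicas
print(f'usable: {usable_tib:.2f} TiB')       # ~2.73 TiB before safety margins

full_ratio = 0.95                            # default mon full ratio
print(f'avail:  {usable_tib * full_ratio:.2f} TiB')  # ~2.59 TiB, near the 2.58 shown
```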
 
