Mount external CephFS - no fsname in /etc/pve/storage.cfg

tidnab

New Member
May 26, 2023
Hi

I'm trying to add external CephFS storage directly in /etc/pve/storage.cfg because I need to specify "subdir" (it is not possible to set subdir in the GUI).
But I also have to specify 'FS name' because it's not the default CephFS file system. Can I add 'FS name' to /etc/pve/storage.cfg, and how?
 
Code:
cephfs: cephfs
    path /mnt/pve/cephfs
    content backup,vztmpl,iso
    fs-name cephfs

fs-name is the parameter that you most likely need :)
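For an external cluster you will also need the monitor addresses and a client keyring. A rough sketch of a complete entry (storage ID, monitor IPs, username, subdir and fs name are just placeholders, adapt them to your setup) could look like this:

Code:
cephfs: external-cephfs
    path /mnt/pve/external-cephfs
    content backup,vztmpl,iso
    monhost 192.0.2.1 192.0.2.2 192.0.2.3
    username admin
    subdir /proxmox
    fs-name myfs

The client secret then goes into /etc/pve/priv/ceph/<storage-id>.secret, as far as I remember from the docs.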
 
fs-name is the parameter that you most likely need :)
So if I were to add 'images' to the content clause, could I then also store VMs on CephFS?

Because I really don't understand why VMs aren't supported on CephFS, or what the disadvantages of that would be...

P.S. I've been using oVirt HCI for the last few years, tried Xcp-ng, and am still mostly confused
 
Because I really don't understand why VMs aren't supported on CephFS, or what the disadvantages of that would be...
That won't work, as we don't allow it.

Create a new Ceph pool and if the "Add Storage" checkbox is enabled, a matching storage will be added to the Proxmox VE config. It will be of the type RBD (rados block device), the block device functionality on top of Ceph.
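If that checkbox is enabled, the generated entry in /etc/pve/storage.cfg will look roughly like this (storage and pool names are just examples):

Code:
rbd: ceph-vm
    pool ceph-vm
    content images,rootdir
    krbd 0

The content line "images,rootdir" means the storage can hold VM disk images as well as container volumes.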

RBD is designed with VMs in mind. CephFS has a few things that make it unsuitable for VM storage. For one, storing QCOW2 files on it adds another layer that is not needed.
The major one, though, is that if an MDS (metadata server, providing the FS functionality) fails and a standby MDS needs to take over, it can take a while until the CephFS is available again. On a large file system it might even take a few minutes. Not something that can be used for VM images ;)
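If you want to see which MDS is currently active and whether standbys are available, or reduce failover time a bit with standby-replay, something along these lines should work (the file system name is just an example):

Code:
ceph fs status
ceph fs set myfs allow_standby_replay true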
 
That won't work, as we don't allow it.
On its own, that's not a very convincing argument
Create a new Ceph pool and if the "Add Storage" checkbox is enabled, a matching storage will be added to the Proxmox VE config. It will be of the type RBD (rados block device), the block device functionality on top of Ceph.
That's what I had done, too. It was mostly while testing the import of VMs from other hypervisors that I felt a file system might save some copying around, especially with 'huge' disks that are actually quite sparse (I am so used to adding TB-sized disks to VMs and then not using them, relying on sparseness and trimming to keep them small).
RBD is designed with VMs in mind. CephFS has a few things that make it unsuitable for VM storage. For one, storing QCOW2 files on it adds another layer that is not needed.
That could have made it into your excellent documentation (perhaps with more data), because it's not that obvious to me as an RHV/oVirt GlusterFS user: GlusterFS has not gained fame as a speed demon, but unless you're talking Infiniband, another layer without a kernel/userland transition doesn't sound that expensive.
oVirt/RHV actually puts another block/chunk layer on top of the file system, but that's mostly to ensure some distribution of the otherwise monolithic disk files. And then it's also because oVirt/RHV was originally designed for SAN storage.
The major one, though, is that if an MDS (metadata server, providing the FS functionality) fails and a standby MDS needs to take over, it can take a while until the CephFS is available again. On a large file system it might even take a few minutes. Not something that can be used for VM images ;)
On one hand that's another welcome insight, on the other 'minutes' certainly sounds disastrous in a storage context.

So does putting the node which runs the active MDS into maintenance transfer that role to a standby MDS without such an expensive arbitration? Does starting a standby server demote the currently active one to standby? (I guess I should start reading the Ceph documentation...)

Again, I may be rather spoiled by how tolerant Gluster is to single node storage disruptions, but then my motivation to come to Proxmox and Ceph is the lack of any future for Gluster and oVirt now that all downstream commercial products are gone.
 
I guess I should start reading the Ceph documentation...
:) That is definitely not a bad idea.

oVirt/RHV actually puts another block/chunk layer on top of the file system, but that's mostly to ensure some distribution of the otherwise monolithic disk files. And then it's also because oVirt/RHV was originally designed for SAN storage.
Ceph by itself is an object store. All data is stored as objects. How an object is stored physically depends on the OSD type. There used to be the Filestore OSD type, which just stored each object as a file on XFS; Filestore is deprecated by now. Currently, the default is Bluestore, which stores the objects directly on a block device, usually an LV.

On top of the direct object store (rados), there are different layers offering different functionality. The Rados Gateway is an S3-compatible API (not useful for Proxmox VE). CephFS offers a file system, and the RBD (rados block device) layer offers block device semantics, which hypervisors utilize to store the disk images. All the parts of a disk image, as well as all the metadata, are objects within Ceph. To interact with the disk images directly, you would use the rbd command.
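For example (pool and image names are only illustrative, following the usual Proxmox naming scheme):

Code:
# list all disk images in a pool
rbd ls ceph-vm
# show size, striping and features of one image
rbd info ceph-vm/vm-100-disk-0
# show provisioned vs. actually used space
rbd du ceph-vm/vm-100-disk-0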

I hope this gives you a bit of a high-level overview. If you are really interested in how the RBD layer stores data in objects, you might find this blog post useful (shameless self-promotion ;) )

Again, I may be rather spoiled by how tolerant Gluster is to single node storage disruptions, but then my motivation to come to Proxmox and Ceph is the lack of any future for Gluster and oVirt now that all downstream commercial products are gone.
Ceph will handle the loss of a disk or a full node gracefully and, given enough resources, will make sure that the data will be back to full redundancy even without the lost node/OSD being back online.

The more resources you give Ceph, the easier it can heal itself -> smaller but more OSDs, more nodes, …

Proxmox VE hyperconverged Ceph clusters are usually on the (very) small scale, compared to other Ceph clusters that large organizations are running.
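If you want to watch what Ceph is doing while it heals after a lost OSD or node, the usual starting points are just the standard status commands, nothing Proxmox specific:

Code:
# overall cluster health and recovery progress
ceph -s
# which OSDs are up/in and how they map to hosts
ceph osd tree
# details if the cluster is in HEALTH_WARN/HEALTH_ERR
ceph health detail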
 
Thanks for the link for now. I can only hope one never has to touch the RBD layer directly, because when it came to fixing real errors, Gluster quickly reached its limits too: fortunately, nothing ever broke that couldn't be healed more easily by replacing an entire brick.
 
