Mount external CephFS - no fsname in /etc/pve/storage.cfg

tidnab

New Member
May 26, 2023
Hi

I'm trying to add external CephFS storage directly in /etc/pve/storage.cfg because I need to specify "subdir" (it is not possible to set subdir in the GUI).
But I also have to specify 'FS name' because it's not the default CephFS file system. Can I add 'FS name' to /etc/pve/storage.cfg, and how?
 
Code:
cephfs: cephfs
    path /mnt/pve/cephfs
    content backup,vztmpl,iso
    fs-name cephfs

fs-name is the parameter that you most likely need :)
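For an external cluster you will also need the monitor addresses and a client keyring. A rough sketch of a complete entry (storage ID, monitor IPs, username, subdir and fs name are just placeholders, adapt them to your setup) could look like this:

Code:
cephfs: external-cephfs
    path /mnt/pve/external-cephfs
    content backup,vztmpl,iso
    monhost 192.0.2.1 192.0.2.2 192.0.2.3
    username admin
    subdir /proxmox
    fs-name myfs

The client secret then goes into /etc/pve/priv/ceph/<storage-id>.secret, as far as I remember from the docs.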
 
fs-name is the parameter that you most likely need :)
So if I were to add 'images' to the content clause, could I then also store VMs on CephFS?

Because I really don't understand why VMs aren't supported on CephFS, or what the disadvantages of that would be...

P.S. I've been using oVirt HCI for the last few years, tried Xcp-ng, and am still mostly confused
 
Because I really don't understand why VMs aren't supported on CephFS, or what the disadvantages of that would be...
That won't work, as we don't allow it.

Create a new Ceph pool and if the "Add Storage" checkbox is enabled, a matching storage will be added to the Proxmox VE config. It will be of the type RBD (rados block device), the block device functionality on top of Ceph.
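If that checkbox is enabled, the generated entry in /etc/pve/storage.cfg will look roughly like this (storage and pool names are just examples):

Code:
rbd: ceph-vm
    pool ceph-vm
    content images,rootdir
    krbd 0

The content line "images,rootdir" means the storage can hold VM disk images as well as container volumes.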

RBD is designed with VMs in mind. CephFS has a few things that make it unsuitable for VM storage. For one, storing QCOW2 files on it adds another layer that is not needed.
The major one, though, is that if an MDS (metadata server, providing the FS functionality) fails and a standby MDS needs to take over, it can take a while until the CephFS is available again. On a large file system it might even take a few minutes. Not something that can be used for VM images ;)
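If you want to see which MDS is currently active and whether standbys are available, or reduce failover time a bit with standby-replay, something along these lines should work (the file system name is just an example):

Code:
ceph fs status
ceph fs set myfs allow_standby_replay true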
 
That won't work, as we don't allow it.
On its own, that's not a very convincing argument
Create a new Ceph pool and if the "Add Storage" checkbox is enabled, a matching storage will be added to the Proxmox VE config. It will be of the type RBD (rados block device), the block device functionality on top of Ceph.
That's what I had done, too. It was mostly while testing the import of VMs from other hypervisors that I felt a file system might save some copying around, especially with 'huge' disks that are actually quite sparse (I am so used to adding TB-sized disks to VMs and then not using them, relying on sparseness and trimming to keep them small).
RBD is designed with VMs in mind. CephFS has a few things that make it unsuitable for VM storage. For one, storing QCOW2 files on it adds another layer that is not needed.
That could have made it into your excellent documentation (perhaps with more data), because it's not that obvious to me as an RHV/oVirt GlusterFS user: GlusterFS has not gained fame as a speed demon, but unless you're talking Infiniband, another layer without a kernel/userland transition doesn't sound that expensive.
oVirt/RHV actually puts another block/chunk layer on top of the file system, but that's mostly to ensure some distribution of the otherwise monolithic disk files. And then it's also because oVirt/RHV was originally designed for SAN storage.
The major one, though, is that if an MDS (metadata server, providing the FS functionality) fails and a standby MDS needs to take over, it can take a while until the CephFS is available again. On a large file system it might even take a few minutes. Not something that can be used for VM images ;)
On one hand that's another welcome insight, on the other 'minutes' certainly sounds disastrous in a storage context.

So does putting the node which runs the active MDS into maintenance transfer that role to a standby MDS without such an expensive arbitration? Does starting a standby server demote the currently active one to standby? (I guess I should start reading the Ceph documentation...)

Again, I may be rather spoiled by how tolerant Gluster is to single node storage disruptions, but then my motivation to come to Proxmox and Ceph is the lack of any future for Gluster and oVirt now that all downstream commercial products are gone.
 
I guess I should start reading the Ceph documentation...
:) That is definitely not a bad idea.

oVirt/RHV actually puts another block/chunk layer on top of the file system, but that's mostly to ensure some distribution of the otherwise monolithic disk files. And then it's also because oVirt/RHV was originally designed for SAN storage.
Ceph by itself is an object store. All data is stored as objects. How an object is stored physically depends on the OSD type. There used to be the Filestore OSD type, which just stored each object as a file on XFS; Filestore is deprecated by now. Currently, the default is Bluestore, which stores the objects directly on a block device, usually an LV.

On top of the direct object store (rados), there are different layers offering different functionality. The Rados Gateway is an S3-compatible API (not useful for Proxmox VE). CephFS offers a file system, and the RBD (rados block device) layer offers block device semantics, which hypervisors utilize to store the disk images. All the parts of a disk image, as well as all the metadata, are objects within Ceph. To interact with the disk images directly, you would use the rbd command.
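For example (pool and image names are only illustrative, following the usual Proxmox naming scheme):

Code:
# list all disk images in a pool
rbd ls ceph-vm
# show size, striping and features of one image
rbd info ceph-vm/vm-100-disk-0
# show provisioned vs. actually used space
rbd du ceph-vm/vm-100-disk-0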

I hope this gives you a bit of a high-level overview. If you are really interested in how the RBD layer stores data in objects, you might find this blog post useful (shameless self-promotion ;) )

Again, I may be rather spoiled by how tolerant Gluster is to single node storage disruptions, but then my motivation to come to Proxmox and Ceph is the lack of any future for Gluster and oVirt now that all downstream commercial products are gone.
Ceph will handle the loss of a disk or a full node gracefully and, given enough resources, will make sure that the data will be back to full redundancy even without the lost node/OSD being back online.

The more resources you give Ceph, the easier it can heal itself -> smaller but more OSDs, more nodes, …

Proxmox VE hyperconverged Ceph clusters are usually on the (very) small scale, compared to other Ceph clusters that large organizations are running.
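If you want to watch what Ceph is doing while it heals after a lost OSD or node, the usual starting points are just the standard status commands, nothing Proxmox specific:

Code:
# overall cluster health and recovery progress
ceph -s
# which OSDs are up/in and how they map to hosts
ceph osd tree
# details if the cluster is in HEALTH_WARN/HEALTH_ERR
ceph health detail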
 
Thanks for the link for now. I can only hope one never has to touch the RBD layer directly, because when it came to fixing real errors, Gluster quickly reached its limits too: fortunately, nothing ever broke that couldn't be healed more easily by replacing an entire brick.
 
