Shared isn't Shared

drjaymz@

I have three nodes joined in a cluster and have imported some legacy KVM virtual machines. These require external kernel boot images, which I have configured in the VM conf file as an argument.

At the moment I have put those files on an NFS share that all nodes can see, so the files have the same path on every node. Thus if HA kicks in, the next node can find the files. But I don't want this dependency on the NFS server, which isn't itself HA.
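For illustration, the relevant line in the VM conf file is something like this (the paths are just placeholders for the NFS share):

Code:
# excerpt from /etc/pve/qemu-server/<vmid>.conf -- extra QEMU arguments
# pointing at the externally stored kernel and initrd
args: -kernel /mnt/pve/nfs-share/boot/vmlinuz -initrd /mnt/pve/nfs-share/boot/initrd.img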

What I want is a shared folder into which I can drop files and have them replicated. I set the [shared] option on the local filesystem storage, which sounds like exactly what's needed, but if you touch a file it doesn't replicate.

I know that I can just create a folder on all nodes and set up a cron job to sync them, but I was certain there would be a better way, such that if I join a new node it all just works without creating a spaghetti mess everywhere.

I searched for the [shared] option on local storage and the results were inconclusive: everything about "shared" seems to point to mounting an external server, which I don't want, and if I add the search term "replication", Google either wants to sell me something or is just talking about VM replication.
There is a machine config area that gets replicated, but that's limited to around 30MB of RAM-backed storage and I need a bit more than that.

Any help greatly appreciated.
 

So I know it already has a cluster filesystem on it for the machine configs etc. I looked at adding storage replication, but it seemed like a lot of unnecessary faff, so I may as well create a directory and a cron job to rsync it. I just thought there would be an out-of-the-box way. I really thought the shared option on storage was the answer; since it's not, what exactly DOES the shared checkbox on local storage actually do?
 
I looked through that link, but there must be something wrong with me: I do NOT understand how it does what I asked for. It only seems to know how to replicate its own content, and it has the restriction of container, ISO etc.

What I want is a mount point that replicates between sites and has a path that I can reference in the args for a machine config.

Because what I want seems to be so difficult, I guess it's not a common requirement.
 
What I want is a mount point that replicates between sites and has a path that I can reference in the args for a machine config.
Out of the box with a standard PVE installation? There is NO magic shared storage.

As you already found out, there is shared configuration data, stored in a database and mounted via something called FUSE at /etc/pve. This area is small. It cannot hold complete VMs or containers. Do not touch it without knowing exactly what you are doing.
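You can see it for yourself:

Code:
# /etc/pve is the pmxcfs FUSE mount; df shows how small it is
df -h /etc/pve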

Because what I want seems to be so difficult, I guess it's not a common requirement.
Well.., it is a common request. A cluster of several nodes with only local storage has several drawbacks. You know the usual solutions already; often NFS, Ceph or Gluster is used, see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_storage_types for the officially supported "shared" storage types.

And yes, implementing this kind of storage requires some effort, and running e.g. an HA-NFS system as a cluster of its own brings a lot of problems of its own.

For this reason, a "small" solution may be to use only local storage with ZFS and run frequent automatic replication jobs (every few minutes, or daily) between the nodes. That's what I do.
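In PVE that is just one replication job per guest, roughly like this (VM ID, target node and schedule are placeholders):

Code:
# replicate VM 100 to node pve2 every 15 minutes
pvesr create-local-job 100-0 pve2 --schedule '*/15'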

Pro: I do not have to build/maintain a "real" network-attached storage system with redundancy and a fast, redundant network.

Con: on node failure I lose the data written since the last replication. (Controlled migration between nodes does not have this problem, of course.)

Have fun!
 
Thanks. I only needed to store the kernel boot files and the initial ramdisk, which are small but beyond what the shared config area can hold. I come from a proper engineering background, so I went for the simplest solution: create a directory which rsyncs to the others via cron, rather than faffing about adding layers of filesystems and other replication machinery that would require extra setup on every node. That way, when I add a new node I only need to make that folder and add it to one config; I can then put anything I like in there and it will be on the same path wherever I go.

It's non-ideal for several reasons: if the "primary" node was offline then it wouldn't replicate - but I figured the boot files are unlikely to change often, and if they did I should know I dunnit. Secondly, I strongly resist customising the PVE installation too much, with configs and folders machine-gunned all over the place, because that makes it difficult to add nodes or replicate the setup. Because it's an evolution, things get created, changed, removed, changed again, and no matter how carefully documented, getting from a bare-metal setup to the same config is tedious; I don't have the patience for it, and I don't want to set up for failure whoever might need to maintain things later.
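The whole mechanism is a few lines on the node where I edit the files, something like this (path and node names are hypothetical):

Code:
# /etc/cron.d/sync-bootfiles -- push the boot files to the other nodes every 15 minutes
*/15 * * * * root rsync -a --delete /var/lib/vz/bootfiles/ root@pve2:/var/lib/vz/bootfiles/
*/15 * * * * root rsync -a --delete /var/lib/vz/bootfiles/ root@pve3:/var/lib/vz/bootfiles/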
 
PS: what the heck does the shared option do anyway? One would think it meant shared, rather than the other meaning of shared which only PVE knows about, meaning it's not shared.
 
it means shared, but as in "this storage has the same contents on all nodes, so you don't need to transfer it when migrating guests/..". it *marks* the storage as shared (by other, external means, like being stored on a SAN/NAS), it doesn't *make* it shared.
 
That would make sense if it said "Marked as shared". It also means that the average search result on the entire internet regarding Proxmox and "shared" is incorrect. If you look at the storage documentation, it's not actually clear either. Now that I look at it again, it could be clearer, especially since we're using a standard term for something that means something different. Shared means "available to others", not "a note that it could be available to others if you set that up some other way, but just because this flag is set doesn't mean it's actually shared" - although I appreciate that might not fit in the dialog box.
 
The description of "shared" in the docs is "Mark storage as shared." (not "Make storage shared"). In the storage plugin table only network storages are marked as "Shared: Yes", with a footnote explaining that if you put LVM on top of iSCSI you can also have a shared LVM (because of the iSCSI part). The directory storage type is marked as "Shared: No"; setting it to shared is only an escape hatch for things like customized NFS/CIFS/CephFS/.. mounts that are not covered by our built-in storage plugins, or things like OCFS2.

You set that attribute on the storage configuration; you are not setting up a storage at that point (where "Shared" would mean "this directory should be shared by PVE"), you are making an existing storage available to PVE (where "Shared" means "this storage is a 'shared storage', treat it like one").
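In /etc/pve/storage.cfg the attribute is just a flag on an existing definition, e.g. for a directory storage backed by your own cluster-wide mount (name and path are examples):

Code:
dir: mydir
        path /mnt/my-cluster-fs
        content images,iso
        shared 1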

It's the same with other storage attributes (see the CLI example below this list):
- add an NFS server with host and export (it doesn't export anything from that host, it tells PVE that there is such an NFS export on that host)
- add an RBD storage (you fill in pool and monitors, it doesn't create them)
- add a ZFS storage (you fill in pool/dataset, it doesn't create it)
- ...
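For example, on the command line (storage name, server and export are placeholders):

Code:
# registers an existing NFS export with PVE -- it does not create the export
pvesm add nfs my-nfs --server 192.0.2.10 --export /srv/nfs --content images,iso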

Note how there is no "Shared" option when actually setting up a storage via the GUI (Node -> Disks -> Directory -> Create: Directory) ;)

If you have a concrete proposal for a better word that encapsulates "this storage has the same content everywhere", please tell us.

https://www.hpe.com/emea_europe/en/what-is/shared-storage.html (the first hit on Google for me for "shared storage") also clearly associates the term with external, shared storage solutions (like a NAS/SAN or other storage server), which is also where the term comes from in PVE.

I guess it's a testament to the ease of using PVE that (some) people sometimes think ticking a simple checkbox could automatically (magically? ;)) set up some sort of distributed storage behind the scenes. I really wish that were an option ;)
 
It's non-ideal for several reasons: if the "primary" node was offline then it wouldn't replicate - but I figured the boot files are unlikely to change often, and if they did I should know I dunnit.

What I do when I have a need similar to yours is to have a VM acting as "replication primary", where content is updated and then rsync'ed out to the app servers, which would be the PVE hosts in your case. No need for NFS, no need for a "primary" node, no dependencies on external services/networks; the content is backed up as part of the VM, and there's a high chance of providing HA for that VM, either via proper shared storage, ZFS storage replication, or even restoring a backup of the VM if things go really bad.

Regarding the installation, it would just be apt install rsync + mkdir -p /some/dir + an /etc/rsyncd.conf with something like:

Code:
[KvmBootFiles]
  # directory that receives the boot files
  path = /some/dir
  comment = Legacy VM boot files
  # writable, so the primary can push updates
  read only = 0
  uid = root
  gid = root
  # only accept connections from the replication primary
  hosts allow = replication_primary_VM_IP
  use chroot = 1

oh, and systemctl enable --now rsyncd.service
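Pushing updates from that VM is then a single command (IP and paths as above, i.e. made up):

Code:
# push the current content to the rsync daemon module on each PVE host
rsync -av --delete /some/dir/ rsync://PVE_HOST_IP/KvmBootFiles/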
 
The description of "shared" in the docs is "Mark storage as shared." (not "Make storage shared"). ;)

There's room for improvement here... The help page should be more verbose about the meaning of that option; letting users know what "shared" means with something like this would work for me:

Mark storage as shared. This does not make content automatically shared among PVE hosts. Content must be replicated or shared either by some user-provided mechanism (unison, rsync, manual copy), by the storage infrastructure (iSCSI+LVM) or by a cluster filesystem (OCFS2, GFS2).
 
That would be wrong. A shared storage needs to have the same content "instantly"; async replication like rsync/.. is not enough.
 
Understood: "replicated" isn't the same as "shared", even if async replication may be enough for some use cases.

Having that box checked or not changes the behaviour of PVE during a migration, i.e. whether or not it copies the VM's disks. The docs could also mention that. Does the "shared" checkbox change the behaviour of other PVE operations?
 
yes, it also affects other inter-node operations like cross-node cloning (which is currently only allowed for guests fully on shared storage).
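You can see the migration difference on the CLI: with "shared 1" PVE skips the disk transfer entirely, while for local disks you have to opt in explicitly (VM ID and node are placeholders):

Code:
# live-migrate VM 100; --with-local-disks is only needed when its disks are NOT on shared storage
qm migrate 100 pve2 --online --with-local-disks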
 
I guess it's a testament to the ease of using PVE that (some) people sometimes think ticking a simple checkbox could automatically (magically? ;)) set up some sort of distributed storage behind the scenes. I really wish that were an option ;)

There is no reason at all you couldn't have shared storage with a single click at the data centre level; everything you need is already there. All you need to do is decide whether it's a common requirement or not. I thought perhaps it wasn't, but people are saying it is - though it's not for me to decide.

It's not the user's naivety that's the problem; it's that PVE already does distributed storage behind the scenes with a simple tick box, and it's all about magic and HA and automagically migrating stuff from one node to another - even without stopping it. Isn't it?
 
you severely underestimate the complexity of distributed storage. there's a big difference between having what basically amounts to a distributed key-value store with all sorts of constraints (e.g., when changes become visible, how much data you can store, how expensive writes/synchronization are) - which is what /etc/pve / pmxcfs is - and actually storing vast amounts of data for guest volumes, with good performance and usable consistency and redundancy guarantees.

if it were that easy, we'd have a magic "make shared" checkbox already :) we've made progress in that area over the last few years, e.g. with the fully integrated hyperconverged Ceph setup - but that requires empty disks, not just arbitrary file system paths..
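That route looks roughly like this (network and disk are placeholders; pveceph init runs once per cluster):

Code:
# hyperconverged Ceph: install, initialize the config once, then create monitors/OSDs per node
pveceph install
pveceph init --network 10.10.10.0/24
pveceph mon create
pveceph osd create /dev/sdb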