Wrong approach to ZFS dataset for VM image storage?

I have an nvme pool that I want to use for
a) storing VMs
b) storing mails (by a mail server run in one of the VMs)

For this purpose, I created, via the command line, a ZFS pool with two datasets, "VMs" and "mail" (mail is, of course, shared via NFS with one of the VMs). I then created a "Directory" storage in the Proxmox GUI and pointed it at the dataset "VMs":

Code:
dir: VMs
        path /mnt/zfs_opt/VMs
        content images
        shared 0

Since then, I have experienced two strange things:

1. At some point, my VMs had disappeared after a reboot. I copied them back into /mnt/zfs_opt/VMs from backup, and the problem has not occurred since, so I stopped worrying about it.
2. Later, I was missing the .zfs directory in /mnt/zfs_opt.

After #2 occurred and some searching, I discovered that my dataset "VMs" still existed, but that (I think) Proxmox had at some point tried to access the VMs directory before ZFS had finished mounting, and had then, as it automatically does, populated /mnt/zfs_opt/VMs with the folder "images" (because of the Directory storage configured for this path). Since /mnt/zfs_opt/VMs was then no longer empty, ZFS stopped mounting the VMs dataset there. I guess that is how I lost the VMs at some point. Possibly a race condition or something.

Now, I'm wondering: did I do anything wrong in how I configured where Proxmox stores the VMs?

Thanks.
 
You could have added the VM dataset as storage of the type ZFS. In that case, the VM disks would have been ZVOLs. ZVOLs are ZFS datasets that provide a block device instead of a file system and will therefore not show up as directories but can be seen in the output of zfs list.
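For illustration, roughly what that could look like from the CLI (the storage ID "VMs-zfs" is just an example name, and I'm assuming the dataset is pool_opt/VMs as mentioned later in this thread; adjust to your actual pool/dataset):

Code:
# add the dataset as a ZFS-type storage (example storage ID "VMs-zfs")
pvesm add zfspool VMs-zfs --pool pool_opt/VMs --content images,rootdir --sparse 1

# VM disks created on it then show up as ZVOLs, i.e. as volumes
# like pool_opt/VMs/vm-<vmid>-disk-<n> in:
zfs list -t volume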

When adding a directory storage on a path that needs to be mounted by a mechanism other than PVE itself, you enter advanced territory. You need to specify the mkdir and is_mountpoint options, either via the CLI or by editing the /etc/pve/storage.cfg file.

Code:
    is_mountpoint true
    mkdir 0

The is_mountpoint option tells PVE to only use the storage once something is mounted on that path, and the mkdir option set to false/0 tells PVE not to create the directory if it does not exist, because another mechanism will take care of that.
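Via the CLI, that would be roughly the following, assuming the storage ID "VMs" from the config above (an untested sketch; "yes" here is equivalent to the "is_mountpoint true" line shown above):

Code:
# set both options on the existing directory storage "VMs"
pvesm set VMs --is_mountpoint yes --mkdir 0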
 
Thanks.

Interestingly, the "VMs" Directory is not shown in the GUI (should it?). So I've tweaked storage.cfg manually as follows (full paste this time):

Code:
dir: local
        path /var/lib/vz
        content backup,vztmpl,iso

zfspool: local-zfs
        pool rpool/data
        content rootdir,images
        sparse 1

zfspool: pool_opt
        pool pool_opt
        content rootdir,images
        mountpoint /pool_opt
        nodes zeus

zfspool: pool_storage
        pool pool_storage
        content images,rootdir
        mountpoint /pool_storage
        nodes zeus

zfspool: pool_sata
        pool pool_sata
        content images,rootdir
        mountpoint /pool_sata
        nodes zeus

dir: VMs
        path /mnt/zfs_opt/VMs
        content images
        shared 0
        is_mountpoint true
        mkdir 0

I will report back if I see further issues.
 
What's the output of zfs list?
 
Code:
# zfs list
NAME                         USED  AVAIL     REFER  MOUNTPOINT
pool_opt                     147G   528G      208K  /mnt/zfs_opt
pool_opt/VMs                 108G   528G      102G  /mnt/zfs_opt/VMs
pool_opt/mail               38.7G   528G     38.7G  /mnt/zfs_opt/mail
pool_sata                    104G  3.38T      192K  /mnt/zfs_sata
pool_sata/netstore           104G  3.38T      104G  /mnt/zfs_sata/netstore
pool_storage                12.1T  21.8T      288K  /mnt/zfs_storage
pool_storage/backup         1.12T  21.8T     1.12T  /mnt/zfs_storage/backup
pool_storage/filebase       9.65G  21.8T     9.65G  /mnt/zfs_storage/filebase
pool_storage/jpg             135G  21.8T      135G  /mnt/zfs_storage/jpg
pool_storage/jail           29.9G  21.8T     29.9G  /mnt/zfs_storage/jail
pool_storage/media          10.7T  21.8T     10.7T  /mnt/zfs_storage/media
pool_storage/server64_data  73.9G  21.8T     73.8G  /mnt/zfs_storage/server64_data
rpool                       92.3G   338G      104K  /rpool
rpool/ROOT                  92.3G   338G       96K  /rpool/ROOT
rpool/ROOT/pve-1            92.3G   338G     92.3G  /
rpool/data                    96K   338G       96K  /rpool/data
 
okay, so the dataset exists, let's check if it's mounted: zfs get all pool_opt/VMs

It's possible that a directory with the name VMs was created and now the dataset will not be mounted anymore.
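A quicker check of just the mount-related properties, plus a rough recovery sketch in case a plain directory is shadowing the mountpoint (only move the directory aside if the dataset is really not mounted, and check its contents first):

Code:
# check only the mount-related properties
zfs get mounted,mountpoint,canmount pool_opt/VMs

# if it is not mounted because a plain directory (with e.g. an empty
# "images" folder created by PVE) is in the way, move it aside and mount again
mv /mnt/zfs_opt/VMs /mnt/zfs_opt/VMs.stale
zfs mount pool_opt/VMs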
 
Yes, I think that's what caused the problems I described in my first post at the top. After I got it working again (by removing all data from zfs_opt and then recreating the pool and dataset, which was a bit of a challenge, as PVE kept creating the directory too early - but in the end it worked), I wondered what I should have done differently to prevent this problem from recurring. Your suggestion and explanation above sound like they pretty much hit the nail on the head!

The dataset is currently mounted:

Code:
# mount | grep VMs
pool_opt/VMs on /mnt/zfs_opt/VMs type zfs (rw,relatime,xattr,noacl)

Code:
# zfs get all pool_opt/VMs
NAME          PROPERTY              VALUE                  SOURCE
pool_opt/VMs  type                  filesystem             -
pool_opt/VMs  creation              Sun Jun 14  0:26 2020  -
pool_opt/VMs  used                  108G                   -
pool_opt/VMs  available             527G                   -
pool_opt/VMs  referenced            102G                   -
pool_opt/VMs  compressratio         1.00x                  -
pool_opt/VMs  mounted               yes                    -
pool_opt/VMs  quota                 none                   default
pool_opt/VMs  reservation           none                   default
pool_opt/VMs  recordsize            128K                   default
pool_opt/VMs  mountpoint            /mnt/zfs_opt/VMs       inherited from pool_opt
pool_opt/VMs  sharenfs              off                    default
pool_opt/VMs  checksum              on                     default
pool_opt/VMs  compression           off                    inherited from pool_opt
pool_opt/VMs  atime                 on                     default
pool_opt/VMs  devices               on                     default
pool_opt/VMs  exec                  on                     default
pool_opt/VMs  setuid                on                     default
pool_opt/VMs  readonly              off                    default
pool_opt/VMs  zoned                 off                    default
pool_opt/VMs  snapdir               hidden                 default
pool_opt/VMs  aclinherit            restricted             default
pool_opt/VMs  createtxg             153749                 -
pool_opt/VMs  canmount              on                     default
pool_opt/VMs  xattr                 sa                     inherited from pool_opt
pool_opt/VMs  copies                1                      default
pool_opt/VMs  version               5                      -
pool_opt/VMs  utf8only              off                    -
pool_opt/VMs  normalization         none                   -
pool_opt/VMs  casesensitivity       sensitive              -
pool_opt/VMs  vscan                 off                    default
pool_opt/VMs  nbmand                off                    default
pool_opt/VMs  sharesmb              off                    default
pool_opt/VMs  refquota              none                   default
pool_opt/VMs  refreservation        none                   default
pool_opt/VMs  guid                  3180716691075449405    -
pool_opt/VMs  primarycache          all                    default
pool_opt/VMs  secondarycache        all                    default
pool_opt/VMs  usedbysnapshots       6.16G                  -
pool_opt/VMs  usedbydataset         102G                   -
pool_opt/VMs  usedbychildren        0B                     -
pool_opt/VMs  usedbyrefreservation  0B                     -
pool_opt/VMs  logbias               latency                default
pool_opt/VMs  objsetid              1293                   -
pool_opt/VMs  dedup                 off                    default
pool_opt/VMs  mlslabel              none                   default
pool_opt/VMs  sync                  standard               default
pool_opt/VMs  dnodesize             legacy                 default
pool_opt/VMs  refcompressratio      1.00x                  -
pool_opt/VMs  written               6.65G                  -
pool_opt/VMs  logicalused           108G                   -
pool_opt/VMs  logicalreferenced     102G                   -
pool_opt/VMs  volmode               default                default
pool_opt/VMs  filesystem_limit      none                   default
pool_opt/VMs  snapshot_limit        none                   default
pool_opt/VMs  filesystem_count      none                   default
pool_opt/VMs  snapshot_count        none                   default
pool_opt/VMs  snapdev               hidden                 default
pool_opt/VMs  acltype               off                    default
pool_opt/VMs  context               none                   default
pool_opt/VMs  fscontext             none                   default
pool_opt/VMs  defcontext            none                   default
pool_opt/VMs  rootcontext           none                   default
pool_opt/VMs  relatime              on                     inherited from pool_opt
pool_opt/VMs  redundant_metadata    all                    default
pool_opt/VMs  overlay               off                    default
pool_opt/VMs  encryption            off                    default
pool_opt/VMs  keylocation           none                   default
pool_opt/VMs  keyformat             none                   default
pool_opt/VMs  pbkdf2iters           0                      default
pool_opt/VMs  special_small_blocks  0                      default

In the GUI, if I select the node zeus in the left column and then Disks -> Directory, nothing is shown. The directory does show up in the GUI, though, under Datacenter -> Storage. But this appears to be only a cosmetic thing - or maybe it's even meant to be this way. As long as the pool gets mounted and the VMs are there, I'm happy!
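Independent of where the GUI shows it, I can at least check the storage on the CLI (using the storage ID "VMs" from my storage.cfg):

Code:
# show whether the "VMs" storage is enabled and active
pvesm status --storage VMs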
 
Hi,

When you add any resource in Datacenter, be aware that there is a check box that lets you set up the resource (folder, dataset, whatever) for only some, or only one, of your nodes. The resource must already be available on all nodes if you want to see it on all of them.

Good luck / Bafta
 
Admittedly, it's been a while; PVE simply kept running in the meantime with no need to reboot. I have now tested the changes to storage.cfg suggested above. What I found is that adding

Code:
is_mountpoint true

leads to the respective ZFS dataset (mounted at /mnt/zfs_opt/VMs) no longer being available. Proxmox boots up, but the dataset does not get mounted - and since it contains all the VMs, nothing starts and nothing works. Syslog shows:

Code:
Jul  2 19:55:34 zeus pvestatd[6361]: unable to activate storage 'VMs' - directory is expected to be a mount point but is not mounted: '/mnt/zfs_opt/VMs'

If I remove the above-mentioned line, Proxmox boots up with the ZFS dataset mounted. What did I miss?

Full storage.cfg as suggested and not working:

Code:
dir: local
        path /var/lib/vz
        content backup,vztmpl,iso

zfspool: local-zfs
        pool rpool/data
        content rootdir,images
        sparse 1

zfspool: pool_opt
        pool pool_opt
        content rootdir,images
        mountpoint /pool_opt
        nodes zeus

zfspool: pool_storage
        pool pool_storage
        content images,rootdir
        mountpoint /pool_storage
        nodes zeus

zfspool: pool_sata
        pool pool_sata
        content images,rootdir
        mountpoint /pool_sata
        nodes zeus

dir: VMs
        path /mnt/zfs_opt/VMs
        content images
        shared 0
        is_mountpoint true
        mkdir 0

Could it be that "is_mountpoint" needs to be off when the path is in fact a mountpoint, but one handled by the built-in ZFS mounting? On the other hand, my original problem (see above) was some kind of race condition, because Proxmox put data into the mountpoint before the ZFS dataset was mounted there...
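In case it helps narrow this down after the next reboot, I suppose I could check whether ZFS actually mounted the dataset or whether PVE simply looked too early (zfs-import-cache.service and zfs-mount.service are, as far as I know, the standard units shipped with ZFS on PVE):

Code:
# is the dataset mounted right now?
zfs get mounted pool_opt/VMs
findmnt /mnt/zfs_opt/VMs

# what did the ZFS import/mount units do during this boot?
journalctl -b -u zfs-import-cache.service -u zfs-mount.service
systemctl status zfs-mount.service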

How do I get this right, please?
 
I just had this problem again: when I rebooted the server (which fortunately I don't do too often), the VMs dataset was not mounted, and PVE thus didn't start any VMs. The issue really was only that the dataset was not mounted, and it could be fixed by issuing
zfs mount pool_opt/VMs
I don't understand, though, why this is happening from time to time... There must be some kind of race condition somewhere...

I could actually fix this by running, at boot, a script that checks if pool_opt/VMs is mounted, and if not, remounts it. But that's more like a workaround than a fix, no...?
 
Hi @ThinkAgain

If you are still having this issue, you could try looking into making sure that the zfs service is started before the pve service in systemd.

I remember doing this when I installed PVE, as it was recommended somewhere, but I don't have the time to find that post right now. Try searching around a little...
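I don't remember the exact unit anymore, but roughly it would be a systemd override along these lines (pve-guests.service is the unit that autostarts guests at boot, zfs-mount.service comes with ZFS; treat this as an unverified sketch):

Code:
# create a drop-in override for the guest autostart unit
systemctl edit pve-guests.service

# and add the following to the override file:
[Unit]
Requires=zfs-mount.service
After=zfs-mount.service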
 
Thanks, it does seem that there is a bit of a race condition somewhere with the ZFS mounts. I'm not sure whether it is really about the pve service (which, true enough, does not require zfs to be started first) or something else... In any case, my quick fix for the time being is a script that runs after boot and makes sure all ZFS resources are mounted: if something is not mounted, the ZFS datasets are mounted explicitly, shared again, and then the NFS server is restarted. This cascade (all of these steps together) has worked well for me so far - roughly as sketched below.
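Simplified sketch of the workaround script (pool_opt/VMs is hard-coded here, and the NFS unit name may differ depending on the setup):

Code:
#!/bin/bash
# post-boot workaround: if the VM dataset is not mounted, mount everything,
# re-export ZFS-managed shares and restart the NFS server
if [ "$(zfs get -H -o value mounted pool_opt/VMs)" != "yes" ]; then
    zfs mount -a                    # mount any datasets that did not come up
    zfs share -a                    # re-export ZFS-managed NFS shares
    systemctl restart nfs-server    # unit name may differ on other setups
fi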

And as it's working now, I'm hesitant to apply additional fixes...
 
