Encrypted ZFS datasets empty after manual mount

Athlon

New Member
Dec 31, 2024
Hello!
I am struggling with quite a weird problem, IMHO.
I had been running Proxmox 7.4.1 (without a subscription) without any issues for a long time, until recently the SATA controller card locked up and I had to do a hard shutdown. I connected the four hard drives to the internal SATA ports and booted up.
The pool "ZFS_2" is encrypted and gets mounted with zfs mount -l ZFS_2 after logging in via SSH:
Code:
zfs get mounted
NAME                          PROPERTY  VALUE    SOURCE
ZFS_2                         mounted   yes      -
ZFS_2/Backups                 mounted   yes      -
ZFS_2/Multimedia              mounted   yes      -
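For completeness, the key status can be cross-checked together with the mount state in one command (only a sanity check; after the zfs mount -l above, keystatus should read available for all three entries):
Code:
# keystatus = available means the encryption key is loaded
zfs get -r keystatus,mounted,mountpoint ZFS_2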
However, df -h shows incorrect sizes for the datasets "Backups" and "Multimedia", and the mountpoints /mnt/ZFS_2/Backups and /mnt/ZFS_2/Multimedia appear to be empty:
Code:
Filesystem                    Size  Used Avail Use% Mounted on
...
ZFS_2                          46T   23T   24T  50% /mnt/ZFS_2
ZFS_2/Backups                  24T  256K   24T   1% /mnt/ZFS_2/Backups
ZFS_2/Multimedia               24T  256K   24T   1% /mnt/ZFS_2/Multimedia
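For comparison, the space usage as seen by ZFS itself (rather than df) can be listed like this; the used column should reflect the real data even when the mountpoint looks empty:
Code:
# Per-dataset space accounting straight from ZFS
zfs list -r -o name,used,avail,refer,mountpoint ZFS_2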

The canmount property is set to on (output of zfs get canmount):
Code:
NAME                          PROPERTY  VALUE     SOURCE
ZFS_2                         canmount  on        default
ZFS_2/Backups                 canmount  on        default
ZFS_2/Multimedia              canmount  on        default

As it turns out, there is a brief moment after manually unmounting with zfs unmount ZFS_2/Backups in which listing the mountpoint's contents with ls /mnt/ZFS_2/Backups/ shows the full directory structure, including files and subdirectories. About two seconds later, ls /mnt/ZFS_2/Backups/ returns an empty mountpoint again - apparently the dataset has been mounted again in the background. The same happens with the Multimedia directory. I have read up a lot on "double mounting", "empty mount directories", etc., but have not found a permanent solution yet, other than setting canmount=off for the pool "ZFS_2" and its datasets.
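A quick way to see which filesystem is actually backing the directory at any given moment would be something like the following (a diagnostic sketch only, with the paths from my setup):
Code:
# Show every mount at and below /mnt/ZFS_2 - a dataset mounted on top of the
# directory appears as its own line here
findmnt -R /mnt/ZFS_2

# Right after unmounting the child dataset, check which filesystem now backs
# the path and what it contains
zfs unmount ZFS_2/Backups
df -h /mnt/ZFS_2/Backups
ls -la /mnt/ZFS_2/Backups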
I suspect some misconfiguration in my Proxmox setup (caching?) or on the pool "ZFS_2" itself, since my other pool "ZFS" has never shown this behaviour. Any help would be greatly appreciated!

###################################################################
Additional information:
SMART tests using smartctl -t long for all four drives finished without error.
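For reference, the long self-test was started and checked per drive roughly like this (the device name is a placeholder, repeated for all four disks):
Code:
# Start the long self-test, then read the self-test log once it has finished
smartctl -t long /dev/sda
smartctl -l selftest /dev/sda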
The command zpool status shows no errors:
Code:
pool: ZFS_2
 state: ONLINE
  scan: scrub repaired 0B in 21:13:41 with 0 errors on Mon Dec 23 16:51:42 2024
config:

        NAME                                   STATE     READ WRITE CKSUM
        ZFS_2                                  ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            ata-ST18000NM000J-2TV103           ONLINE       0     0     0
            ata-ST18000NM000J-2TV103           ONLINE       0     0     0
            ata-ST18000NM000J-2TV103           ONLINE       0     0     0
            ata-ST18000NM000J-2TV103           ONLINE       0     0     0

errors: No known data errors
Code:
zpool --version
zfs-2.1.15-pve1
zfs-kmod-2.1.15-pve1
 
That is how ZFS encryption works by design: the data is simply not shown as long as the dataset is still locked (key not loaded). Set canmount=off and use a systemd unit to mount the datasets after the key has been given.
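A minimal sketch of such a unit, assuming the datasets are set to canmount=noauto (with canmount=off ZFS refuses even a manual mount) and that the unit is started by hand after the key has been loaded over SSH; the file name and mount list are only examples:
Code:
# /etc/systemd/system/mount-zfs2.service (example name)
[Unit]
Description=Mount ZFS_2 datasets once the encryption key has been loaded
After=zfs.target

[Service]
Type=oneshot
RemainAfterExit=yes
# With canmount=noauto nothing mounts automatically, so do it explicitly here.
# Start manually after "zfs load-key ZFS_2": systemctl start mount-zfs2.service
ExecStart=/usr/sbin/zfs mount ZFS_2
ExecStart=/usr/sbin/zfs mount ZFS_2/Backups
ExecStart=/usr/sbin/zfs mount ZFS_2/Multimedia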
 
Thanks, waltar, for your reply on New Year's Eve! I have been working on your suggestion in the meantime and also ordered an HBA running in IT mode to rule out a possible hardware issue with the cheap SATA card. This is the current status:

As you suggested, I set canmount=noauto for the ZFS_2 pool and its datasets. I can access the data after unlocking with the passphrase once after boot, but there are no mount points visible when running df (as expected).
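Roughly the sequence this boils down to (a sketch; dataset names as in my setup):
Code:
# One-time: keep ZFS from mounting these datasets automatically
zfs set canmount=noauto ZFS_2
zfs set canmount=noauto ZFS_2/Backups
zfs set canmount=noauto ZFS_2/Multimedia

# After each boot: load the key once (passphrase prompt), then mount explicitly
zfs load-key ZFS_2
zfs mount ZFS_2
zfs mount ZFS_2/Backups
zfs mount ZFS_2/Multimedia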
After installing the new HBA, there was no change in behaviour. For a start, I rsynced all files from the ZFS pool over to the ZFS_2 pool. Then, to check whether the behaviour also appears on a freshly created pool (with the identical name), I destroyed the ZFS pool - the one that had always presented its mountpoints with data automatically after unlocking with the passphrase.
These are the steps I performed today:

1. zpool destroy ZFS
2. Wiped the four disks using the GUI
3. Created a new pool called ZFS again:
Code:
zpool create -o ashift=12 -O compression=lz4 -O acltype=posix \
  -O xattr=sa -O dnodesize=auto -O atime=off -O encryption=aes-256-gcm \
  -O keylocation=prompt -O keyformat=passphrase -m /mnt/ZFS \
  ZFS raidz1 /dev/disk/by-id/ata-ST18000NM000J-2TV103_ZR52XXXX \
             /dev/disk/by-id/ata-ST18000NM000J-2TV103_ZR52XXXX \
             /dev/disk/by-id/ata-ST18000NM000J-2TV103_ZR52XXXX \
             /dev/disk/by-id/ata-ST18000NM000J-2TV103_ZR52XXXX
4. Added the pool to be visible in the GUI: pvesm add zfspool ZFS -pool ZFS
5. Added the datasets to be visible in the GUI:
Code:
zfs create ZFS/Backups
pvesm add zfspool Backups -pool ZFS/Backups
zfs create ZFS/Multimedia
pvesm add zfspool Multimedia -pool ZFS/Multimedia
6. Rebooted and unlocked the pool and its datasets using zfs mount -l ZFS. df -h immediately lists the mountpoints, and the datasets show real, differing sizes:
Code:
Filesystem                    Size  Used Avail Use% Mounted on
ZFS                            44T  512K   44T   1% /mnt/ZFS
ZFS/Backups                    45T  864G   44T   2% /mnt/ZFS/Backups
ZFS/Multimedia                 45T  266G   44T   1% /mnt/ZFS/Multimedia
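To double-check that the new pool picked up the intended settings, the relevant properties can be listed in one go:
Code:
# Encryption and mount configuration of the freshly created pool
zfs get -r encryption,keylocation,keyformat,mountpoint ZFS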

I verified the canmount property:
Code:
NAME                          PROPERTY  VALUE     SOURCE
ZFS                           canmount  on        default
ZFS/Backups                   canmount  on        default
ZFS/Multimedia                canmount  on        default

I wonder whether the ZFS_2 pool somehow got "corrupted" with regard to its mount configuration. Unlocking during mounting worked in the past for this pool, and it works now (again) for the new ZFS pool. Currently I am rsyncing all my data back from ZFS_2 to ZFS.
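If anyone has an idea where to look next: one approach might be to compare only the locally set (non-default) properties of the old and the new pool, since any mount-related difference should show up there:
Code:
# List only properties whose source is "local", i.e. explicitly set
zfs get -r -s local all ZFS_2
zfs get -r -s local all ZFS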
 