ZFS suddenly broken, mount issues after being stable for months

skprox

New Member
Oct 11, 2020
After a reboot, I noticed several of my LXC containers wouldn't start. After digging in, I found that my single ZFS pool wasn't mounting. After a lot of forum searching, I've tried to resolve it by adding a few flags (is_mountpoint 1 and mkdir 0 on the entries within the mounted ZFS pool) in my storage.cfg file, but that still isn't fixing the overall issue. It appears that ZFS won't mount because the directories at the mount paths aren't empty. This pool has worked fine for months through many reboots, so I have no clue why it's broken now. Here's some info I hope helps, but please feel free to ask for any other details. I'm really struggling and would appreciate any help.

Code:
root@constellation:/storage/share# journalctl -b | grep zfs
Oct 11 00:22:29 constellation systemd-modules-load[452]: Inserted module 'zfs'
Oct 11 00:22:32 constellation zfs[1167]: cannot mount '/storage/share/iso': directory is not empty
Oct 11 00:22:32 constellation zfs[1167]: cannot mount '/storage/share/downloads': directory is not empty
Oct 11 00:22:32 constellation kernel: zfs[1173]: segfault at 0 ip 00007f78251a0694 sp 00007f78248cb420 error 4 in libc-2.28.so[7f7825146000+148000]
Oct 11 00:22:32 constellation kernel: zfs[1197]: segfault at 0 ip 00007f782527c01c sp 00007f781fff7478 error 4 in libc-2.28.so[7f7825146000+148000]
Oct 11 00:22:32 constellation systemd[1]: zfs-mount.service: Main process exited, code=killed, status=11/SEGV
Oct 11 00:22:32 constellation systemd[1]: zfs-mount.service: Failed with result 'signal'.
Oct 11 00:24:02 constellation CRON[2110]: (root) CMD ([ $(date +%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/scrub ] && /usr/lib/zfs-linux/scrub)
Oct 11 00:24:23 constellation zfs[3694]: cannot mount '/storage/share/iso': directory is not empty
Oct 11 00:24:23 constellation zfs[3694]: cannot mount '/storage/share/downloads': directory is not empty
Oct 11 00:24:23 constellation zfs[3694]: free(): double free detected in tcache 2
Oct 11 00:24:23 constellation systemd[1]: zfs-mount.service: Main process exited, code=killed, status=6/ABRT
Oct 11 00:24:23 constellation systemd[1]: zfs-mount.service: Failed with result 'signal'.
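
The two directories the journal complains about can be inspected directly to see what is blocking the mounts (paths taken from the messages above; output not pasted here):

Code:
ls -lA /storage/share/iso /storage/share/downloads
zfs list -r -o name,mountpoint,mounted storage/share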

Code:
root@constellation:~# more /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

dir: iso
        path /storage/share/iso
        content iso,vztmpl
        shared 0
        mkdir 0
        is_mountpoint 1

zfspool: vmstorage
        pool storage/vmstorage
        content images,rootdir
        mountpoint /storage/vmstorage
        sparse 1
        mkdir 0
        is_mountpoint 1

zfspool: vmstoragelimited
        pool storage/vmstorage/limited
        content images,rootdir
        mountpoint /storage/vmstorage/limited
        sparse 1
        mkdir 0
        is_mountpoint 1

Code:
root@constellation:~# zfs list -r -o name,mountpoint,mounted
NAME                                         MOUNTPOINT                                    MOUNTED
storage                                      /storage                                          yes
storage/share                                /storage/share                                    yes
storage/share/downloads                      /storage/share/downloads                           no
storage/share/iso                            /storage/share/iso                                 no
storage/vmstorage                            /storage/vmstorage                                yes
storage/vmstorage/limited                    /storage/vmstorage/limited                        yes
storage/vmstorage/limited/subvol-101-disk-0  /storage/vmstorage/limited/subvol-101-disk-0       no
storage/vmstorage/limited/subvol-102-disk-0  /storage/vmstorage/limited/subvol-102-disk-0       no
storage/vmstorage/limited/subvol-103-disk-0  /storage/vmstorage/limited/subvol-103-disk-0       no
storage/vmstorage/limited/vm-104-disk-0      -                                                   -

Code:
root@constellation:~# pveversion -v
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-12 (running version: 6.2-12/b287dd27)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 0.9.0-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-1
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-1
pve-qemu-kvm: 5.1.0-3
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-15
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve2
 
It appears that ZFS won't successfully mount due to folders that aren't empty in the mount path.
That was the issue on my end as well when ZFS didn't mount the dataset.

It might have worked for months on your side, but are you really sure you haven't upgraded the software on the Proxmox host?
A reboot triggers a new kernel to be loaded, etc., which could explain the different behavior you're seeing.
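
One way to check is the apt history on the host, which records which packages were upgraded and when (these are the standard Debian log files; rotated copies may or may not be present):

Code:
less /var/log/apt/history.log
zless /var/log/apt/history.log.1.gz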
 
That was the issue on my end as well when ZFS didn't mount the dataset.

It might have worked for months on your side, but are you really sure you haven't upgraded the software on the Proxmox host?
A reboot triggers a new kernel to be loaded, etc., which could explain the different behavior you're seeing.
Yes, it's very possible that an update was applied that upgraded the system. I tend to keep my system up to date on a regular basis, so if anything it was only a minor update, not a major version upgrade.

Did you solve your issue? If so, what was the issue and how did you solve it?
 
For me it was no problem to provide an empty / non-existent path. Sorry if that wasn't clear.
However, that doesn't seem to be an option for you, so I'm afraid I'm of no help. Sorry :/
 
Yea, so somehow, Proxmox is creating directories under /storage/share and /storage/vmstorage/.

For some reason it never did that before, then all of a sudden it started. I don't know if an update changed something, but the pool itself seems to be fine.

The only real mistake I know I made is that I started an rm thinking all the paths were unmounted, but one of them was still a mounted share on my ZFS pool, and I did delete some data there (rm ran for about 20 seconds before I caught the mistake). So I'm going to have to figure out what was lost and restore it from backup.
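
In hindsight, a quick check like this would have told me whether a path was still a live ZFS mount before running rm (nothing special, just findmnt and the zfs mounted property, using one of the paths above as an example):

Code:
findmnt -T /storage/share
zfs get mounted storage/share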

But at this point, I just want to fix the mount issue. Even if I have to rebuild my VMs and LXC containers, I don't really care... I just don't know how to get things back to a stable state.

Thoughts anyone?
 
Yea, so somehow, Proxmox is creating directories under /storage/share and /storage/vmstorage/.

For some reason it never did that before, then all of a sudden it started. I don't know if an update changed something, but the pool itself seems to be fine.
Yes. Proxmox creates the directory structure automatically. If the mounting runs asynchronously with that check (which creates the directories if they do not exist yet), you get this problem.
There is a setting to prevent this. Just edit /etc/pve/storage.cfg to make sure Proxmox does not create the directories if the mounting is slow or fails.
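
For the dir storage this can also be done with pvesm instead of editing the file by hand (assuming pvesm set accepts these options in this version; if not, adding the lines to /etc/pve/storage.cfg as shown below has the same effect):

Code:
# stop PVE from auto-creating the path and require it to be a real mountpoint
pvesm set iso --mkdir 0 --is_mountpoint 1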
 
Yes. Proxmox creates the directory structure automatically. If the mounting runs asynchronously with that check (which creates the directories if they do not exist yet), you get this problem.
There is a setting to prevent this. Just edit /etc/pve/storage.cfg to make sure Proxmox does not create the directories if the mounting is slow or fails.

Those settings don't resolve the problem. I added them after hitting this issue, but they don't appear to help. See that the last three entries already have mkdir 0 and is_mountpoint 1 set.

Code:
root@constellation:~# more /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

dir: iso
        path /storage/share/iso
        content iso,vztmpl
        shared 0
        mkdir 0
        is_mountpoint 1

zfspool: vmstorage
        pool storage/vmstorage
        content images,rootdir
        mountpoint /storage/vmstorage
        sparse 1
        mkdir 0
        is_mountpoint 1

zfspool: vmstoragelimited
        pool storage/vmstorage/limited
        content images,rootdir
        mountpoint /storage/vmstorage/limited
        sparse 1
        mkdir 0
        is_mountpoint 1
 
Hi,
the target directories for the mount operation need to be empty. For the not-yet-mounted filesystems, move all files and directories below the mountpoint somewhere else; then zfs mount -a should work.
With the mkdir 0 setting, PVE should no longer create the (e.g. iso) directories, but it won't automatically remove the ones it created previously.
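
In concrete terms, something along these lines should do it for the two datasets from the journal; /root/blocked-mountpoints is just an arbitrary place to park whatever ended up inside the mountpoints. The same would apply to the subvol-*-disk-0 mountpoints under /storage/vmstorage/limited if they are blocked in the same way:

Code:
# move the blocking directories out of the way
mkdir -p /root/blocked-mountpoints
mv /storage/share/iso /root/blocked-mountpoints/iso
mv /storage/share/downloads /root/blocked-mountpoints/downloads

# ZFS recreates the mountpoint directories itself when mounting
zfs mount -a
zfs list -r -o name,mountpoint,mounted storage

# review what was parked before copying anything back into the now-mounted datasets
ls -lA /root/blocked-mountpoints/iso /root/blocked-mountpoints/downloads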
 