[SOLVED] ZFS pool data loss after power outage

frozencreek

New Member
Feb 20, 2022
I'm still fairly new to running a Proxmox server and just hit a major issue. My Proxmox server recently suffered a sudden power loss, and when it came back online all of the data in one of the directories was gone. Strangely, not every directory was affected: my VMs and containers were fine and booted normally, but backups, ISO images, and other data were lost. I have a 3.63 TiB zpool, but the GUI is reporting only about 1 TiB available and ~600 GiB used (after backing up my VMs and containers).

MountainPool is the zpool, built from two 1.8 TiB drives. On it I created the datasets MountainPool/share (backups, music, ISOs, etc.) and MountainPool/vmstorage (VMs and containers).
Bash:
zpool create -f -o ashift=12 MountainPool /dev/sdb /dev/sdc cache /dev/sda5 log /dev/sda4
zfs create MountainPool/share
zfs create MountainPool/share/iso
zfs create MountainPool/vmstorage

mkdir /MountainPool/share/Backups
mkdir /MountainPool/share/Music

Bash:
# zpool status
  pool: MountainPool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 02:54:18 with 0 errors on Sun Feb 13 03:18:19 2022
config:

        NAME                            STATE     READ WRITE CKSUM
        MountainPool                    ONLINE       0     0     0
          wwn-0x5000c50064d9de46        ONLINE       0     0     0
          wwn-0x5000c5007a38e0eb        ONLINE       0     0     0
        logs
          wwn-0x5001b44ec37ed49e-part4  ONLINE       0     0     0
        cache
          wwn-0x5001b44ec37ed49e-part5  ONLINE       0     0     0

errors: No known data errors

Nothing was lost from MountainPool/vmstorage, but the MountainPool/share data is gone and the usage numbers look odd. ZFS still reports most of the data under the MountainPool/share dataset (2.58T), yet on disk I only see empty folders and no data.

Bash:
# zfs list -ro space -t all MountainPool

NAME                                                 AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
MountainPool                                          365G  3.16T        0B    558G             0B      2.61T
MountainPool/share                                    365G  2.58T        0B   2.58T             0B      2.15G
MountainPool/share/iso                                365G  2.15G        0B   2.15G             0B         0B
MountainPool/share/subvol-100-disk-0                  365G   112K        0B    112K             0B         0B
MountainPool/vmstorage                                365G  33.5G        0B    104K             0B      33.5G
MountainPool/vmstorage/containers                     365G  7.77G        0B    104K             0B      7.77G
MountainPool/vmstorage/containers/subvol-100-disk-0  7.18G   835M        0B    835M             0B         0B
MountainPool/vmstorage/containers/subvol-102-disk-0  5.32G  2.68G        0B   2.68G             0B         0B
MountainPool/vmstorage/containers/subvol-103-disk-0  7.04G   978M        0B    978M             0B         0B
MountainPool/vmstorage/containers/subvol-105-disk-0  4.68G  3.32G        0B   3.32G             0B         0B
MountainPool/vmstorage/subvol-100-disk-0             8.00G   120K        0B    120K             0B         0B
MountainPool/vmstorage/vm-101-disk-0                  365G  13.4G        0B   13.4G             0B         0B
MountainPool/vmstorage/vm-104-disk-0                  365G  12.3G        0B   12.3G             0B         0B

# ls -la /MountainPool/share/iso
total 12
drwxr-xr-x 3 root root 4096 Feb 20 08:18 .
drwxr-xr-x 8 root root 4096 Feb 20 08:18 ..
drwxr-xr-x 4 root root 4096 Feb 20 08:18 template

# ls -la /MountainPool/share/iso/template/
total 16
drwxr-xr-x 4 root root 4096 Feb 20 08:18 .
drwxr-xr-x 3 root root 4096 Feb 20 08:18 ..
drwxr-xr-x 2 root root 4096 Feb 20 08:18 cache
drwxr-xr-x 2 root root 4096 Feb 20 08:18 iso

# ls -la /MountainPool/share/iso/template/cache/
total 8
drwxr-xr-x 2 root root 4096 Feb 20 08:18 .
drwxr-xr-x 4 root root 4096 Feb 20 08:18 ..

# ls -la /MountainPool/share/iso/template/iso/
total 8
drwxr-xr-x 2 root root 4096 Feb 20 08:18 .
drwxr-xr-x 4 root root 4096 Feb 20 08:18 ..

/MountainPool/share# du -sh *
8.0K    Backups
4.0K    dump
4.0K    images
16K     iso
4.0K    private
12K     template

Is there any way to recover the lost data?
 
First I would start a scrub (zpool scrub MountainPool) so ZFS can search for corrupted data and repair itself. If you really lost data, ZFS should then complain about checksum errors and can tell you which files are lost.

It's also always a good idea to get a UPS so things like this can't happen. You can get them new at retail for as little as 50€.
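
For reference, starting and watching the scrub looks roughly like this (pool name taken from your post):

Bash:
# Start a scrub of the pool; it runs in the background
zpool scrub MountainPool

# Check progress: while the scrub is running, the "scan:" line
# shows how far it got and an estimated time to completion
zpool status MountainPool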
 
Agreed, a scrub is good practice - regularly!
Aside from that, I think it is likely a mounting issue.
I have been there in the past as well, wetting my pants... ;)
In the end, in my situation the zvol/dataset was simply not set to mount automatically.
Once I realised this it was an easy fix.
 
Good point. You can check that with zfs get mounted.
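
Roughly like this, recursively for the whole pool, plus a quick check of whether the path is actually a mountpoint at all (pool name and path taken from your post):

Bash:
# Mount state and related properties for every dataset in the pool
zfs get -r mounted,canmount,mountpoint MountainPool

# Is this a real ZFS mount, or just a plain directory on the root filesystem?
findmnt /MountainPool/share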
 
I ran the scrub (zpool scrub MountainPool) and it completed after about 3.5 hours without any errors.

Bash:
# zpool status
  pool: MountainPool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 03:29:23 with 0 errors on Sun Feb 20 18:42:32 2022
config:

        NAME                            STATE     READ WRITE CKSUM
        MountainPool                    ONLINE       0     0     0
          wwn-0x5000c50064d9de46        ONLINE       0     0     0
          wwn-0x5000c5007a38e0eb        ONLINE       0     0     0
        logs
          wwn-0x5001b44ec37ed49e-part4  ONLINE       0     0     0
        cache
          wwn-0x5001b44ec37ed49e-part5  ONLINE       0     0     0

errors: No known data errors

I also ran zfs get mounted. It looks like only the running containers are mounted, not the rest of MountainPool.

Bash:
# zfs get mounted
NAME                                                 PROPERTY  VALUE    SOURCE
MountainPool                                         mounted   no       -
MountainPool/share                                   mounted   no       -
MountainPool/share/iso                               mounted   no       -
MountainPool/share/subvol-100-disk-0                 mounted   no       -
MountainPool/vmstorage                               mounted   no       -
MountainPool/vmstorage/containers                    mounted   no       -
MountainPool/vmstorage/containers/subvol-100-disk-0  mounted   yes      -
MountainPool/vmstorage/containers/subvol-102-disk-0  mounted   yes      -
MountainPool/vmstorage/containers/subvol-103-disk-0  mounted   yes      -
MountainPool/vmstorage/containers/subvol-105-disk-0  mounted   yes      -
MountainPool/vmstorage/subvol-100-disk-0             mounted   no       -
MountainPool/vmstorage/vm-101-disk-0                 mounted   -        -
MountainPool/vmstorage/vm-104-disk-0                 mounted   -        -

I ran zfs mount MountainPool, but it wouldn't mount because the mountpoint directory wasn't empty.
Bash:
# zfs mount MountainPool
cannot mount '/MountainPool': directory is not empty

After some digging I ran the mount with the -O flag and this seemed to work! I'm finally seeing the missing data!
Bash:
# zfs mount -O MountainPool
# zfs get mounted
NAME                                                 PROPERTY  VALUE    SOURCE
MountainPool                                         mounted   yes      -
MountainPool/share                                   mounted   yes      -
MountainPool/share/iso                               mounted   yes      -
MountainPool/share/subvol-100-disk-0                 mounted   no       -
MountainPool/vmstorage                               mounted   yes      -
MountainPool/vmstorage/containers                    mounted   yes      -
MountainPool/vmstorage/containers/subvol-100-disk-0  mounted   yes      -
MountainPool/vmstorage/containers/subvol-102-disk-0  mounted   yes      -
MountainPool/vmstorage/containers/subvol-103-disk-0  mounted   yes      -
MountainPool/vmstorage/containers/subvol-105-disk-0  mounted   yes      -
MountainPool/vmstorage/subvol-100-disk-0             mounted   no       -
MountainPool/vmstorage/vm-101-disk-0                 mounted   -        -
MountainPool/vmstorage/vm-104-disk-0                 mounted   -        -
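
From what I understand, -O does an overlay mount: it mounts the dataset on top of the non-empty directory instead of refusing, and whatever was already in that directory is hidden underneath rather than deleted. So the difference was essentially:

Bash:
# Without -O: refuses because the mountpoint directory already contains files
zfs mount MountainPool

# With -O: overlay-mounts on top of the non-empty directory
zfs mount -O MountainPool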

Thank you both for your help!
 
In case you added those datasets as "Directory" storages, you need to set the "is_mountpoint" option (pvesm set YourStorageID --is_mountpoint yes), or you run into problems like your "cannot mount '/MountainPool': directory is not empty".
 
I had rebooted and realized they didn't mount automatically when the server came back up. I set the is_mountpoint option on the storage IDs that were set up as Directories.

Does that option get added to the /etc/pve/storage.cfg file?

journalctl | grep mount returns

Bash:
Feb 20 19:34:47 frozencreek zfs[1200]: cannot mount '/MountainPool': directory is not empty
Feb 20 19:34:47 frozencreek systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 19:34:47 frozencreek systemd[1]: zfs-mount.service: Failed with result 'exit-code'
Feb 20 19:35:11 frozencreek pvestatd[1709]: unable to activate storage 'data' - directory is expected to be a mount point but is not mounted: '/MountainPool/share'
Feb 20 19:35:22 frozencreek pvestatd[1709]: unable to activate storage 'ISO' - directory is expected to be a mount point but is not mounted: '/MountainPool/share/iso'
Feb 20 19:35:22 frozencreek pvestatd[1709]: unable to activate storage 'Backups' - directory is expected to be a mount point but is not mounted: '/MountainPool/share/Backups/'
 
I updated the /lib/systemd/system/zfs-mount.service file so these datasets get mounted on startup. The zfs-mount service wasn't mounting datasets whose mountpoint directories weren't empty.

From
ExecStart=/sbin/zfs mount -a
To
ExecStart=/sbin/zfs mount -O -a
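
(If that file ever gets overwritten by a zfsutils update, the same change could probably also live in a systemd drop-in override instead of editing the unit under /lib directly. A rough sketch of that idea, not something taken from this thread:)

Bash:
# Keep the -O change in a drop-in so package upgrades don't overwrite it
mkdir -p /etc/systemd/system/zfs-mount.service.d
cat > /etc/systemd/system/zfs-mount.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/sbin/zfs mount -O -a
EOF
systemctl daemon-reload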
 
You run that command in the CLI and it will add a new line to storage.cfg. With "is_mountpoint" enabled, PVE won't create empty folders in the unmounted path, which is what conflicts with mounting the datasets.
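
The entries end up looking roughly like this (the storage IDs and paths are guessed from your journalctl output, and the content types are just placeholders):

Code:
dir: ISO
        path /MountainPool/share/iso
        content iso
        is_mountpoint yes

dir: Backups
        path /MountainPool/share/Backups
        content backup
        is_mountpoint yes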
 
