[SOLVED] zfs pool will not mount at reboot (since upgrade to 6)

The mount point for the pool is "/zfs-pool". Should that directory actually exist, or is "zfs mount -a" supposed to create it?

I do not think it needs to exist before you run the command.
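
If you want to double-check, the configured mountpoints can be listed first; this is just a generic check, nothing specific to this pool:
Code:
# show where the pool and its child datasets are set to mount
zfs get -r mountpoint zfs-pool
# mounting creates any missing mountpoint directories
zfs mount -a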

Also, you ought to try the suggestion from the output you posted:
Code:
use the form 'zpool import <pool | id> <newpool>' to give it a new name

so something like this (check the man page first):
Code:
# either the current name or the numeric id shown by 'zpool import' works here
zpool import zfs-pool zfs-pool-new
 
As to your question on /zfs-pool: I do not see from the pool history that the directory was ever set explicitly as a mountpoint.

So AFAIK /zfs-pool is just the pool's default mountpoint, nothing set by hand. [Note: I am not an expert; lnxbil is.]

Can you post the output of these:
Code:
zpool list

zfs list

zfs list -t snapshot
 
Code:
root@pve:/# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zfs-pool   928G   638G   290G        -         -    33%    68%  1.00x    ONLINE  -
root@pve:/# zfs list
NAME                                                USED  AVAIL     REFER  MOUNTPOINT
zfs-pool                                            463G   190G      186K  /zfs-pool
zfs-pool/iso                                        140K   190G      140K  /zfs-pool/iso
zfs-pool/share                                      121G   190G      121G  /zfs-pool/share
zfs-pool/subvol-100-disk-0                         3.29G  4.89G     3.11G  /zfs-pool/subvol-100-disk-0
zfs-pool/subvol-102-disk-0                         1.19G  6.81G     1.19G  /zfs-pool/subvol-102-disk-0
zfs-pool/subvol-103-disk-0                         1.52G  7.13G      889M  /zfs-pool/subvol-103-disk-0
zfs-pool/subvol-104-disk-0                         1.10G  7.22G      797M  /zfs-pool/subvol-104-disk-0
zfs-pool/subvol-106-disk-0                         1.79G  6.21G     1.79G  /zfs-pool/subvol-106-disk-0
zfs-pool/subvol-300-disk-0                         3.02G  5.54G     2.46G  /zfs-pool/subvol-300-disk-0
zfs-pool/vm-disks                                   330G   190G      140K  /zfs-pool/vm-disks
zfs-pool/vm-disks/vm-101-disk-0                    24.3G   190G     24.3G  -
zfs-pool/vm-disks/vm-105-disk-0                    14.2G   190G     14.2G  -
zfs-pool/vm-disks/vm-200-disk-0                    8.19G   190G     6.05G  -
zfs-pool/vm-disks/vm-200-state-stable               728M   190G      728M  -
zfs-pool/vm-disks/vm-201-disk-0                    25.2G   190G     25.2G  -
zfs-pool/vm-disks/vm-206-disk-0                    74.0G   190G     46.6G  -
zfs-pool/vm-disks/vm-333-disk-0                    57.3G   190G     57.3G  -
zfs-pool/vm-disks/vm-334-disk-1                    76.1G   190G     76.1G  -
zfs-pool/vm-disks/vm-335-disk-0                    41.9G   190G     31.3G  -
zfs-pool/vm-disks/vm-335-state-preupdate           3.58G   190G     3.58G  -
zfs-pool/vm-disks/vm-335-state-suspend-2019-08-10  4.75G   190G     4.75G  -
root@pve:/# zfs list -t snapshot
NAME                                           USED  AVAIL     REFER  MOUNTPOINT
zfs-pool/subvol-100-disk-0@apt                 184M      -     1.11G  -
zfs-pool/subvol-103-disk-0@plex_base           668M      -     1.19G  -
zfs-pool/subvol-104-disk-0@preiptablesatboot   327M      -      923M  -
zfs-pool/subvol-300-disk-0@baseline            110M      -     1.54G  -
zfs-pool/subvol-300-disk-0@baseline2          45.9M      -     1.74G  -
zfs-pool/subvol-300-disk-0@apt                 150M      -     2.43G  -
zfs-pool/vm-disks/vm-200-disk-0@stable        2.15G      -     3.82G  -
zfs-pool/vm-disks/vm-206-disk-0@clean         27.4G      -     46.6G  -
zfs-pool/vm-disks/vm-335-disk-0@preupdate     10.6G      -     30.3G  -
root@pve:/#
 
After I do a "zfs mount -a", the start commands produce no errors and the VMs/LXCs start normally.

After a reboot, and prior to "zfs mount -a", nothing starts and I end up with errors like:
Code:
Job for pve-container@100.service failed because the control process exited with error code.
See "systemctl status pve-container@100.service" and "journalctl -xe" for details.
TASK ERROR: command 'systemctl start pve-container@100' failed: exit code 1

here is the output of "journalctl -xe" after a reboot : https://paste.ubuntu.com/p/QGZzdt7PQb/

at around 15:49:08, CT 100 is being started, shortly followed by errors indicating the missing pool.
 
Code:
root@pve:~# systemctl status pve-container@100.service
● pve-container@100.service - PVE LXC Container: 100
   Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2019-08-11 16:32:26 CEST; 20s ago
     Docs: man:lxc-start
           man:lxc
           man:pct
  Process: 1322 ExecStart=/usr/bin/lxc-start -n 100 (code=exited, status=1/FAILURE)
Aug 11 16:32:19 pve systemd[1]: Starting PVE LXC Container: 100...
Aug 11 16:32:26 pve lxc-start[1322]: lxc-start: 100: lxccontainer.c: wait_on_daemonized_start: 856 No such file or directory - Failed to receive the container state
Aug 11 16:32:26 pve lxc-start[1322]: lxc-start: 100: tools/lxc_start.c: main: 330 The container failed to start
Aug 11 16:32:26 pve lxc-start[1322]: lxc-start: 100: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
Aug 11 16:32:26 pve lxc-start[1322]: lxc-start: 100: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options
Aug 11 16:32:26 pve systemd[1]: pve-container@100.service: Control process exited, code=exited, status=1/FAILURE
Aug 11 16:32:26 pve systemd[1]: pve-container@100.service: Failed with result 'exit-code'.
Aug 11 16:32:26 pve systemd[1]: Failed to start PVE LXC Container: 100.

As for KVM guests, strangely enough, they seem to start just fine now. I never noticed because I have no KVM guests autostarting. But their storage is also inside the same ZFS pool, which is not mounting; presumably they get away with it because their disks are zvols and do not depend on the file-system mounts.
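
A quick way to tell "imported" apart from "mounted" after a reboot (generic ZFS commands, nothing Proxmox-specific):
Code:
# is the pool imported at all?
zpool list
# which ZFS file systems are actually mounted?
zfs mount
# is the expected mountpoint in place?
findmnt /zfs-pool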
 
From the journalctl output (https://paste.ubuntu.com/p/QGZzdt7PQb/) I did see this:

Code:
-- The job identifier is 49.
Aug 11 15:48:52 pve kernel: EXT4-fs (sde1): mounted filesystem with ordered data mode. Opts: (null)
Aug 11 15:48:53 pve zpool[822]: invalid or corrupt cache file contents: invalid or missing cache file
Aug 11 15:48:53 pve systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- An ExecStart= process belonging to unit zfs-import-cache.service has exited.
-- 
-- The process' exit code is 'exited' and its exit status is 1.
Aug 11 15:48:53 pve systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- The unit zfs-import-cache.service has entered the 'failed' state with result 'exit-code'.
Aug 11 15:48:53 pve systemd[1]: Failed to start Import ZFS pools by cache file.
-- Subject: A start job for unit zfs-import-cache.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- A start job for unit zfs-import-cache.service has finished with a failure.
-- 
-- The job identifier is 119 and the job result is failed.
Aug 11 15:48:53 pve systemd[1]: Reached target ZFS pool import target.
-- Subject: A start job for unit zfs-import.target has finished successfully
but i don't have a clue what it means or what the consequences might be...
 
errr...

given that I know I put the ZFS cache on a partition of the SSD,

and given:
Code:
root@pve:/etc/default# zpool history
History for 'zfs-pool':
2018-10-10.16:41:27 zpool create -f -o ashift=12 zfs-pool raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd cache /dev/sde5 log /dev/sde4

and given:
Code:
sdf                    8:80   0 111.8G  0 disk
├─sdf1                 8:81   0     1M  0 part
├─sdf2                 8:82   0   256M  0 part
├─sdf3                 8:83   0  71.8G  0 part
│ ├─pve-root         253:0    0  17.8G  0 lvm  /
│ ├─pve-swap         253:1    0     8G  0 lvm  [SWAP]
│ ├─pve-data_tmeta   253:2    0     1G  0 lvm
│ │ └─pve-data-tpool 253:4    0    30G  0 lvm
│ │   └─pve-data     253:5    0    30G  0 lvm
│ └─pve-data_tdata   253:3    0    30G  0 lvm
│   └─pve-data-tpool 253:4    0    30G  0 lvm
│     └─pve-data     253:5    0    30G  0 lvm
├─sdf4                 8:84   0     8G  0 part
└─sdf5                 8:85   0  31.8G  0 part

Would it not seem that the cache partition has moved from sde5 to sdf5?
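
One way to confirm is to map the current drive letters back to persistent names; a rough check, assuming the disks show up under /dev/disk/by-id:
Code:
# which persistent ids currently point at the suspect partitions?
ls -l /dev/disk/by-id/ | grep -E 'sde5|sdf5'
# compare with what the pool currently references
zpool status -v zfs-pool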
 
Drive letters are something to stay away from when creating zpools, because they change; kernel upgrades or adding hardware seem to shuffle the letters around.


We use wwn names instead of drive letters.

For instance, as a drive is inserted, run this and record the info in a table so the drive's physical location is known if it ever fails:
Code:
ls -l  /dev/disk/by-id/ |grep -v part |grep wwn
Then create the zpool:
Code:
zpool create -f -o ashift=12 tank mirror wwn-0x55cd2e404c025xxx wwn-0x55cd2e404c02xxxx
 
I know that now... I didn't when I first created the pool, and the tutorial I was following didn't use persistent names for the drives/partitions.

On the bright side, the issue seems solved thanks to help from the ZoL mailing list.
FWIW, and for posterity, here is the thread: https://zfsonlinux.topicbox.com/groups/zfs-discuss/T87d6dd6a8e308695

TL;DR: zfs-import-scan.service was not enabled.
Something funky may have happened when I upgraded from PVE 5 to 6, which is when the issues first appeared.
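
For anyone landing here later, enabling the unit is just the standard systemctl call (note from further down the thread: the unit only runs while /etc/zfs/zpool.cache is absent):
Code:
systemctl enable zfs-import-scan.service
# the unit carries ConditionPathExists=!/etc/zfs/zpool.cache, so it is
# skipped as long as a zpool.cache file exists (see the posts below)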
 
I read your thread at that link.

notes from a working zfs system:
Code:
# systemctl status zfs-import-scan.service
● zfs-import-scan.service - Import ZFS pools by device scanning
   Loaded: loaded (/lib/systemd/system/zfs-import-scan.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:zpool(8)
yet zpools are imported at boot:
Code:
nfs  ~ # zpool status
  pool: bkup-new
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 07:12:14 with 0 errors on Sun Aug 11 07:36:15 2019
config:

        NAME                        STATE     READ WRITE CKSUM
        bkup-new                    ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x5000cca26aexxxxx  ONLINE       0     0     0
            wwn-0x5000cca26aexxxxx  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 00:00:35 with 0 errors on Sun Jul 14 00:24:38 2019
config:

        NAME                        STATE     READ WRITE CKSUM
        rpool                       ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            sdl2                    ONLINE       0     0     0
            wwn-0x55cd2e414dxxxxx   ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 01:10:48 with 0 errors on Sun Jul 14 01:34:52 2019
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x55cd2e414dexxxxx  ONLINE       0     0     0
            wwn-0x55cd2e414dexxxxx  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            wwn-0x55cd2e414dexxxxx  ONLINE       0     0     0
            wwn-0x55cd2e414dexxxxx  ONLINE       0     0     0
          mirror-2                  ONLINE       0     0     0
            wwn-0x55cd2e414f1xxxxx  ONLINE       0     0     0
            wwn-0x55cd2e414f1xxxxx  ONLINE       0     0     0

errors: No known data errors

so a disabled zfs-import-scan.service is not unusual.

What caused the problem you ran into is still not known.

Proxmox is a collection of fantastic software that is not at its final version.

ZFS is mature software with great documentation. ZFS on Linux, like almost all software, is still under development: features are added, bugs are fixed. That said, operator configuration causes most issues.
 
operator configuration causes most issues
I fully agree that the main problem resides between the chair and the keyboard.

Still, enabling the service made the difference between containers starting at boot or not starting at boot.
 
Still, enabling the service made the difference between containers starting at boot or not starting at boot.

And that could be a configuration bug, which is hard to fix unless it can be reproduced. Usually we end up with some combination of configuration not tested by the devs and run into walls or bumps. See my Ceph issue thread.
 
After I enabled the zfs-import-scan service and rebooted, the service won't start:
Code:
 zfs-import-scan.service - Import ZFS pools by device scanning
   Loaded: loaded (/lib/systemd/system/zfs-import-scan.service; enabled; vendor preset: disabled)
   Active: inactive (dead)
Condition: start condition failed at Mon 2019-08-12 12:53:00 CEST; 1min 55s ago
           └─ ConditionPathExists=!/etc/zfs/zpool.cache was not met
     Docs: man:zpool(8)

So I renamed the zpool.cache file, and now the service starts without problems. The ZFS pool is also mounted properly at boot.
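
For completeness, the two ways past that ConditionPathExists check look roughly like this (a sketch; "zfs-pool" is the OP's pool name, adjust as needed):
Code:
# option 1: move the stale cache file aside so zfs-import-scan takes over
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak
systemctl enable zfs-import-scan.service

# option 2: regenerate the cache file so zfs-import-cache works again
zpool set cachefile=/etc/zfs/zpool.cache zfs-pool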

Regards
 
Would it not seem that the cache partition has moved from sde5 to sdf5?

One of our systems has the same issue. I created the log and cache devices 6 months ago using:
Code:
zpool add bkup -f log ata-INTEL_SSDSC2BA200G3_BTTV422201TM20XXXX-part1 cache ata-INTEL_SSDSC2BA200G3_BTTV422201TM20XXXX-part2

now :
Code:
# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 4h17m with 0 errors on Sun Aug 11 04:41:04 2019
config:

        NAME                                                STATE     READ WRITE CKSUM
        tank                                                ONLINE       0     0     0
          raidz1-0                                          ONLINE       0     0     0
            ata-TOSHIBA_HDWE160_277GK2ARXXXX                ONLINE       0     0     0
            ata-TOSHIBA_HDWE160_277GK2ASXXXX                ONLINE       0     0     0
            ata-TOSHIBA_HDWE160_277HK78JXXXX                ONLINE       0     0     0
        logs
          ata-INTEL_SSDSC2BA200G3_BTTV422201TM200GGN-part1  ONLINE       0     0     0
        cache
          sdc2                                              UNAVAIL      0     0     0

sdc is not the SSD I installed the cache on; it is a large HDD.

so there is a bug...
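
If it were mine, I would probably drop the stale cache vdev and re-add it by a persistent name. A sketch only, assuming the cache really is the -part2 partition from the zpool add above (serial redacted there); if the sdc2 name is not accepted, the vdev guid from 'zpool status -g' can be used instead:
Code:
# remove the stale cache vdev...
zpool remove tank sdc2
# ...and re-add it by its persistent id
zpool add tank cache /dev/disk/by-id/ata-INTEL_SSDSC2BA200G3_BTTV422201TM20XXXX-part2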
 
