ZFS zpool error during check - cannot boot anymore

tebse

Hi Proxmox Support Forum,

I am an inexperienced user with limited Linux knowledge. I have been running PVE for approximately 2 1/2 years. I upgraded to 6.2 (?) some months ago, so this should be my current version.

Last night I received an error email:

Code:
The number of I/O errors associated with a ZFS device exceeded acceptable levels. ZFS has marked the device as faulted.
impact: Fault tolerance of the pool may be compromised.
...
vpath /dev/sdd1
...

sdd is an SSD used "only" for log purposes.

Stupid me: as usual when I expect an upgrade / repair / reboot, I started an apt upgrade of the node from the web frontend. This made the web frontend freeze, but SSH was still available. From there I ran

Code:
zpool status

and found sdd1 marked as broken. I tried

Code:
zpool remove rpool sdd1

which made the whole system unresponsive. After a reboot, I got this:

Foto 01.11.20, 09 37 10.jpg

I followed post 2 from @fabian from this thread:
https://forum.proxmox.com/threads/unable-to-boot-from-zfs-rpool-after-upgrading-to-pve-4-2.27222/

Code:
zpool import
zpool import -m rpool

but all I got was this:


Foto 01.11.20, 09 49 28.jpg
Foto 01.11.20, 09 49 21.jpg

I kindly ask for help.
Thank you
Thorsten
 

OK, some success:

Code:
zpool import -f

gave an error, but after a second reboot I was able to run:

Code:
zpool import -m rpool
exit

Back on SSH, I could remove the defective SSD with:
Code:
root@ebb-vn01:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 0 days 07:08:40 with 0 errors on Sun Oct 11 07:32:43 2020
config:

        NAME                   STATE     READ WRITE CKSUM
        rpool                  DEGRADED     0     0     0
          raidz1-0             ONLINE       0     0     0
            sda2               ONLINE       0     0     0
            sdb2               ONLINE       0     0     0
            sdc2               ONLINE       0     0     0
        logs
          1023037337071760466  UNAVAIL      0     0     0  was /dev/sdd1

errors: No known data errors
root@ebb-vn01:~# zpool remove rpool 1023037337071760466
root@ebb-vn01:~# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 07:08:40 with 0 errors on Sun Oct 11 07:32:43 2020
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
            sdc2    ONLINE       0     0     0

errors: No known data errors
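
The removal above addresses the missing log device by the numeric GUID that zpool status prints in place of the device name. As a minimal sketch (the function name unavailable_vdevs is made up for illustration, and the sample input is copied from the output above), such identifiers could be pulled out of the status text with awk:

```shell
#!/bin/sh
# Sketch: print the identifier of every UNAVAIL or FAULTED entry found in
# `zpool status` output read from stdin. The function name is hypothetical.
unavailable_vdevs() {
    # Column 1 is the device name or GUID, column 2 is its state.
    # Header lines such as "state: ..." end in ":" and are skipped.
    awk '$1 !~ /:$/ && ($2 == "UNAVAIL" || $2 == "FAULTED") { print $1 }'
}

# Sample input copied from the status output above.
sample='        logs
          1023037337071760466  UNAVAIL      0     0     0  was /dev/sdd1'

printf '%s\n' "$sample" | unavailable_vdevs
```

On a live system the input would presumably come from `zpool status rpool | unavailable_vdevs`, and each printed identifier could then be passed to `zpool remove rpool <id>` as shown above.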
 
I guess I can now replace the SSD and reassign LOG / CACHE :-)

Next question: I used the SSD partition /dev/sdd2 for swap. Here is my fstab:

Code:
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/sdd2 none swap sw 0 0
# /dev/zvol/rpool/swap none swap sw 0 0
proc /proc proc defaults 0 0

Now I would like to return to

Code:
# <file system> <mount point> <type> <options> <dump> <pass>
# /dev/sdd2 none swap sw 0 0
/dev/zvol/rpool/swap none swap sw 0 0
proc /proc proc defaults 0 0
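
The switch between the two fstab variants can also be scripted instead of edited by hand. A minimal sketch, assuming GNU sed (as shipped with Debian) and a made-up function name switch_swap_to_zvol; the file path is an argument so the edit can be tried on a copy before touching /etc/fstab:

```shell
#!/bin/sh
# Sketch: move the active swap entry in an fstab-style file from /dev/sdd2
# to the rpool/swap zvol. The function name is hypothetical.
switch_swap_to_zvol() {
    file=$1
    # 1st expression: comment out the active /dev/sdd2 line.
    # 2nd expression: remove the leading "#" from the zvol swap line.
    sed -i \
        -e 's|^/dev/sdd2[[:space:]]|# &|' \
        -e 's|^#[[:space:]]*\(/dev/zvol/rpool/swap[[:space:]]\)|\1|' \
        "$file"
}
```

A cautious way to use it would be `cp /etc/fstab /tmp/fstab.test && switch_swap_to_zvol /tmp/fstab.test`, followed by a look at the result before applying the same edit to the real file.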

Please note the position of the "#"; I would like to use this variant until I have replaced the SSD. However, this is the current pool status:
Code:
root@ebb-vn01:~# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 07:08:40 with 0 errors on Sun Oct 11 07:32:43 2020
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
            sdc2    ONLINE       0     0     0

errors: No known data errors

It does not show any swap. How can I temporarily create and enable swap?

Thank you and best regards
Thorsten
 
Replace 8G with the desired swap size:

Code:
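# Uncomment to remove an existing rpool/swap zvol first
#zfs destroy rpool/swap

# Create an 8G zvol for swap, with the block size set to the memory page size
zfs create -V 8G -b $(getconf PAGESIZE) -o compression=zle \
  -o logbias=throughput -o sync=always \
  -o primarycache=metadata -o secondarycache=none \
  -o com.sun:auto-snapshot=false rpool/swap

# Format the zvol as swap space
mkswap -f /dev/zvol/rpool/swap

# Append the swap entry to /etc/fstab (skip this if the line is already active there)
cat << 'EOF' >> /etc/fstab
/dev/zvol/rpool/swap none swap defaults 0 0
EOF

# Enable all swap areas listed in /etc/fstab, with verbose output
swapon -av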
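
Note that zpool status lists only the pool's vdevs, so a swap zvol never appears there; zvols show up in `zfs list -t volume`, and active swap in `swapon --show` or /proc/swaps. As a rough sketch for checking the result (the function name swap_active is made up for illustration, and it assumes the zvol is listed either under its /dev/zvol/... path or under the /dev/zdN device that the symlink points to):

```shell
#!/bin/sh
# Sketch: succeed if DEVICE is listed as active swap.
# Usage: swap_active DEVICE [SWAPS_FILE]   (SWAPS_FILE defaults to /proc/swaps)
# The function name is hypothetical.
swap_active() {
    dev=$1
    swaps=${2:-/proc/swaps}
    # /dev/zvol/... is a symlink; resolve it, falling back to the given path.
    real=$(readlink -f "$dev" 2>/dev/null)
    [ -n "$real" ] || real=$dev
    # Skip the header line and compare the first column against both names.
    awk -v a="$dev" -v b="$real" \
        'NR > 1 && ($1 == a || $1 == b) { found = 1 } END { exit !found }' \
        "$swaps"
}
```

After `swapon -av`, something like `swap_active /dev/zvol/rpool/swap && echo "swap is active"` should confirm that the new swap is in use.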