[SOLVED] ZFS rpool unavailable - message: cannot import pool no such pool or dataset

Joris L.

This thread documents what happened and how it was fixed, in case it can help someone.

Running pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.11.11-2-pve) in a single-host setup for a lab environment.
The system is typically stable, until it is not. After a few 'hangs' it ended up in a non-bootable state.

As a precaution I've now switched back to booting the default 6.8.x kernel. Somehow the system kept reverting to the 6.11 kernel; that should now be fixed.
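For anyone who wants to keep a host on a specific kernel, here is a minimal sketch using proxmox-boot-tool. It assumes the host's boot entries are managed by proxmox-boot-tool. The version string is only an example value (take the real one from the list output), and the run helper with its DRY_RUN switch is a hypothetical wrapper added here so that nothing changes until you set DRY_RUN=0.

```bash
#!/bin/bash
# Sketch: pin a Proxmox VE host to one kernel so it stays the boot default.
# PIN_VERSION is an example value, not taken from the original post.
PIN_VERSION="${PIN_VERSION:-6.8.12-9-pve}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Show the installed kernels, then pin the chosen one.
run proxmox-boot-tool kernel list
run proxmox-boot-tool kernel pin "$PIN_VERSION"
```

After a reboot, uname -r shows which kernel is actually running.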

The on-screen message "cannot import pool no such pool or dataset" was followed by a suggestion to restore from backup or try a manual import.

Status

Right now, the system is functioning well again and no data was lost.
What made the system bootable again is not 100% determined, and neither is the root cause.
I will spare you the few hours of sweating and trial and error.

Observations
  • zpool status (obviously) failed because no pool was found
  • manually executing zpool import did not result in an online rpool; a single partition was reported as offline
  • the pool was reported as UNAVAILABLE
  • GAP: although I did not notice this initially, one of the disks was simply not showing up anymore
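Since the missing disk was easy to overlook, a small check like the following sketch can make it obvious. The function name missing_disks and the device names in the example are hypothetical; the idea is only to compare the disks you expect with what lsblk reports.

```bash
#!/bin/bash
# Sketch: report which expected disks are absent from the system.

# Print every expected device that is not in the "present" list.
# $1: newline-separated list of present devices; remaining args: expected devices.
missing_disks() {
    local present="$1"
    shift
    local d
    for d in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string
        printf '%s\n' "$present" | grep -qxF -- "$d" || echo "$d"
    done
}

# On a live system the present list would come from lsblk, and
# "zpool import" without a pool name lists the pools available for import:
#   present=$(lsblk -p --nodeps --noheadings -o NAME)
#   missing_disks "$present" /dev/sda /dev/sdb
#   zpool import
```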
Solution
  • boot the system from a simple live system
  • use gdisk to restore the GPT partition table from its backup
  • power off the machine and switch off the PSU (mentioned for completeness)
    • leave it powered off for 10-20 seconds
  • power on the machine
  • the boot is okay now
    • the missing disk is visible again and the unreachable partition is available again
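The gdisk step is interactive, so it is hard to script. As a rough sketch, a read-only check with sgdisk can come first, followed by the repair in gdisk's recovery menu. /dev/sdX is a placeholder, the check_gpt function and DRY_RUN switch are hypothetical helpers, and the keystrokes listed in the comments are an assumption about which recovery options apply; which of them is needed depends on what is actually damaged.

```bash
#!/bin/bash
# Sketch: check a disk's GPT from a live system before attempting a repair.
# /dev/sdX is a placeholder, not a device name from the original post.

check_gpt() {
    local disk="$1"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        sgdisk --verify "$disk"    # read-only check of the GPT structures
    else
        echo "sgdisk --verify $disk"
    fi
}

check_gpt "${1:-/dev/sdX}"

# Interactive repair in gdisk (assumed keystrokes, recovery menu):
#   gdisk /dev/sdX
#   r   recovery and transformation options
#   b   use backup GPT header (rebuilding main)
#   c   load backup partition table from disk (rebuilding main)
#   v   verify disk
#   w   write table to disk and exit
```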
Script

To make sure any damage to the GPT partition tables can be recovered, I wrote a simple script to back them up.

Consider running mkdir gptbackup first; the script writes its backups into that directory.

Code:
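#!/bin/bash
# Back up the GPT partition table of every disk with sgdisk.
# Expects the gptbackup directory to exist in the current directory.

# Whole disks only (no partitions), full paths, skipping ZFS zvols (/dev/zd*)
disks=$(lsblk -p --nodeps --noheadings -o NAME | grep -v '^/dev/zd')
h=$(hostname -s)
p="gptbackup"
dat=$(date --iso-8601)

for d in $disks
do
        # Strip the /dev/ prefix, e.g. /dev/sda -> sda
        x=$(basename "$d")
        # Resulting file name: <host>_<date>_<disk>
        sgdisk -b "$p/${h}_${dat}_${x}" "$d"
done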

There should now be a backup of the GPT table of each physical disk in the gptbackup folder.
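To use such a backup later, sgdisk can load it back onto the disk. The sketch below only locates the newest backup file for a disk and leaves the restore itself as a printed command, because loading a backup overwrites the partition table on the target. The function name latest_backup is hypothetical, and the file naming is assumed to follow the script (host, ISO date, disk name).

```bash
#!/bin/bash
# Sketch: find the newest GPT backup for one disk.
# Assumes file names of the form <host>_<iso-date>_<disk>.

# Print the path of the most recent backup for a disk, or nothing if none exists.
latest_backup() {
    local dir="$1" host="$2" disk="$3"
    local f newest=""
    # The glob expands in sorted order and ISO 8601 dates sort
    # chronologically as text, so the last existing match is the newest.
    for f in "$dir"/"$host"_*_"$disk"; do
        if [ -e "$f" ]; then
            newest="$f"
        fi
    done
    if [ -n "$newest" ]; then
        echo "$newest"
    fi
}

# Example (prints the restore command instead of running it):
#   f=$(latest_backup gptbackup "$(hostname -s)" sda)
#   echo sgdisk --load-backup="$f" /dev/sda
```

Double-check that the backup file really belongs to the target disk before running the printed command.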

Concerns and considerations

Unless there is malicious activity on this system (not impossible, yet considered unlikely), running the 6.11 kernel does carry real-world risks; better to avoid it.
This system is not (yet) running with ECC RAM and suffered multiple "freeze" situations; this can 'mangle' partition tables.
The fact that one disk was simply 'not visible' across multiple reboots remains a cause for concern, and I have no answer for it.
If the 6.8 kernel shows a similar situation, I'll update this thread.
For now I assume this is resolved by switching to the default kernel again.
 