[SOLVED] rpool import issue - recovery after power failure

ilia987

This morning we had a power outage in our server room;
all nodes recovered except one.

This node is one of three in a Ceph cluster (pools are set to replication 3), so the data is safe and the cluster is stable, apart from the Ceph warnings.

Any idea how I can fix it?
 

Attachment: Screenshot from 2020-12-01 12-20-54.png
* Does the node have ZFS on root? (Was it installed from the PVE ISO?)
* What's the output of `zpool import` in the initramfs shell?
* What's the output of `lsblk` and `blkid` in the shell? (I'm not sure either of them is available there; otherwise `ls -la /dev/` should also help.)
* Check `dmesg` for potential disk problems.
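The checks above can be collected in one go from the initramfs shell, skipping any tool the initramfs does not ship. A minimal POSIX sh sketch (busybox `ash` should accept it); `run_if_present` is a hypothetical helper name, not a PVE or ZFS command:

```shell
# Hypothetical helper (not part of PVE or ZFS): print a header, run the
# command if its binary is on PATH, and report a non-zero exit status
# instead of stopping, so one failing tool does not hide the others.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "=== $* ==="
        "$@" || echo "($1 exited with status $?)"
    else
        echo "$1: not available in this environment"
    fi
}

run_if_present zpool import
run_if_present lsblk
run_if_present blkid
run_if_present ls -la /dev/
# Only the last 40 lines of the kernel log, where recent disk/controller
# errors usually show up (the header line may be cut off by tail)
run_if_present dmesg 2>&1 | tail -n 40
```

Tools missing from the initramfs are reported by name rather than aborting the sequence.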

I hope this helps!
 
* Yes, installed from the ISO.

`zpool import`:
Screenshot from 2020-12-01 17-25-07.png
`lsblk`:
Screenshot from 2020-12-01 17-26-30.png
`ls -la /dev` (beginning of the output):
Screenshot from 2020-12-01 17-34-20.png
`blkid`:
Screenshot from 2020-12-01 17-26-53.png

End of `dmesg`:
Screenshot from 2020-12-01 17-33-13.png
 
* Does `zpool import -R /rpool -N rpool` work in the initramfs shell?
* If not, does `zpool import -R /rpool -N 344988999858258096` (the numeric identifier of the pool from the screenshot) work?

If yes, my guess is that scanning the disks (and, judging by the first screenshot you posted, all the potential iSCSI devices) simply takes too long, so the drives are not present yet when init tries to import the pool.
In that case, try adding a `rootdelay` parameter to the kernel command line; see:
https://pve.proxmox.com/wiki/ZFS:_Tips_and_Tricks#Grub_boot_ZFS_problem
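As I read that wiki section, for a GRUB-booted install the change amounts to adding `rootdelay=10` to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` and then running `update-grub` on the booted system. A sketch of the edited line, assuming the stock `quiet` is the only parameter already there:

```shell
# /etc/default/grub (excerpt): make the kernel wait 10 seconds before
# mounting the root filesystem, giving slow disks time to appear
GRUB_CMDLINE_LINUX_DEFAULT="rootdelay=10 quiet"
```

Keep any other parameters already on that line; only `rootdelay=10` is new.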

I hope this helps!
 
Screenshot from 2020-12-01 17-53-57.png

Screenshot from 2020-12-01 17-55-16.png

How can I scan the boot drives?
 
OK, please try adding the root delay as described on the wiki page.

* The import was successful; the failing mount is probably due to the non-existent /root/rpool directory :)
See https://github.com/openzfs/zfs/issues/6045
* The second import failed because the first one had already imported the pool.

I hope this helps!
 
The GRUB files do not exist.
I updated the ZFS pre-mountroot sleep, but how do I "commit" the change (force a config update)?
update-initramfs does not exist.
 
Yes, neither the GRUB/systemd-boot config nor update-initramfs exists in the initramfs.
Reboot the server and press 'e' in the boot loader menu; there you can set the parameter for a single boot. If this fixes the issue, you can set it permanently in the config once the system has booted:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot
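If the node boots via systemd-boot (UEFI with ZFS on root) rather than GRUB, the permanent place for the parameter described in that chapter is the single line in `/etc/kernel/cmdline`. The line below assumes the default PVE ZFS entries (`root=ZFS=rpool/ROOT/pve-1 boot=zfs`); keep whatever else is already on your line, and note that the file holds only the command line, with no comments:

```
root=ZFS=rpool/ROOT/pve-1 boot=zfs rootdelay=10
```

Afterwards the boot entries have to be refreshed: on PVE 6.x that is `pve-efiboot-tool refresh`, which later releases renamed to `proxmox-boot-tool refresh`.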
 
For some reason I thought it worked and tried another reboot.
Now, after running zpool import, I get "no pools available".

Screenshot from 2020-12-01 19-03-39.png
 
How can I get to a menu to update the ZFS sleep argument
(`ZFS_INITRD_PRE_MOUNTROOT_SLEEP`)?
 
Just add 'rootdelay=10' to the kernel command line; that should hopefully get the system to boot.

Regarding "no pools available to import": that (usually) means the pool has already been imported.
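The `ZFS_INITRD_PRE_MOUNTROOT_SLEEP` setting asked about above lives in `/etc/default/zfs` on the installed system (shipped as `'0'`), and as far as I know that file is copied into the initramfs when it is built. So it has to be changed once the system is up (for example after booting with `rootdelay`), followed by `update-initramfs -u`. A sketch; the 15-second value is an arbitrary example, not something measured on this node:

```shell
# /etc/default/zfs (excerpt): seconds the initramfs ZFS script waits
# before it tries to import the root pool
ZFS_INITRD_PRE_MOUNTROOT_SLEEP='15'
```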
 
Screenshot from 2020-12-02 14-28-42.png

gives the error:
cannot import 'rpool': no such pool available
 
Does anyone have another idea? I don't want to reinstall the server.

I tried to recover the boot via the installation CD, but it failed.
 
You could try removing the duplicate root=ZFS=rpool/ROOT/pve-1 from the kernel command line (not sure whether that causes the problem, though).
Additionally, you could remove the quiet parameter; the kernel might then print a helpful message pointing to the problem.

If you boot with rootdelay=15 and it still says 'cannot import 'rpool': no such pool available', what's the output of `zpool import`?
 
I found a workaround:

After I get the "rpool not found" error, I wait at least 1-2 minutes and then run
`zpool import -N rpool -f`
After that worked, I changed the ZFS sleep time, and now it is more stable.
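The workaround above (wait, then import by hand) can be scripted in the initramfs shell. A minimal POSIX sh sketch; `retry` is a hypothetical helper defined here, not a ZFS or PVE command, and the attempt count and delay are chosen by the caller rather than taken from anything measured on this node:

```shell
# Hypothetical helper (not part of PVE or ZFS): run a command up to
# $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_n=1
    while [ "$retry_n" -le "$retry_max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $retry_n/$retry_max failed: $*" >&2
        # No sleep after the final attempt
        if [ "$retry_n" -lt "$retry_max" ]; then
            sleep "$retry_delay"
        fi
        retry_n=$((retry_n + 1))
    done
    return 1
}
```

For this thread that would be something like `retry 12 10 zpool import -f -N rpool` (up to 12 tries, 10 seconds apart, roughly the 1-2 minutes that worked here), followed by `exit` so the initramfs continues booting from the imported pool.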
 
