After one disk failed, Proxmox can't boot and zpool can't import the ZFS pool anymore

RMM

Oct 25, 2013
We shut down a server normally. Just before doing that I checked "zpool status" and everything was fine. When starting the server up again, one of the disk drives failed right at power-on. We were using a ZFS mirror configuration, so in theory the server should still be able to boot with only one disk connected. But it doesn't: it can't import the ZFS pool. I've tried a lot, but nothing seems to work. Here are the outputs (the failed disk was disconnected):
Code:
# zpool import

   pool: rpool
     id: 15892818917949697533
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
    The pool may be active on another system, but can be imported using
    the '-f' flag.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
 config:

    rpool       FAULTED  corrupted data
      mirror-0  DEGRADED
        sda     UNAVAIL  corrupted data
        sda3    UNAVAIL

Code:
# zpool import rpool
cannot import 'rpool': pool was previously in use from another system.
Last accessed by <unknown> (hostid=0) at Thu Jan  1 00:00:00 1970
The pool can be imported, use 'zpool import -f' to import the pool.

Code:
# zpool import -f rpool
cannot import 'rpool': one or more devices is currently unavailable

Code:
# zpool import -F rpool
cannot import 'rpool': pool was previously in use from another system.
Last accessed by <unknown> (hostid=0) at Thu Jan  1 00:00:00 1970
The pool can be imported, use 'zpool import -f' to import the pool.

Code:
# zpool import -fF rpool
cannot import 'rpool': one or more devices is currently unavailable

My best guess is that the filesystem is shredded and I need to restore from backup (which will take HOURS). Is there anything I could try to get it back?
If it is shredded, that would just be really bad luck, wouldn't it? A mirror should normally be able to survive one disk failing, right?
 
It is really weird that the mirror consists of sda and sda3. Besides not using the more common /dev/disk/by-id names, it is strange because sda3 is the third partition of sda itself, so this cannot be right. If the Proxmox installer created the rpool, it would more likely be sda3 and sdb3 (or similar). Maybe detaching the sda part of the mirror would help?
PS: Or physically remove the sda drive (which includes sda3) and see if the system will boot from the other part of the mirror.
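A minimal diagnostic sketch along the lines of this suggestion, assuming the surviving disk is /dev/sda and the pool is rpool (both can be overridden). "lsblk -o NAME,SIZE,FSTYPE" shows which partitions the kernel currently sees on the disk and which of them carry a zfs_member signature, and "zpool import -d /dev/disk/by-id" makes ZFS scan the stable by-id names instead of the sdX names. The run/main helpers and the DISK, POOL and DRY_RUN variables are hypothetical; by default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical diagnostic sketch: inspect the partitions on the surviving disk
# and let ZFS look for the pool under /dev/disk/by-id.
# Assumes DISK=/dev/sda and POOL=rpool; prints commands unless DRY_RUN=0.
set -eu

DISK="${DISK:-/dev/sda}"
POOL="${POOL:-rpool}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

main() {
    # Which partitions does the kernel see, and which are zfs_member?
    run lsblk -o NAME,SIZE,FSTYPE "$DISK" || true
    # List importable pools found when scanning by-id device names.
    run zpool import -d /dev/disk/by-id || true
    # Try the import using those names (forced, as the hostid differs).
    run zpool import -f -d /dev/disk/by-id "$POOL"
}

main
```

Run it once as is to review the commands, then with DRY_RUN=0 on the rescue system.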
 
I don't remember anymore which Proxmox version the installer was, but the whole partition layout and everything else was created by it. I can't remove sda and boot from the other part of the mirror, since the other disk is completely dead.
I've tried to detach sda from the pool, but it seems that can't be done on a pool that isn't imported.
 
HA! Got it working again. It seems the primary GPT table was corrupt. I've just rewritten it with fdisk (which loaded the intact backup GPT), and now I'm able to import the pool. How it got corrupted, only the gods know. Thanks for your help, it made me go over everything again.

Code:
# fdisk /dev/sda

Welcome to fdisk (util-linux 2.33).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

The primary GPT table is corrupt, but the backup appears OK, so that will be used.

Command (m for help): w

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
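A minimal sketch of the same fix as a script, assuming the surviving disk is /dev/sda and the pool is rpool. It first asks sgdisk (from the gdisk package) to verify the GPT, then feeds "w" to fdisk so it writes back the table it loaded from the backup GPT, and finally forces the import. Piping "w" into fdisk instead of typing it interactively is an assumption; the helper names and the DISK, POOL and DRY_RUN variables are hypothetical, and by default nothing is executed, only printed.

```shell
#!/bin/sh
# Hypothetical sketch of the fix from this thread: restore the primary GPT
# from its backup, then import the pool.
# Assumes DISK=/dev/sda and POOL=rpool; prints commands unless DRY_RUN=0.
set -eu

DISK="${DISK:-/dev/sda}"
POOL="${POOL:-rpool}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# When fdisk finds a corrupt primary GPT it uses the backup copy; writing
# ("w") stores that table again, repairing the primary header. Piping "w"
# in (assumption) mirrors the interactive session shown above.
rewrite_gpt() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ printf 'w\\n' | fdisk $DISK"
    else
        printf 'w\n' | fdisk "$DISK"
    fi
}

main() {
    # Report GPT problems such as a damaged primary header.
    run sgdisk --verify "$DISK" || true
    rewrite_gpt
    # Force the import, since the pool reports another system (hostid=0).
    run zpool import -f "$POOL"
    run zpool status "$POOL"
}

main
```

The mirror will still be missing its dead disk after the import, so check "zpool status" before relying on the pool.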
 
