Zpool Degraded after disk failure

Mar 4, 2019
Hello everyone!

I have a ZFS pool with 3 disks (1 TB each), and one disk crashed. It was replaced with a disk of the same size and brand. After a reboot, my ZFS pool shows the DEGRADED state below. I replaced the disk and I can see the new disk, but I'm not sure whether it is working as expected inside the pool.
Can you help me, please?

NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
ZFS   2.72T   437G  2.29T         -     5%    15%  1.00x  DEGRADED  -

# zpool status -v
  pool: ZFS
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
  scan: resilvered 1.65M in 0h0m with 0 errors on Tue Feb 12 10:57:25 2019
config:

        NAME                      STATE     READ WRITE CKSUM
        ZFS                       DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            sdb                   ONLINE       0     0     0
            sdc                   ONLINE       0     0     0
            18362604275904331643  UNAVAIL      0     0     0  was /dev/sde1

errors: No known data errors

# zpool iostat -v ZFS
                            capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
------------------------  -----  -----  -----  -----  -----  -----
ZFS                        437G  2.29T     14     81  74.2K  2.63M
  raidz1                   437G  2.29T     14     81  74.2K  2.63M
    sdb                       -      -      7     41  37.1K  1.31M
    sdc                       -      -      7     39  37.1K  1.31M
    18362604275904331643      -      -      0      0      0      0
------------------------  -----  -----  -----  -----  -----  -----

# fdisk -l | grep sd
Partition 2 does not start on physical sector boundary.
Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
/dev/sda1         2048    7813119    7811072   3.7G 82 Linux swap / Solaris
/dev/sda2 *    7813120    8984575    1171456   572M 83 Linux
/dev/sda3      8984576 1953523711 1944539136 927.2G 83 Linux
Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
/dev/sdb1         2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdb9   1953507328 1953523711      16384     8M Solaris reserved 1
Disk /dev/sdc: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
/dev/sdc1         2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdc9   1953507328 1953523711      16384     8M Solaris reserved 1
Disk /dev/sdd: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
/dev/sdd1         2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdd9   1953507328 1953523711      16384     8M Solaris reserved 1
Partition 2 does not start on physical sector boundary.
Partition 2 does not start on physical sector boundary.
 
How did you replace the drive?
If possible, I always add the new disk to the machine physically without removing the bad one.
Then I run the zpool replace command (look up the exact syntax) and tell it to replace the failed disk with the new one.
Once the replace is done and the pool has resilvered (this may take some time, but at 1 TB not more than a day) and I see that the pool is ONLINE and healthy, I remove the bad disk.
If this is not possible, do the same, but simply swap the bad disk for the good one and run the replace command.
ZFS does not replace a disk on its own. You have to tell it to, providing the new disk's ID and the old one as displayed by the status command.
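In this pool, the old device that `zpool status` displays is the numeric GUID marked UNAVAIL. A minimal sketch, assuming exactly one failed disk, the pool name ZFS from the output above, and a hypothetical new-disk path (ata-NEW_DISK_SERIAL stands in for whatever `ls -l /dev/disk/by-id/` shows for the new drive). It reads `zpool status` output and prints the matching `zpool replace` command for review instead of running it.

```shell
#!/bin/sh
# Sketch: turn the UNAVAIL vdev from "zpool status" into a zpool replace
# command. Assumes exactly one failed disk. POOL comes from this thread;
# NEW_DISK is a hypothetical placeholder, set it to the new drive's by-id path.
POOL="${POOL:-ZFS}"
NEW_DISK="${NEW_DISK:-/dev/disk/by-id/ata-NEW_DISK_SERIAL}"

# Reads "zpool status" output on stdin. For a missing disk, the first column
# is its numeric GUID and the second column is UNAVAIL. The "state:" header
# line is skipped in case the whole pool is reported as UNAVAIL.
plan_replace() {
    awk '$2 == "UNAVAIL" && $1 != "state:" { print $1 }' |
    while read -r old_vdev; do
        echo "zpool replace $POOL $old_vdev $NEW_DISK"
    done
}
```

Usage: `zpool status ZFS | plan_replace` prints the command; once it looks right, `zpool status ZFS | plan_replace | sh` runs it, and `zpool status ZFS` should then show the resilver in progress.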

Also, while you are at it: after a successful replace, I would suggest changing your pool to UUID instead of sdX naming.
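For a data pool like this one, the usual way to get stable device names is to export the pool and re-import it while pointing ZFS at /dev/disk/by-id (persistent, serial-based names). This does not work for a root pool, which cannot be exported while the system runs from it. A sketch, assuming the pool is called ZFS and that no VM, container or storage entry is using it during the export; it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the commands that re-import a pool by /dev/disk/by-id names.
# Assumptions: POOL=ZFS (from this thread), it is a data pool (not the root
# pool), and nothing is using it while it is exported.
POOL="${POOL:-ZFS}"

plan_by_id_import() {
    # Exporting releases the pool together with its sdX device paths.
    echo "zpool export $POOL"
    # -d tells zpool import to look for the devices in this directory,
    # so the vdevs are recorded under their by-id names.
    echo "zpool import -d /dev/disk/by-id $POOL"
}
```

Usage: `plan_by_id_import | sh -e`; afterwards `zpool status ZFS` should list the disks by their ata-... names instead of sdb/sdc.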
 
Hi Jim,
I swapped the disk for a new one and ran the zpool replace command. After some days and a reboot, I saw the pool status above.
Now, what's the best way to fix it, and how can I change my pool to UUID instead of sdX?

Thanks.
 
I swapped the disk for a new one and ran the zpool replace command. After some days and a reboot, I saw the pool status above.

Please post the commands and their output.

Now, what's the best way to fix it, and how can I change my pool to UUID instead of sdX?

UUID does not solve this problem. Please provide the output of

Code:
zpool status -v rpool

(Replace rpool by your pool name if rpool does not apply)
 
Hi there!

After the server rebooted, everything looks good. I'm not sure whether this is related to my disk controller or something like that.
Another question: if my OS disk crashes, will I be able to see or import the ZFS pool again after reinstalling the OS?

My setup is:

1 disk for the OS
3 disks for the ZFS pool

Thanks.

Thiago
 
Another question: if my OS disk crashes, will I be able to see or import the ZFS pool again after reinstalling the OS?

You will lose all your VM settings. Use a proper backup, or don't use a single OS disk.
Just install PVE on the three disks; then your system will survive a one-disk failure.
 
I know that. That's why I asked, because the client delivered the server like this. I will change it soon.
I will lose all VM settings, but will I be able to restore the data from the ZFS pool?
 
I will lose all VM settings, but will I be able to restore the data from the ZFS pool?

Yes, if the pool is OK, then the data will also be OK. In most cases you will lose your Windows activation, because everything about the VM except the data has changed, so the VM configuration is important. I'd create a simple rsync job that mirrors your /etc/pve folder onto the ZFS pool, so that you just have to copy the VM configs back.
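A minimal sketch of such a job plus the matching recovery step, assuming the pool is named ZFS and mounted at its default mountpoint /ZFS; the target directory /ZFS/pve-config-backup is a hypothetical name. The functions print the commands rather than run them. The `-L` flag makes rsync copy what symlinks such as /etc/pve/qemu-server point to, so the VM configs end up as ordinary files in the mirror.

```shell
#!/bin/sh
# Sketch, not an official Proxmox tool. Assumptions: pool "ZFS" mounted at
# /ZFS; BACKUP_DIR is a hypothetical location on that pool.
POOL="${POOL:-ZFS}"
BACKUP_DIR="${BACKUP_DIR:-/ZFS/pve-config-backup}"

# Regular job: mirror /etc/pve onto the pool. -a keeps permissions and times,
# -L copies symlink targets such as /etc/pve/qemu-server, and --delete drops
# files that were removed from /etc/pve.
plan_backup() {
    echo "rsync -aL --delete /etc/pve/ $BACKUP_DIR/"
}

# After reinstalling the OS disk: list importable pools, then import ours.
# -f overrides the "previously in use from another system" check, which can
# trigger because the old installation never exported the pool. The VM
# configs can then be copied from $BACKUP_DIR/qemu-server/ to
# /etc/pve/qemu-server/.
plan_restore() {
    echo "zpool import"
    echo "zpool import -f $POOL"
}
```

To run the backup hourly, a line such as `0 * * * * root rsync -aL --delete /etc/pve/ /ZFS/pve-config-backup/` in a file under /etc/cron.d would do.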
 
Following up on this issue, I have a question.

I have a ZFS RAID1 (mirror) setup on my server, with PVE booting from it (as you suggested).

I had a faulty drive, which my provider replaced with a new one. I ran the replace command, and the zpool is healthy again with ONLINE status.
BUT the other partitions that Proxmox creates when you install it from the ISO are not there.

I think Proxmox uses only one partition on each disk for the zpool, and the others are for booting. The risk now is losing the original disk, which holds the original partitions (including the boot partition?), in which case I suppose the server will not boot.

How can I fix that? (besides installing from zero).
 
How can I fix that? (besides installing from zero).

How did you replace your disk?

If you did it "wrong", you can convert the mirror to a single-disk ZFS pool, correct the partition layout on the second disk and re-add the disk to the pool to get a mirror again, so there's no need to reinstall. If you installed from the PVE 6 ISO (and did not upgrade from PVE 5), you should use the official tool (pve-efiboot-tool). This wiki article may help you understand what's going on.
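The "convert to single disk, fix the layout, re-add" route could look like the sketch below. All names are hypothetical assumptions: pool rpool, healthy disk /dev/sda, replaced disk /dev/sdb listed as sdb in `zpool status`, and the PVE 6 installer layout (partition 2 is the ESP, partition 3 holds ZFS). The commands are only printed; check `zpool status` and `lsblk` before running any of them.

```shell
#!/bin/sh
# Sketch: rebuild the second mirror disk with the installer's partition layout.
# Every default below is a hypothetical placeholder; set them for your system.
# For NVMe names like /dev/nvme0n1 the partitions are ...p2/...p3 instead.
POOL="${POOL:-rpool}"
GOOD_DISK="${GOOD_DISK:-/dev/sda}"   # disk that still has the installer layout
NEW_DISK="${NEW_DISK:-/dev/sdb}"     # disk the provider swapped in
NEW_VDEV="${NEW_VDEV:-sdb}"          # NEW_DISK's name as shown in zpool status
ZFS_PART=3                           # PVE 6 layout: 1 = BIOS boot, 2 = ESP, 3 = ZFS

plan_fix_mirror() {
    # 1. Turn the mirror back into a single-disk pool.
    echo "zpool detach $POOL $NEW_VDEV"
    # 2. Copy the partition table from the good disk, then give the copy
    #    new random GUIDs so the two disks don't share identifiers.
    echo "sgdisk $GOOD_DISK -R $NEW_DISK"
    echo "sgdisk -G $NEW_DISK"
    # 3. Boot setup for the new disk's ESP belongs here.
    # 4. Attach the new ZFS partition to the good one to form the mirror again;
    #    -f because the disk may still carry an old ZFS label. If zpool status
    #    lists the good disk under another name (e.g. by-id), use that name.
    echo "zpool attach -f $POOL ${GOOD_DISK}${ZFS_PART} ${NEW_DISK}${ZFS_PART}"
}
```

Running `plan_fix_mirror` prints the four commands in order; executing them one at a time makes it easier to stop if a step reports an error.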
 
Thanks for your answer, LnxBil,

The disk was replaced by my hosting provider (Hetzner); we have a dedicated server but no physical access to it. You can access a recovery mode, but without the disks mounted (and no ZFS support).
I installed Proxmox from scratch from the 6.0 ISO.

After reading the article you posted, I came to this conclusion:

pve-efiboot-tool format /dev/sda2
pve-efiboot-tool init /dev/sda2
pve-efiboot-tool refresh

Will this create the 3 partitions that the article talks about? Do I need to do anything else?

Thanks again in advance.
 
You can access a recovery mode, but without the disks mounted (and no ZFS support).

I'm familiar with Hetzner. You can, however, just install the ZFS packages from Debian, which will compile the driver, and then you can use it. But be aware that you need at least the same ZFS version as PVE uses, which may mean using the ZFS version from buster-backports.
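Once the ZFS module from those packages is loaded in the rescue system, the import could look like this sketch (pool name rpool and alternate root /mnt are assumptions). Importing with an alternate root keeps the pool's mountpoints under /mnt instead of on top of the rescue system, and exporting the pool before rebooting avoids the "pool was previously in use from another system" check that otherwise stops the next boot. The commands are printed for review.

```shell
#!/bin/sh
# Sketch for a rescue system that has the ZFS module loaded.
# Assumptions: root pool "rpool", alternate root /mnt.
POOL="${POOL:-rpool}"
ALTROOT="${ALTROOT:-/mnt}"

plan_rescue_import() {
    # -f: the pool was last used by the installed system, not this one.
    # -R: mount all datasets below ALTROOT, so rpool's "/" lands on /mnt.
    echo "zpool import -f -R $ALTROOT $POOL"
}

plan_rescue_export() {
    # Run before rebooting so the installed system can import the pool
    # at boot without needing -f.
    echo "zpool export $POOL"
}
```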

Will this create the 3 partitions that the article talks about? Do I need to do anything else?

I would create the partitions beforehand, then do the EFI stuff, and finally the ZFS stuff.
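In that order, the EFI step targets the ESP of the new disk, i.e. partition 2 of the disk that was just partitioned; /dev/sda2 is only correct if sda is the replacement, since `format` creates a fresh filesystem on the partition it is given. A small sketch (disk names hypothetical, PVE 6 layout assumed) that derives the ESP path, including the extra "p" that NVMe device names need, and prints the three pve-efiboot-tool commands from the post above.

```shell
#!/bin/sh
# Sketch: print the pve-efiboot-tool steps for a freshly partitioned disk.
# Assumes the PVE 6 layout where the ESP is partition 2.
ESP_PART=2

# NVMe and MMC device names end in a digit, so their partitions get a "p"
# separator: /dev/nvme1n1 -> /dev/nvme1n1p2, while /dev/sdb -> /dev/sdb2.
esp_of() {
    case "$1" in
        *[0-9]) echo "${1}p${ESP_PART}" ;;
        *)      echo "${1}${ESP_PART}" ;;
    esac
}

plan_efiboot() {
    esp=$(esp_of "$1")
    # format puts a new filesystem on the ESP, so it must be the new disk's.
    echo "pve-efiboot-tool format $esp"
    echo "pve-efiboot-tool init $esp"
    echo "pve-efiboot-tool refresh"
}
```

Usage: `plan_efiboot /dev/sdb` for a SATA replacement or `plan_efiboot /dev/nvme1n1` for NVMe. This applies when the server boots via UEFI; with legacy BIOS boot, PVE 6.0 uses GRUB, and `grub-install` on the new disk is the corresponding step.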
 
Thanks again, LnxBil.
Yes, you can install the ZFS packages (that takes like... forever? :D I mean, it takes too much time), and then I mounted the disks, but when I restarted, Linux halted with "can't mount because the disks were mounted/altered elsewhere" (that's not the exact warning, but something like that). Nothing you can't get past with a -f parameter on the command line, but once again you need to ask for a KVM to reach your server (which in some cases means a couple of hours of downtime).

In the end I reinstalled Proxmox from scratch; fortunately the node was empty when the disk failed. But if anything happens again, I will try your recommendation.

Thanks again for your time.
 
