I have Proxmox 6.2 with 6x 6TB disks in a RAID-Z2 configuration.
A few days ago, disk 3 failed and rpool became degraded.
I replaced the faulty disk with an identical one, but after a few hours I got a message that the replacement disk was faulty as well.
The resilvering process continued regardless.
Since the number of errors was very small, I decided to try a "zpool clear", but this didn't seem to do much.
So I rebooted, and now the server won't come back up and I'm stuck in GRUB rescue.
I'm quite new at this, so any help would be appreciated.
The error in GRUB is: No such device 398d577fa780fa77 (which is the faulty disk)
The situation before the reboot was as follows:
root@o1:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Sep 10 15:55:40 2020
        11.9T scanned at 202M/s, 10.9T issued at 185M/s, 12.0T total
        1.43T resilvered, 91.30% done, 0 days 01:38:25 to go
config:

        NAME                                             STATE     READ WRITE CKSUM
        rpool                                            DEGRADED     0     0     0
          raidz2-0                                       DEGRADED     0     0     0
            sda3                                         ONLINE       0     0     0
            sdb3                                         ONLINE       0     0     0
            replacing-2                                  UNAVAIL      0     0     0  insufficient replicas
              6418447306550617325                        UNAVAIL      0     0     0  was /dev/sdc3
              sdc                                        FAULTED      0    12     0  too many errors (resilvering)
            sdd3                                         ONLINE       0     0     0
            sde3                                         ONLINE       0     0     0
            sdf3                                         ONLINE       0     0     0
        logs
          nvme-INTEL_SSDPE21D480GA_PHM28134004Q480BGN    ONLINE       0     0     0
        cache
          nvme0n1                                        ONLINE       0     0     0

errors: No known data errors
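To pull just the unhealthy devices out of output like the above, something along these lines should work. It is only a sketch: the function name `unhealthy_vdevs` is made up for illustration, and it assumes the usual NAME/STATE/READ/WRITE/CKSUM column layout of `zpool status`. The sample input is a shortened copy of the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): reads `zpool status` output on stdin
# and prints "name state" for every device whose STATE is not ONLINE.
unhealthy_vdevs() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }  # device table starts
        $1 == "errors:"               { in_table = 0 }        # device table ends
        in_table && NF >= 5 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Shortened sample taken from the status output above.
unhealthy_vdevs <<'EOF'
config:
        NAME             STATE     READ WRITE CKSUM
        rpool            DEGRADED     0     0     0
          raidz2-0       DEGRADED     0     0     0
            sda3         ONLINE       0     0     0
            replacing-2  UNAVAIL      0     0     0  insufficient replicas
              sdc        FAULTED      0    12     0  too many errors
errors: No known data errors
EOF
```

For this sample it lists rpool, raidz2-0, replacing-2 and sdc with their states. On a running system the same function would be fed with `zpool status rpool | unhealthy_vdevs`.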
So in short, I'm screwed and have no idea what to do.
I did try the Rescue Boot option from a Proxmox ISO USB drive, but it could not auto-detect the rpool, and that got me nowhere.
I'm pretty sure there is an easy fix, but what is it?
Should I try re-attaching the initial faulty drive? The resilver did complete, though...
Because it is RAID-Z2, the pool should be able to tolerate the loss of a second disk.
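On the rescue boot not finding rpool: when auto-detection fails, a pool can usually be imported by hand from a live environment that ships the ZFS tools. The following is a cautious sketch, not a verified fix for this machine. The `run` wrapper and the `import_rpool_readonly` name are hypothetical, the `/mnt` altroot and the `/dev/disk/by-id` search path are assumptions, and by default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: commands are printed, not executed, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper name; the zpool subcommands and flags are standard.
import_rpool_readonly() {
    # List the pools that are visible for import, scanning by-id device names.
    run zpool import -d /dev/disk/by-id
    # Import read-only under an alternate root so nothing on the pool is changed.
    # -f forces the import because the pool was last used by the installed system.
    run zpool import -d /dev/disk/by-id -o readonly=on -f -R /mnt rpool
    # Show the state of every device, including any files with errors.
    run zpool status -v rpool
}

import_rpool_readonly
```

Setting `DRY_RUN=0` before calling the function would execute the commands for real. The read-only import is there so that nothing is written to the pool while its state is still unclear.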
In GRUB rescue I have tried:
insmod zfs
ls (hd1)
ls (hd2)
ls (hd3)
...
But it keeps saying "Filesystem is unknown".
I also ran "set"; the output is in the screenshot (an old-school photo of the screen).