How to identify the broken disk in a ZFS RAID1 (mirror)

yena

Hello,
I'm testing a simple server with only two hard disks in a ZFS RAID1 (mirror).
If one disk fails, zpool status shows:

  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        rpool                    DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            3371417627674286864  FAULTED      0     0     0  was /dev/sda2
            sda2                 ONLINE       0     0     0

-------------------------------------------------------------------------------

Can I identify the broken HD and change it without a shutdown?
If yes, how?
If not, the only way I have found is to look up the serial number of the surviving HD and then change the other one.

smartctl -i /dev/sda
...
Serial Number: WD-WCC4N4HP56UP -> This is the good one

Thanks!
 
Can I identify the broken HD and change it without a shutdown?

Yes, but it depends on your hardware. If you have a caddy for each of your disks (as in a server), this is very easy; otherwise you will probably have to shut down (you should not work inside a running machine, plugging and unplugging cables!).

If you only have two disks, just read from the good one with a simple dd command and watch the drive activity indicator (it should flash rapidly):

Code:
dd if=/dev/sda of=/dev/null bs=128K

Press CTRL+C to abort after you identified your disk.
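If you would rather not sit at the console to press CTRL+C, the same read can be time-limited. This is only a sketch: blink_disk is a made-up helper name, and it assumes GNU coreutils' timeout is installed. It only reads from the disk, so it does not modify anything.

```shell
# Read from a disk for a limited time so that its activity LED flashes.
# blink_disk is a hypothetical helper: $1 = device, $2 = seconds (default 30).
blink_disk() {
    dev="$1"
    secs="${2:-30}"
    # timeout stops dd after $secs seconds; the data read goes to /dev/null.
    timeout "$secs" dd if="$dev" of=/dev/null bs=128K
}
```

For example, blink_disk /dev/sda 60 would keep the good disk busy for up to a minute. When the time limit is hit, timeout exits with status 124, which is expected here.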

If this does not work and you do not have server-grade hardware, just shut down, unplug one disk, boot again, check the serial number of the disk that is still present, and act accordingly.
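For the serial check, something like the following could print the serial number of each disk so it can be compared with the label on the physical drive. It is a rough sketch: parse_serial and list_serials are made-up helper names, and it assumes smartmontools is installed and that smartctl -i prints a "Serial Number:" line as in the first post.

```shell
# Print the serial number found in `smartctl -i` output read from stdin.
parse_serial() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Print "device serial" for every disk given as an argument (run as root).
list_serials() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(smartctl -i "$dev" | parse_serial)"
    done
}
```

Calling list_serials /dev/sda /dev/sdb (the second device name is just an example) prints one line per disk. A disk that no longer answers will simply show an empty serial.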
 
You can also check whether your disk enclosure exposes the "locate/fault" feature; echoing 1 into it should make the LED blink.

In addition: http://serverfault.com/a/799793 (this could also be integrated in PVE)

Newer ZFS versions ship with a ZED script that automatically blinks the LED of the failed disk.
 
I wasn't aware of that. Nice!
(hardware support mandatory)

Yes, but many recent servers have it.

This is on a DELL R530:
Code:
# ls -la /sys/class/enclosure/0\:0\:32\:0/0/      
total 0
drwxr-xr-x  3 root root    0 Nov 19 12:58 .
drwxr-xr-x 11 root root    0 May 12  2016 ..
-rw-r--r--  1 root root 4096 Nov 19 12:57 active
-rw-r--r--  1 root root 4096 Nov 19 12:57 fault
-rw-r--r--  1 root root 4096 Nov 19 12:57 locate
drwxr-xr-x  2 root root    0 Nov 19 12:57 power
-rw-r--r--  1 root root 4096 Nov 19 12:57 status
-r--r--r--  1 root root 4096 Nov 19 12:57 type
-rw-r--r--  1 root root 4096 Nov 19 12:57 uevent

Both the locate and fault LEDs are available.

I don't have that on a DELL R510, but I haven't checked carefully.
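Switching the locate LED from the shell could then look roughly like this. It is a sketch only: set_locate_led is a made-up helper name, and the slot directory is whatever your own enclosure exposes under /sys/class/enclosure (the path in the listing above is from this one R530).

```shell
# Turn the locate LED of an enclosure slot on (1) or off (0).
# set_locate_led is a hypothetical helper: $1 = slot directory, $2 = 0 or 1.
set_locate_led() {
    slot_dir="$1"
    state="$2"
    # Refuse to continue if the slot has no writable "locate" attribute.
    if [ ! -w "$slot_dir/locate" ]; then
        echo "no writable locate attribute in $slot_dir" >&2
        return 1
    fi
    echo "$state" > "$slot_dir/locate"
}
```

On the machine above that would be set_locate_led '/sys/class/enclosure/0:0:32:0/0' 1 to switch the LED on and the same call with 0 to switch it off again, run as root.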
 
