Hi
I have already read many posts about the procedure in the title and had a guide written down, but when it comes to a real-world production environment, anxiety kicks in and many things can go wrong. So I would like your confirmation before driving 65 km to work on Saturday to replace a failed (or about-to-fail) disk.
What I wrote down 2 years ago was:
(The failed disk is a member of a RAID 10 pool created for VMs and is not a bootable drive, so it is crucial to keep that particular pool healthy.)
1. Check the failed drive's info
From the GUI (Node -> Disks) I found the serial and also that the disk has 2 partitions (sdc1 and sdc9):
5000cca07....d38 (serial) (OK, so if I unplug them all I'll find the failed one to be replaced)
scsi-35000cca07....d38 /dev/sdc
ls -alh /dev/disk/by-id/
Code:
scsi-35000cca07....d38 -> ../../sdc
scsi-35000cca07....d38-part1 -> ../../sdc1
scsi-35000cca07....d38-part9 -> ../../sdc9
wwn-0x5000cca07....d38 -> ../../sdc
wwn-0x5000cca07....d38-part1 -> ../../sdc1
wwn-0x5000cca07....d38-part9 -> ../../sdc9
Since there are two references to the same disk, do I care about scsi-35000cca07....d38 or wwn-0x5000cca07....d38?
blkid
/dev/sdc1: LABEL="HHproxVM" UUID="1464967671323....067" UUID_SUB="9827066056999849589" BLOCK_SIZE="512" TYPE="zfs_member" PARTLABEL="zfs-97d5289003e3eac8" PARTUUID="e0e83999-e8e8-2744-a08c-dfe90e88f766"
/dev/sdc9: PARTUUID="1403d293-4a58-5f47-ab3a-63a4c6559949"
Is there anything I need from here?
Why doesn't sdc have a UUID while its partitions do?
Is a UUID different from a GUID, and how do I find the GUID if that is the preferred identifier?
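For my own checklist I also noted a way to cross-check the kernel device name against the model, serial and WWN before pulling anything; this assumes smartmontools is installed on the node (lsblk comes with util-linux):
Code:
# Cross-check device name vs. model, serial and WWN
lsblk -o NAME,MODEL,SERIAL,WWN,SIZE
# Query the suspect disk directly (smartmontools)
smartctl -i /dev/sdc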
zpool status
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scsi-35000cca07....d38  FAULTED  0  34  0  too many errors  (the 34 is in the WRITE column; this is the one to be replaced)
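Related to my GUID question above: if I read the zpool man page correctly, zpool status can also print the vdev GUIDs instead of the device names, which might be another unambiguous way to refer to the faulted member (please correct me if that is wrong):
Code:
# Show vdev GUIDs instead of device names
zpool status -g HHVM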
2. Set the failed disk offline
zpool offline HHVM scsi-35000cca07....d38
Does it need the -f option to force it?
Does the failed HDD need to be referenced differently, for example by the UUID found with one of the commands above?
Do I also need to remove the disk afterwards with zpool remove HHVM scsi-35000cca07....d38?
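My plan right after the offline command is simply to re-check that only that vdev shows OFFLINE and that the pool stays degraded but functional:
Code:
# The offlined disk should show OFFLINE, the mirror and pool DEGRADED
zpool status HHVM
zpool list -v HHVM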
3. Physically swap the hard drives (online if you have hot-swap bays, or shut down first if you don't)
Now, is 4a the next step, or 4b, or no step 4 at all, since the failed drive is not a bootable one?
4a. Create an empty GPT partition table on the new HDD with parted:
Do I need to create a GPT partition table at all? Also, the failed disk, as well as all the others participating in the RAID 10, has 2 partitions, not one.
Do I need to replicate the partition table to the new disk, given that it is not a bootable one?
parted /dev/new-disk
(parted) print
(parted) mklabel gpt
(parted) Yes
(parted) quit
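If 4a turns out to be the right path, I assume the same thing can be done non-interactively with parted's script mode, which might be less error-prone on site (destructive to the target disk's label, so only on the correct device):
Code:
# Non-interactive equivalent of the parted session above
parted -s /dev/new-disk mklabel gpt
parted -s /dev/new-disk print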
or
4b. Copy the partition table from a mirror member to the new one
sgdisk /dev/disk/by-id/mirror_member_drive -R /dev/disk/by-id/new_drive
or is it the other way around?
sgdisk /dev/disk/by-id/new_drive -R /dev/disk/by-id/mirror_member_drive
lsblk (check afterwards whether both disks have the same number of partitions, e.g. sda/sda1,sda2,sda3 and sdb/sdb1,sdb2,sdb3)
sgdisk -G /dev/disk/by-id/new_drive
Is this used to randomize the new drive's disk and partition GUIDs, because the previous command also copied the GUIDs of the existing member?
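To double-check the result of 4b while on site, I was planning to print both partition tables and compare them (sgdisk is from the gdisk package, which my notes assume is already installed):
Code:
# Compare partition tables after replicating and randomizing GUIDs
sgdisk -p /dev/disk/by-id/mirror_member_drive
sgdisk -p /dev/disk/by-id/new_drive
lsblk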
5. Replace the old disk with the new one
zpool replace -f pool_name old_drive new_drive, so:
zpool replace -f HHVM scsi-35000cca07....d38 new_disk (I am not in the office yet to check the name of the new one)
Do I use the replace option with the whole disk or with a partition of the disk, since all members have two partitions each?
And if it is per partition, do I need to issue two replace commands, one for partition 1 and another for partition 9?
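What I currently have in the draft for this step, with a placeholder for the new disk's by-id name that I can only fill in once I am at the office (so this is just a sketch, not something I have run):
Code:
# <new_disk_by_id> is a placeholder; I will look up the real by-id name on site
zpool replace -f HHVM scsi-35000cca07....d38 /dev/disk/by-id/<new_disk_by_id>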
6. Monitor the status
zpool status -v
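For following the resilver afterwards, possibly from home over SSH, I also noted:
Code:
# Re-run status every 60 seconds to watch the resilver progress
watch -n 60 zpool status -v HHVM
# Recent pool events (errors, resilver start/finish)
zpool events HHVM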
Please confirm, or add any extra steps needed, so that this is documented as well as possible.