ZFS resilvering gone bad

Vassilis Kipouros

Hello forum,

I am facing a weird situation on one of my Proxmox hosts.
The machine has 10 x 1 TB disks in a ZFS RAID-Z2 array,
which I manually configured on a previous Proxmox version last year
(before the ZFS integration appeared in the GUI).

Last week one of the disks started acting up (bad sectors) and kept getting kicked out of the array automatically.
So I offlined the disk, shut down the host (the drive sits in one of the standard internal bays of an old 16-core Mac Pro), removed the failed drive and installed another one.

I ran sgdisk --zap on the new drive and initialized it for GPT in the GUI, then did a zpool replace.
During resilvering the newly added drive failed as well, so I replaced it once more.

The new drive is not resilvering, and the state of the array looks like this:

[screenshots: zpool status output and Proxmox GUI disk view]

root@pve1:~# zdb
tank:
version: 5000
name: 'tank'
state: 0
txg: 1292448
pool_guid: 11140437068772162451
errata: 0
hostid: 656640
hostname: 'pve1'
com.delphix:has_per_vdev_zaps
vdev_children: 1
vdev_tree:
type: 'root'
id: 0
guid: 11140437068772162451
children[0]:
type: 'raidz'
id: 0
guid: 14392447650706086278
nparity: 2
metaslab_array: 256
metaslab_shift: 36
ashift: 12
asize: 10001889361920
is_log: 0
create_txg: 4
com.delphix:vdev_zap_top: 129
children[0]:
type: 'disk'
id: 0
guid: 4653336192466048985
path: '/dev/disk/by-id/ata-WDC_WD10EZEX-08M2NA0_WD-WCC3F2HPJY60-part1'
devid: 'ata-WDC_WD10EZEX-08M2NA0_WD-WCC3F2HPJY60-part1'
phys_path: 'pci-0000:00:1f.2-ata-2'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 130
children[1]:
type: 'disk'
id: 1
guid: 8536830288125222324
path: '/dev/disk/by-id/ata-WDC_WD10EACS-32ZJB0_WD-WCASJ2231388-part1'
devid: 'ata-WDC_WD10EACS-32ZJB0_WD-WCASJ2231388-part1'
phys_path: 'pci-0000:00:1f.2-ata-3'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 131
children[2]:
type: 'disk'
id: 2
guid: 13764874921234554197
path: '/dev/disk/by-id/ata-Hitachi_HDE721010SLA330_STN608MS223SJK-part1'
devid: 'ata-Hitachi_HDE721010SLA330_STN608MS223SJK-part1'
phys_path: 'pci-0000:00:1f.2-ata-4'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 132
children[3]:
type: 'disk'
id: 3
guid: 14722789825191027013
path: '/dev/disk/by-id/ata-WDC_WD10EACS-32ZJB0_WD-WCASJ2233102-part1'
devid: 'ata-WDC_WD10EACS-32ZJB0_WD-WCASJ2233102-part1'
phys_path: 'pci-0000:00:1f.2-ata-5'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 133
children[4]:
type: 'disk'
id: 4
guid: 11833690739911254551
path: '/dev/disk/by-id/ata-ST31000333AS_9TE0G3BA-part1'
devid: 'ata-ST31000333AS_9TE0G3BA-part1'
phys_path: 'pci-0000:00:1f.2-ata-6'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 134
children[5]:
type: 'replacing'
id: 5
guid: 17951560309314577806
whole_disk: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 7901178832198104228
path: '/dev/sdg1/old'
phys_path: 'pci-0000:04:00.0-ata-1'
whole_disk: 1
DTL: 17
create_txg: 4
com.delphix:vdev_zap_leaf: 135
offline: 1
children[1]:
type: 'disk'
id: 1
guid: 13617798510350479601
path: '/dev/sdg1'
devid: 'ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E7SKVF34-part1'
phys_path: 'pci-0000:00:1d.7-usb-0:3:1.0-scsi-0:0:0:0'
whole_disk: 1
not_present: 1
DTL: 167
create_txg: 4
com.delphix:vdev_zap_leaf: 18
offline: 1
resilver_txg: 1269457
children[6]:
type: 'disk'
id: 6
guid: 8732328832337942140
path: '/dev/disk/by-id/scsi-200193c0000000000-part1'
devid: 'scsi-200193c0000000000-part1'
phys_path: 'pci-0000:06:00.0-scsi-0:0:0:0'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 136
children[7]:
type: 'disk'
id: 7
guid: 5652517988193402051
path: '/dev/disk/by-id/scsi-200193c0100000000-part1'
devid: 'scsi-200193c0100000000-part1'
phys_path: 'pci-0000:06:00.0-scsi-0:0:1:0'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 137
children[8]:
type: 'disk'
id: 8
guid: 1917645073663426809
path: '/dev/disk/by-id/scsi-200193c0200000000-part1'
devid: 'scsi-200193c0200000000-part1'
phys_path: 'pci-0000:06:00.0-scsi-0:0:2:0'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 138
children[9]:
type: 'disk'
id: 9
guid: 12626542771498886257
path: '/dev/disk/by-id/scsi-200193c0300000000-part1'
devid: 'scsi-200193c0300000000-part1'
phys_path: 'pci-0000:06:00.0-scsi-0:0:3:0'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 139
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data




I tried removing the disks by their GUIDs:

[screenshot: output of the removal attempts]


Any ideas on how to fix this?

Thank you all in advance and sorry for the long post.
 

Hi,
I ran sgdisk --zap on the new drive and initialized it for GPT in the GUI, then did a zpool replace.
During resilvering the newly added drive failed as well, so I replaced it once more.
If I understand this correctly, you used the same disk again?
If so, that does not work: ZFS keeps identification labels at both the start and the end of the disk.
You have to overwrite the first 512 MB and the last 512 MB before you can use the disk again.
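
For reference, a minimal sketch of wiping those label areas so a previously used disk can be reused; /dev/sdX is a placeholder for the actual device and the commands are destructive:

Code:
# WARNING: destroys all data on the target disk - triple-check the device name.
# Clear any ZFS labels (they live at the start and end of the device):
zpool labelclear -f /dev/sdX

# Or overwrite the first and last 512 MB with zeros:
dd if=/dev/zero of=/dev/sdX bs=1M count=512
dd if=/dev/zero of=/dev/sdX bs=1M count=512 seek=$(( $(blockdev --getsz /dev/sdX) / 2048 - 512 ))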
 
No, the first replacement drive failed during resilvering and was replaced with yet another drive.

The new drive is healthy; the question is how to get rid of the three "drives" with the red marks (in the GUI screenshot) and then re-add the new drive to the array so it starts resilvering from the beginning...
 
Looking at the output of your zpool status, I see the "old" (original?) drive and another drive that was presumably the first replacement that failed. I would be inclined to leave those for now, add the new drive and do:

Code:
zpool replace POOLNAME ORIGINAL_DRIVE SECOND_NEW_DRIVE

If that does not start a resilver, reboot. If that still does not trigger a resilver, do a
Code:
zpool scrub POOLNAME

Then try to replace the drive (if it hasn't already been resilvered).

If that works but you still have offline drives listed, do the following but only after the new drive has been successfully resilvered:
Code:
zpool detach POOLNAME OFFLINE_DISK
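
As a concrete (hedged) sketch of that sequence with the pool name 'tank' from the zdb output above; the GUIDs and the by-id name are placeholders that have to be read off first:

Code:
# Show vdev GUIDs so the old/offline entries can be addressed unambiguously:
zpool status -g tank

# Replace the original failed vdev with the new disk, addressed by its by-id path:
zpool replace tank OLD_VDEV_GUID /dev/disk/by-id/ata-NEW_DISK_SERIAL

# If no resilver starts, kick off a scrub and watch the progress:
zpool scrub tank
zpool status -v tank

# Only after the resilver has finished, detach any leftover offline vdev:
zpool detach tank OFFLINE_VDEV_GUID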
 
Is there some kind of manual partitioning/preparation that I need to do on the NEW drive?
I notice that all the old drives have odd partitions on them. Is that done automatically?
And I guess I will need to use the /dev/disk/by-id naming rather than /dev/sdXX?
 
root@pve1:~# fdisk -l
Disk /dev/sda: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: D4D9BAA1-A8EB-4D39-8B9B-CDBBF191F6E1

Device Start End Sectors Size Type
/dev/sda1 2048 4095 2048 1M BIOS boot
/dev/sda2 4096 528383 524288 256M EFI System
/dev/sda3 528384 488397134 487868751 232.6G Linux LVM


Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: E34C8FAC-30B2-7A49-8937-CF0651E8051B

Device Start End Sectors Size Type
/dev/sdb1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdb9 1953507328 1953523711 16384 8M Solaris reserved 1


Disk /dev/sdc: 931.5 GiB, 1000203804160 bytes, 1953523055 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E9B09966-4A32-6844-82CB-929AD1BD4BF9

Device Start End Sectors Size Type
/dev/sdc1 2048 1953505279 1953503232 931.5G Solaris /usr & Apple ZFS
/dev/sdc9 1953505280 1953521663 16384 8M Solaris reserved 1


Disk /dev/sdd: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 80808AEB-3C5B-0E4C-88D3-AA0573ADC2A7

Device Start End Sectors Size Type
/dev/sdd1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdd9 1953507328 1953523711 16384 8M Solaris reserved 1


Disk /dev/sde: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 579313EA-BAE1-0940-BCF7-2B102BFAADB9

Device Start End Sectors Size Type
/dev/sde1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sde9 1953507328 1953523711 16384 8M Solaris reserved 1


Disk /dev/sdf: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: C11C1110-100A-E14D-90D5-D8847A70E788

Device Start End Sectors Size Type
/dev/sdf1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdf9 1953507328 1953523711 16384 8M Solaris reserved 1


Disk /dev/sdg: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 27F15943-6C62-4D5E-A1AD-95760F9F2A74


Disk /dev/sdj: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E2A66473-FE1D-4648-A9BA-FADE9B1484D7

Device Start End Sectors Size Type
/dev/sdj1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdj9 1953507328 1953523711 16384 8M Solaris reserved 1


Disk /dev/sdk: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: CB58F8FB-663B-1D41-872D-F04519A5F30C

Device Start End Sectors Size Type
/dev/sdk1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdk9 1953507328 1953523711 16384 8M Solaris reserved 1


Disk /dev/sdl: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EBBC7D49-03C5-0145-AAC9-E5D476B57B7C

Device Start End Sectors Size Type
/dev/sdl1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdl9 1953507328 1953523711 16384 8M Solaris reserved 1


Disk /dev/sdm: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B35475B9-B84F-4741-BA56-B1267A41305F

Device Start End Sectors Size Type
/dev/sdm1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdm9 1953507328 1953523711 16384 8M Solaris reserved 1
 
Is there some kind of manual partitioning/preparation that I need to do on the NEW drive?
I notice that all the old drives have odd partitions on them. Is that done automatically?
And I guess I will need to use the /dev/disk/by-id naming rather than /dev/sdXX?
If the new drive is at least as big as the one you're replacing, you can add it without any partitioning; ZFS will take care of that for you during the replace operation.

You can use either format to reference the drive but /dev/disk/by-id is the recommended approach as it won't vary if you change drive controllers, swap into a new computer etc.
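
For instance, a quick way to map a kernel name like /dev/sdX (placeholder) to its stable by-id alias and use it in the replace, assuming the pool name 'tank':

Code:
# Find the persistent alias that points at the kernel device:
ls -l /dev/disk/by-id/ | grep -w sdX

# Then reference that alias in the replace command:
zpool replace tank OLD_VDEV_GUID /dev/disk/by-id/ata-MODEL_SERIAL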
 
Hello,

I can only guess why your resilver was not OK. You should double-check what sector size your failed disk had: 512 B or 4K. From your post I can see that you have mixed storage with both cases (512 B and 4K).
If your failed disk was 512 B and your new disk is 4K, the capacity is the same but the total number of sectors per disk is very different, and as you know ZFS works with sectors. You start the replacement, ZFS tries to write N (512 B) sectors to the new disk, and after some time it sees that the new disk's partition has only Y (4K) sectors, with Y < N. And for this reason the replacement process fails.
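
A quick way to compare logical/physical sector sizes across the drives before replacing (a sketch; /dev/sdX is a placeholder):

Code:
# Report logical and physical sector size for every block device:
lsblk -o NAME,SIZE,LOG-SEC,PHY-SEC

# Or query a single drive directly:
smartctl -i /dev/sdX | grep -i 'sector size'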
 
Also, you can see in your fdisk output that two disks, one with 512 B and one with 4K sectors, report different numbers of sectors ;)
 
The first replacement disk started giving out SMART errors during resilvering.
And I just shut down the host, removed the disk and put in the new one...

Which disks are you referring to?
And how could this happen when all the disks were empty (no partitions) when the array was created?

I guess that after I fix my current issue, I can remove the problematic disks, zero them out and reinsert them?
 
Then try double-checking your SATA cables. You can also try swapping the bad disk's SATA port with a port where you already have a good disk!
 
New issue...

After another hard disk failure, I shut down, replaced the faulty disk with a new one, and after boot:

root@pve:~# sgdisk -Z /dev/sdg
GPT data structures destroyed! You may now partition the disk using fdisk or other utilities.
root@pve:~# zpool replace -f tank 4952218975371802621 /dev/disk/by-id/ata-Hitachi_HDS721010CLA332_JP2940HZ3NEWRC
cannot label 'sdg': try using parted(8) and then provide a specific slice: -3


Any ideas?
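
Not a definitive fix, but when "cannot label" errors come up it is often worth completely clearing the new disk (old partition table and any stale ZFS labels) before retrying; a sketch using the device and by-id name from the post above, so double-check that /dev/sdg really is the new drive:

Code:
# WARNING: wipes the target disk.
wipefs -a /dev/sdg            # remove old filesystem/RAID signatures
sgdisk --zap-all /dev/sdg     # destroy GPT and protective MBR
zpool labelclear -f /dev/sdg  # clear stale ZFS labels (may complain if none exist)

# Then retry the replacement:
zpool replace -f tank 4952218975371802621 /dev/disk/by-id/ata-Hitachi_HDS721010CLA332_JP2940HZ3NEWRC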
 
Hi,

My own HDD replacement cook-book is like this:

- take the new HDD and run some initial tests (badblocks and SMART) for at least 24 h (see the sketch below)
- only after those tests pass do I replace the dead HDD with the new one

This way I have a good chance of not replacing a bad old HDD with a new disk that is also bad ;)
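
A sketch of such a burn-in, assuming a blank new disk (badblocks -w is destructive); /dev/sdX is a placeholder:

Code:
# Destructive write/read pattern test over the whole disk (takes many hours):
badblocks -wsv /dev/sdX

# Long SMART self-test, then review the results:
smartctl -t long /dev/sdX
smartctl -a /dev/sdX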
 
