ZFS resilvering gone bad

Discussion in 'Proxmox VE: Installation and configuration' started by Vassilis Kipouros, Nov 6, 2018.

  1. Vassilis Kipouros

    Joined:
    Nov 2, 2016
    Messages:
    46
    Likes Received:
    3
    Hello forum,

    I am facing a weird situation on one of my proxmox hosts.
    The machine has 10 x 1 TB disks in a ZFS raidz2 array
    which I manually configured on a previous Proxmox version last year
    (before the ZFS integration in the GUI appeared).

    Last week one of the disks started acting up (bad sectors) and kept getting kicked out of the array automatically.
    So I offlined the disk, shut down the host (the drive was in one of the standard internal bays on an old 16-core Mac Pro), removed the failed drive and installed another one.

    I did an sgdisk zap and initialized the drive for GPT in the GUI, then ran a zpool replace.
    During resilvering the newly replaced drive failed as well, so I replaced it once more.

    The new drive is not resilvering and the state of the array is like this :

    upload_2018-11-6_2-51-14.png

    upload_2018-11-6_2-54-55.png

    root@pve1:~# zdb
    tank:
    version: 5000
    name: 'tank'
    state: 0
    txg: 1292448
    pool_guid: 11140437068772162451
    errata: 0
    hostid: 656640
    hostname: 'pve1'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
    type: 'root'
    id: 0
    guid: 11140437068772162451
    children[0]:
    type: 'raidz'
    id: 0
    guid: 14392447650706086278
    nparity: 2
    metaslab_array: 256
    metaslab_shift: 36
    ashift: 12
    asize: 10001889361920
    is_log: 0
    create_txg: 4
    com.delphix:vdev_zap_top: 129
    children[0]:
    type: 'disk'
    id: 0
    guid: 4653336192466048985
    path: '/dev/disk/by-id/ata-WDC_WD10EZEX-08M2NA0_WD-WCC3F2HPJY60-part1'
    devid: 'ata-WDC_WD10EZEX-08M2NA0_WD-WCC3F2HPJY60-part1'
    phys_path: 'pci-0000:00:1f.2-ata-2'
    whole_disk: 1
    create_txg: 4
    com.delphix:vdev_zap_leaf: 130
    children[1]:
    type: 'disk'
    id: 1
    guid: 8536830288125222324
    path: '/dev/disk/by-id/ata-WDC_WD10EACS-32ZJB0_WD-WCASJ2231388-part1'
    devid: 'ata-WDC_WD10EACS-32ZJB0_WD-WCASJ2231388-part1'
    phys_path: 'pci-0000:00:1f.2-ata-3'
    whole_disk: 1
    create_txg: 4
    com.delphix:vdev_zap_leaf: 131
    children[2]:
    type: 'disk'
    id: 2
    guid: 13764874921234554197
    path: '/dev/disk/by-id/ata-Hitachi_HDE721010SLA330_STN608MS223SJK-part1'
    devid: 'ata-Hitachi_HDE721010SLA330_STN608MS223SJK-part1'
    phys_path: 'pci-0000:00:1f.2-ata-4'
    whole_disk: 1
    create_txg: 4
    com.delphix:vdev_zap_leaf: 132
    children[3]:
    type: 'disk'
    id: 3
    guid: 14722789825191027013
    path: '/dev/disk/by-id/ata-WDC_WD10EACS-32ZJB0_WD-WCASJ2233102-part1'
    devid: 'ata-WDC_WD10EACS-32ZJB0_WD-WCASJ2233102-part1'
    phys_path: 'pci-0000:00:1f.2-ata-5'
    whole_disk: 1
    create_txg: 4
    com.delphix:vdev_zap_leaf: 133
    children[4]:
    type: 'disk'
    id: 4
    guid: 11833690739911254551
    path: '/dev/disk/by-id/ata-ST31000333AS_9TE0G3BA-part1'
    devid: 'ata-ST31000333AS_9TE0G3BA-part1'
    phys_path: 'pci-0000:00:1f.2-ata-6'
    whole_disk: 1
    create_txg: 4
    com.delphix:vdev_zap_leaf: 134
    children[5]:
    type: 'replacing'
    id: 5
    guid: 17951560309314577806
    whole_disk: 0
    create_txg: 4
    children[0]:
    type: 'disk'
    id: 0
    guid: 7901178832198104228
    path: '/dev/sdg1/old'
    phys_path: 'pci-0000:04:00.0-ata-1'
    whole_disk: 1
    DTL: 17
    create_txg: 4
    com.delphix:vdev_zap_leaf: 135
    offline: 1
    children[1]:
    type: 'disk'
    id: 1
    guid: 13617798510350479601
    path: '/dev/sdg1'
    devid: 'ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E7SKVF34-part1'
    phys_path: 'pci-0000:00:1d.7-usb-0:3:1.0-scsi-0:0:0:0'
    whole_disk: 1
    not_present: 1
    DTL: 167
    create_txg: 4
    com.delphix:vdev_zap_leaf: 18
    offline: 1
    resilver_txg: 1269457
    children[6]:
    type: 'disk'
    id: 6
    guid: 8732328832337942140
    path: '/dev/disk/by-id/scsi-200193c0000000000-part1'
    devid: 'scsi-200193c0000000000-part1'
    phys_path: 'pci-0000:06:00.0-scsi-0:0:0:0'
    whole_disk: 1
    create_txg: 4
    com.delphix:vdev_zap_leaf: 136
    children[7]:
    type: 'disk'
    id: 7
    guid: 5652517988193402051
    path: '/dev/disk/by-id/scsi-200193c0100000000-part1'
    devid: 'scsi-200193c0100000000-part1'
    phys_path: 'pci-0000:06:00.0-scsi-0:0:1:0'
    whole_disk: 1
    create_txg: 4
    com.delphix:vdev_zap_leaf: 137
    children[8]:
    type: 'disk'
    id: 8
    guid: 1917645073663426809
    path: '/dev/disk/by-id/scsi-200193c0200000000-part1'
    devid: 'scsi-200193c0200000000-part1'
    phys_path: 'pci-0000:06:00.0-scsi-0:0:2:0'
    whole_disk: 1
    create_txg: 4
    com.delphix:vdev_zap_leaf: 138
    children[9]:
    type: 'disk'
    id: 9
    guid: 12626542771498886257
    path: '/dev/disk/by-id/scsi-200193c0300000000-part1'
    devid: 'scsi-200193c0300000000-part1'
    phys_path: 'pci-0000:06:00.0-scsi-0:0:3:0'
    whole_disk: 1
    create_txg: 4
    com.delphix:vdev_zap_leaf: 139
    features_for_read:
    com.delphix:hole_birth
    com.delphix:embedded_data




    I tried removing the disks by their GUIDs:

    upload_2018-11-6_3-3-11.png


    Any ideas on how to fix this?

    Thank you all in advance and sorry for the long post.
     


  2. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    4,763
    Likes Received:
    315
    Hi,
    If I understand this correctly, you used the same disk again?
    If so, that does not work.
    ZFS keeps ID (label) blocks at both the start and the end of the disk.
    So you have to overwrite the first 512 MB and the last 512 MB before you can use the disk again.
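    A minimal sketch of how that overwrite could look (the device name /dev/sdX is a placeholder - double-check it with lsblk before running; zpool labelclear is an alternative way to clear the ZFS labels):
    Code:
    # /dev/sdX is a placeholder - verify the device with lsblk before running!
    DISK=/dev/sdX

    # wipe the first 512 MB (front ZFS labels and GPT)
    dd if=/dev/zero of=$DISK bs=1M count=512

    # wipe the last 512 MB (back ZFS labels and backup GPT);
    # dd will stop with "No space left on device" at the end of the disk - that is expected
    dd if=/dev/zero of=$DISK bs=1M seek=$(( $(blockdev --getsz "$DISK") / 2048 - 512 ))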
     
  3. Vassilis Kipouros

    Joined:
    Nov 2, 2016
    Messages:
    46
    Likes Received:
    3
    No, the first replacement drive failed during resilvering and was replaced with yet another drive.

    The new drive is healthy; the question is how to get rid of the three "drives" with the red marks (in the GUI screenshot) and then re-add the new drive to the array so it starts resilvering from the beginning...
     
    #3 Vassilis Kipouros, Nov 6, 2018
    Last edited: Nov 6, 2018
  4. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    74
    Likes Received:
    34
    Looking at the output of your zpool status, I see the "old" (original?) drive and another drive that was presumably the first replacement that failed. I would be inclined to leave those for now, add the new drive and do:

    Code:
    zpool replace POOLNAME ORIGINAL_DRIVE SECOND_NEW_DRIVE
    If that does not start a resilver, reboot. If that still does not trigger a resilver, do a
    Code:
    zpool scrub POOLNAME
    Then try to replace the drive (if it hasn't already been resilvered).

    If that works but you still have offline drives listed, do the following but only after the new drive has been successfully resilvered:
    Code:
    zpool detach POOLNAME OFFLINE_DISK
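    Putting that together with the GUIDs from the zdb output above, the sequence might look something like this (the by-id path for the new drive is only a placeholder - substitute the real one from /dev/disk/by-id):
    Code:
    # hypothetical example - replace the by-id path with the new drive's actual id
    zpool replace tank 7901178832198104228 /dev/disk/by-id/ata-NEW_DRIVE_SERIAL
    zpool status tank            # a resilver should now be running
    # only after the resilver has completed:
    zpool detach tank 13617798510350479601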
     
    Vassilis Kipouros likes this.
  5. Vassilis Kipouros

    Joined:
    Nov 2, 2016
    Messages:
    46
    Likes Received:
    3
    Is there some kind of manual partitioning/preparation that I need to do on the NEW drive?
    I notice that all the old drives have weird partitions on them. Is that done automatically?
    And I guess I will need to use the /dev/disk/by-id naming rather than /dev/sdXX?
     
  6. Vassilis Kipouros

    Joined:
    Nov 2, 2016
    Messages:
    46
    Likes Received:
    3
    root@pve1:~# fdisk -l
    Disk /dev/sda: 232.9 GiB, 250059350016 bytes, 488397168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: D4D9BAA1-A8EB-4D39-8B9B-CDBBF191F6E1

    Device Start End Sectors Size Type
    /dev/sda1 2048 4095 2048 1M BIOS boot
    /dev/sda2 4096 528383 524288 256M EFI System
    /dev/sda3 528384 488397134 487868751 232.6G Linux LVM


    Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disklabel type: gpt
    Disk identifier: E34C8FAC-30B2-7A49-8937-CF0651E8051B

    Device Start End Sectors Size Type
    /dev/sdb1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
    /dev/sdb9 1953507328 1953523711 16384 8M Solaris reserved 1


    Disk /dev/sdc: 931.5 GiB, 1000203804160 bytes, 1953523055 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: E9B09966-4A32-6844-82CB-929AD1BD4BF9

    Device Start End Sectors Size Type
    /dev/sdc1 2048 1953505279 1953503232 931.5G Solaris /usr & Apple ZFS
    /dev/sdc9 1953505280 1953521663 16384 8M Solaris reserved 1


    Disk /dev/sdd: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: 80808AEB-3C5B-0E4C-88D3-AA0573ADC2A7

    Device Start End Sectors Size Type
    /dev/sdd1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
    /dev/sdd9 1953507328 1953523711 16384 8M Solaris reserved 1


    Disk /dev/sde: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: 579313EA-BAE1-0940-BCF7-2B102BFAADB9

    Device Start End Sectors Size Type
    /dev/sde1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
    /dev/sde9 1953507328 1953523711 16384 8M Solaris reserved 1


    Disk /dev/sdf: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: C11C1110-100A-E14D-90D5-D8847A70E788

    Device Start End Sectors Size Type
    /dev/sdf1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
    /dev/sdf9 1953507328 1953523711 16384 8M Solaris reserved 1


    Disk /dev/sdg: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: 27F15943-6C62-4D5E-A1AD-95760F9F2A74


    Disk /dev/sdj: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: E2A66473-FE1D-4648-A9BA-FADE9B1484D7

    Device Start End Sectors Size Type
    /dev/sdj1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
    /dev/sdj9 1953507328 1953523711 16384 8M Solaris reserved 1


    Disk /dev/sdk: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: CB58F8FB-663B-1D41-872D-F04519A5F30C

    Device Start End Sectors Size Type
    /dev/sdk1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
    /dev/sdk9 1953507328 1953523711 16384 8M Solaris reserved 1


    Disk /dev/sdl: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: EBBC7D49-03C5-0145-AAC9-E5D476B57B7C

    Device Start End Sectors Size Type
    /dev/sdl1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
    /dev/sdl9 1953507328 1953523711 16384 8M Solaris reserved 1


    Disk /dev/sdm: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: B35475B9-B84F-4741-BA56-B1267A41305F

    Device Start End Sectors Size Type
    /dev/sdm1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
    /dev/sdm9 1953507328 1953523711 16384 8M Solaris reserved 1
     
  7. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    74
    Likes Received:
    34
    If the new drive is at least as big as the one you're replacing, you can add it without any partitioning; ZFS will take care of that for you during the replace operation.

    You can use either format to reference the drive but /dev/disk/by-id is the recommended approach as it won't vary if you change drive controllers, swap into a new computer etc.
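    A quick way to find the by-id name for a given /dev/sdX device (sdg and the serial below are placeholders):
    Code:
    ls -l /dev/disk/by-id/ | grep sdg
    # e.g.  ata-WDC_WD10EZEX-SOME_SERIAL-part1 -> ../../sdg1
    #       ata-WDC_WD10EZEX-SOME_SERIAL       -> ../../sdg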
     
    Vassilis Kipouros likes this.
  8. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    895
    Likes Received:
    122
    Hello,

    I can only guess why your resilver was not OK. You must double-check what sector size your failed disk had - 512 B or 4K. From your post I can see that you have a mixed setup with both cases (512/4K).
    If your failed disk was 512 B and your new disk is 4K, the disk capacity is the same but the total number of sectors per disk is very different. ZFS works with sectors, as you know. You start the replacement and ZFS will try to write N sectors (512 B) to the new disk, and after some time ZFS will see that the disk partition only has Y sectors (4K), with N > Y. For this reason the replacement process fails.
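    Just to put numbers on that, using the 1 TB raw capacity from the fdisk output above:
    Code:
    # same raw capacity, very different sector counts
    echo $(( 1000204886016 / 512  ))   # 1953525168 sectors at 512 B
    echo $(( 1000204886016 / 4096 ))   # 244190646 sectors at 4K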
     
  9. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    895
    Likes Received:
    122

    You can also see in your own post that two different disks with 512 B and 4K sectors have different numbers of sectors ;)
     
  10. Vassilis Kipouros

    Joined:
    Nov 2, 2016
    Messages:
    46
    Likes Received:
    3
    The first replacement disk started giving out SMART errors during resilvering.
    And I just shut down the host, removed the disk and put in the new one...

    Which disks are you referring to?
    And how could this happen when all the disks were empty (no partitions) when the array was created?

    I guess that after I fix my current issue, I can remove the problematic disks, zero them out and reinsert them?
     
  11. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    895
    Likes Received:
    122
    Then try to double-check your SATA cables. You can also try swapping the bad disk's SATA port with a port where you already have a good disk!
     
  12. Vassilis Kipouros

    Joined:
    Nov 2, 2016
    Messages:
    46
    Likes Received:
    3
    Things are looking better...

    upload_2018-11-7_11-29-51.png
     
  13. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    895
    Likes Received:
    122
    What did you do?
     
  14. Vassilis Kipouros

    Joined:
    Nov 2, 2016
    Messages:
    46
    Likes Received:
    3
    zpool replace POOLNAME ORIGINAL_DRIVE SECOND_NEW_DRIVE

    (check the first line of the screenshot)
     
  15. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    895
    Likes Received:
    122
    OK, I did not see it ;)
     
  16. Vassilis Kipouros

    Joined:
    Nov 2, 2016
    Messages:
    46
    Likes Received:
    3
    And it's finished and fixed/healthy with no further intervention:

    upload_2018-11-9_10-17-2.png

    Thank you all for the support!
     
  17. Vassilis Kipouros

    Joined:
    Nov 2, 2016
    Messages:
    46
    Likes Received:
    3
    New issue...

    After another hard disk failure, I shut down, replaced the faulty disk with a new one, and after boot:

    root@pve:~# sgdisk -Z /dev/sdg
    GPT data structures destroyed! You may now partition the disk using fdisk or other utilities.
    root@pve:~# zpool replace -f tank 4952218975371802621 /dev/disk/by-id/ata-Hitachi_HDS721010CLA332_JP2940HZ3NEWRC
    cannot label 'sdg': try using parted(8) and then provide a specific slice: -3


    Any ideas?
     
  18. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    1,269
    Likes Received:
    117
    Looks like this could help:
    https://github.com/zfsonlinux/zfs/issues/1028

    Try `zpool labelclear` (or maybe just running sgdisk on /dev/sdg again could also work?)

    Also check dmesg for any problems (and make sure you're using the correct drive).
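    Following that suggestion, something like this might work (verify first that sdg really is the new disk, e.g. with lsblk or smartctl -i, before clearing anything):
    Code:
    # clear any leftover ZFS label on the disk (and on its old partition, if one exists)
    zpool labelclear -f /dev/sdg
    zpool labelclear -f /dev/sdg1

    # wipe the GPT again and retry the replace
    sgdisk -Z /dev/sdg
    zpool replace -f tank 4952218975371802621 /dev/disk/by-id/ata-Hitachi_HDS721010CLA332_JP2940HZ3NEWRC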
     
  19. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    895
    Likes Received:
    122
    Hi,

    My own HDD replacement cook-book is like this:

    - take the new HDD and run some initial tests (badblocks and SMART) for at least 24 h (see the sketch below)
    - only after the tests pass do I replace the dead HDD with the new one

    This way there is a good chance I won't replace a bad old HDD with a new disk that is also bad ;)
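    A rough sketch of that burn-in, assuming the new disk shows up as /dev/sdX (placeholder) and holds no data you care about:
    Code:
    # destructive write test - wipes the whole disk, only run on a blank drive!
    badblocks -wsv /dev/sdX

    # long SMART self-test, then check the results once it finishes
    smartctl -t long /dev/sdX
    smartctl -a /dev/sdX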
     