degraded pool after disk failure and replacement

Discussion in 'Proxmox VE: Installation and configuration' started by chalan, Dec 9, 2018.

  1. chalan

    chalan Member

    Joined:
    Mar 16, 2015
    Messages:
    119
    Likes Received:
    1
    After a disk failure I did these steps:

    1.) zpool offline rpool /dev/disk/by-id/wwn-0x5000cca269c4bd82-part2
    2.) From the WebUI, Servername -> Disks -> Initialize Disk with GPT (/dev/sdb)
    3.) sgdisk --replicate=/dev/sdb /dev/sda
    4.) sgdisk --randomize-guids /dev/sdb
    5.) grub-install /dev/sdb
    6.) zpool replace rpool 12706416511818272176 /dev/disk/by-id/wwn-0x5000cca269e871c7-part

    But it ended with errors and my rpool is still degraded, see:

    Code:
    root@pve-klenova:~# zpool status -v
      pool: rpool
     state: DEGRADED
    status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
       see: http://zfsonlinux.org/msg/ZFS-8000-8A
      scan: resilvered 1,09T in 7h33m with 2 errors on Wed Dec  5 19:51:37 2018
    config:
    
        NAME                                STATE     READ WRITE CKSUM
        rpool                               DEGRADED     0     0     2
          mirror-0                          DEGRADED     0     0     4
            wwn-0x5000cca25cc933fe-part2    ONLINE       0     0     4
            replacing-1                     DEGRADED  1008     0     0
              12706416511818272176          OFFLINE      0     0     0  was /dev/disk/by-id/wwn-0x5000cca269c4bd82-part2
              wwn-0x5000cca269e871c7-part2  ONLINE       0     0  1008
    
    errors: Permanent errors have been detected in the following files:
    
            //var/lib/vz/images/200/vm-200-disk-2.qcow2
    
    Can somebody PLEASE help me?
     
  2. jim.bond.9862

    jim.bond.9862 Active Member

    Joined:
    Apr 17, 2015
    Messages:
    322
    Likes Received:
    27
    First of all, since this is your main pool, you shouldn't take it offline.
    Second, if I remember correctly, you should do steps 1 and 2, then the replace. Once it has resilvered, then do the grub-install.
    Also, in your case, try running a scrub.
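
A rough sketch of that ordering (partition the new disk, replace, wait for the resilver, and only then install GRUB), combined with the partition-table copy from the first post. The device names and the `OLD_VDEV` / `NEW_PART` placeholders are assumptions for illustration, not values from this thread, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
# All device names below are placeholders (assumptions), adjust before use.
HEALTHY=${HEALTHY:-/dev/sda}              # surviving mirror member (whole disk)
NEW=${NEW:-/dev/sdb}                      # replacement disk (whole disk)
OLD_VDEV=${OLD_VDEV:-OLD_GUID_OR_PATH}    # failed member as shown by zpool status
NEW_PART=${NEW_PART:-/dev/disk/by-id/NEW_DISK_ID-part2}  # ZFS partition on the new disk

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

replace_boot_mirror_disk() {
    # 1. Copy the partition layout of the healthy disk onto the new disk.
    run sgdisk --replicate="$NEW" "$HEALTHY"
    # 2. Give the copied table its own GUIDs.
    run sgdisk --randomize-guids "$NEW"
    # 3. Start the resilver onto the new ZFS partition.
    run zpool replace rpool "$OLD_VDEV" "$NEW_PART"
    # 4. Check progress; continue only once the resilver has completed.
    run zpool status -v rpool
    # 5. After the resilver, make the new disk bootable.
    run grub-install "$NEW"
}
```

Calling `replace_boot_mirror_disk` in the default dry-run mode lists the five commands in order, which makes it easy to review the device names before running anything for real.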
     
  3. chalan

    chalan Member

    After I ran a scrub:

    Code:
    root@pve-klenova:~# zpool status -v
      pool: rpool
     state: DEGRADED
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: http://zfsonlinux.org/msg/ZFS-8000-8A
      scan: scrub repaired 0B in 14h43m with 2 errors on Mon Dec 10 23:04:43 2018
    config:
    
            NAME                                STATE     READ WRITE CKSUM
            rpool                               DEGRADED     0     0    14
              mirror-0                          DEGRADED     0     0    28
                wwn-0x5000cca25cc933fe-part2    ONLINE       0     0    28
                replacing-1                     DEGRADED 1.01K     0     0
                  12706416511818272176          OFFLINE      0     0     0  was /dev/disk/by-id/wwn-0x5000cca269c4bd82-part2
                  wwn-0x5000cca269e871c7-part2  ONLINE       0     0 1.01K
    
    errors: Permanent errors have been detected in the following files:
    
            //var/lib/vz/images/200/vm-200-disk-2.qcow2
    
    So what can I do now? Should I somehow remove /dev/sdb and the missing offline drive, and after that add /dev/sdb again? Or do I have to reinstall? I'm helpless...
     
  4. chalan

    chalan Member

    I don't understand the error. I have checked the qcow2 file:

    Code:
    root@pve-klenova:~# qemu-img check /var/lib/vz/images/200/vm-200-disk-2.qcow2
    No errors were found on the image.
    16777216/16777216 = 100.00% allocated, 0.04% fragmented, 0.00% compressed clusters
    Image end offset: 1102665351168
    No errors found, so why does ZFS show an error? What can I do now?
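
One likely explanation: `qemu-img check` verifies the qcow2 metadata, not every byte of guest data, so it can pass while ZFS still holds data blocks whose checksums do not match. A minimal way to probe this, assuming ZFS returns an I/O error when a damaged block is read, is to read the whole file once and look at the exit status. The helper name `read_whole_file` is made up for this sketch.

```shell
#!/bin/sh
# Sketch: read a file from start to end and discard the data.
# Exit status is 0 if every block could be read, non-zero on the first read error.
read_whole_file() {
    dd if="$1" of=/dev/null bs=1048576 2>/dev/null
}

# Example use (path taken from the zpool status output in this thread):
#   read_whole_file /var/lib/vz/images/200/vm-200-disk-2.qcow2 \
#       && echo "file is fully readable" \
#       || echo "read failed, at least one block is damaged"
```

This only reads data, so it is safe to run on the degraded pool, but on a 1 TB image it will take a while.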
     
  5. jim.bond.9862

    jim.bond.9862 Active Member

    Not sure if it makes sense, but if it were me, I would do several things ASAP.
    1. Do a full backup of each VM.
    2. Do a backup of all config files.
    3. Do a backup of all system config.
    4. Move all of the backups off the machine, to an external drive, or plug in an extra internal drive if possible.
    Copy all of it off.
    5. Then make an extra copy of the file in question.

    6. Delete the file.
    7. Run a scrub again.
    If all is well, it should finish the drive replacement and bring the pool online.
    Test booting from each drive.
    I'm still not sure how to do this safely, so do your research. Maybe someone here can chime in with help.
    Once you are up, copy the image back and try starting the VM.
    8. Do a scrub again.
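
The steps above could be sketched roughly like this. The backup mount point `/mnt/external` is a hypothetical path, VM 200 and the image path come from the `zpool status` output earlier in the thread, and the script only prints commands unless `DRY_RUN=0` is set. Note that copying or backing up the damaged image may itself stop with an I/O error at the bad block.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
VMID=${VMID:-200}                         # VM that owns the damaged image
IMAGE=${IMAGE:-/var/lib/vz/images/200/vm-200-disk-2.qcow2}
BACKUP_DIR=${BACKUP_DIR:-/mnt/external}   # hypothetical mount point of an external drive

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

backup_then_scrub() {
    # Steps 1-4: back up the VM (repeat per VM) and the configuration,
    # writing directly to storage outside the pool.
    run vzdump "$VMID" --dumpdir "$BACKUP_DIR" --mode snapshot
    run tar czf "$BACKUP_DIR/pve-config.tar.gz" /etc/pve /etc
    # Step 5: keep an extra copy of the file ZFS reports as damaged.
    run cp "$IMAGE" "$BACKUP_DIR/"
    # Step 6: delete the file only after the copies above exist.
    run rm "$IMAGE"
    # Step 7: scrub again and check the result.
    run zpool scrub rpool
    run zpool status -v rpool
}
```

Running `backup_then_scrub` in dry-run mode first makes it possible to check every path before anything is deleted.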
     
  6. chalan

    chalan Member

    Things got more complicated... I didn't make the backups (I ordered a new 4TB USB drive, but still don't have it). I needed to reboot the server and ended up with a GRUB "unknown filesystem" error. I tried to boot from both disks, but no luck. I removed the new one and tried to boot with the old one, which had been working well, but also no luck. What can I do now?

    I booted the PVE installer from USB and chose rescue, but it ended with an error. So I tried the debug mode, but the zfs and zpool commands are not known there...

    I NEED to rescue the data, and eventually to boot again and make backups to an external drive. How can I fix the GRUB problem?

    PLEASE HELP ME...
     
  7. sb-jw

    sb-jw Member

    Joined:
    Jan 23, 2018
    Messages:
    261
    Likes Received:
    23
    Can you give us more information or screenshots so we can see what you mean / what happened?
     
  8. chalan

    chalan Member

    OK, after Ctrl+D and aborting the installation I was able to get to a prompt with the zpool command, BUT

    zpool import -a

    cannot import 'rpool' : no such pool or dataset
    Destroy and re-create the pool from
    a backup source.

    Is this serious? How could I have damaged the rpool????

    What can I do NOW?
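
For getting data off from a rescue shell, one option worth trying is a read-only import under an alternate root, so that nothing is written to the pool while files are copied away. This is a sketch under assumptions: `/mnt` as the alternate root and `/media/usb` as the destination are hypothetical paths, it assumes `/var/lib/vz` lives on the root dataset, and whether the import succeeds depends on the state of the pool. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
ALTROOT=${ALTROOT:-/mnt}        # hypothetical alternate root for the imported pool
DEST=${DEST:-/media/usb}        # hypothetical mount point of the rescue target drive

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

rescue_import() {
    # List the pools the rescue system can see, without importing anything.
    run zpool import
    # Import read-only under the alternate root; -f because the pool was
    # last used by another system (the installed host).
    run zpool import -f -o readonly=on -R "$ALTROOT" rpool
    # Show which datasets are available and where they are mounted.
    run zfs list -r rpool
    # Copy the VM images to the external drive.
    run rsync -a "$ALTROOT/var/lib/vz/" "$DEST/"
}
```

With the pool imported read-only, the damaged file may still fail to copy, but the remaining images can be saved before any attempt to repair GRUB or the mirror.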
     
  9. chalan

    chalan Member

    OK, so I tried zpool import, but it complains that there is a damaged file... I don't care about the file, I just need to import that pool again, install GRUB and boot so I can rescue as much data as possible... How can I do this???

    [​IMG]
     
    #9 chalan, Jan 11, 2019
    Last edited: Jan 11, 2019