Degraded pool after disk failure and replacement

chalan · Dec 9, 2018

After the disk failure I did these steps:

1.) zpool offline rpool /dev/disk/by-id/wwn-0x5000cca269c4bd82-part2
2.) From the WebUI, Servername -> Disks -> Initialize Disk with GPT (/dev/sdb)
3.) sgdisk --replicate=/dev/sdb /dev/sda
4.) sgdisk --randomize-guids /dev/sdb
5.) grub-install /dev/sdb
6.) zpool replace rpool 12706416511818272176 /dev/disk/by-id/wwn-0x5000cca269e871c7-part

But it ended with an error and my rpool is still degraded, see:

Code:

root@pve-klenova:~# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 1,09T in 7h33m with 2 errors on Wed Dec  5 19:51:37 2018
config:

    NAME                                STATE     READ WRITE CKSUM
    rpool                               DEGRADED     0     0     2
      mirror-0                          DEGRADED     0     0     4
        wwn-0x5000cca25cc933fe-part2    ONLINE       0     0     4
        replacing-1                     DEGRADED  1008     0     0
          12706416511818272176          OFFLINE      0     0     0  was /dev/disk/by-id/wwn-0x5000cca269c4bd82-part2
          wwn-0x5000cca269e871c7-part2  ONLINE       0     0  1008

errors: Permanent errors have been detected in the following files:

        //var/lib/vz/images/200/vm-200-disk-2.qcow2

Can somebody PLEASE help me?
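When reading output like the above, it helps to pull out only the vdevs whose READ, WRITE or CKSUM counters are non-zero. Below is a minimal sketch in POSIX shell with awk. It assumes the zpool status layout shown in this thread (a header row starting with NAME STATE, one vdev per row, a blank line after the table). The function name zfs_error_vdevs is hypothetical and not part of ZFS.

```shell
# Hypothetical helper (not part of ZFS): reads `zpool status` text on stdin
# and prints "NAME READ WRITE CKSUM" for every row of the config table
# whose error counters are not all zero.
zfs_error_vdevs() {
  awk '
    # The config table starts right after the "NAME STATE READ WRITE CKSUM" header.
    $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
    # A blank line ends the table.
    in_cfg && NF == 0 { in_cfg = 0; next }
    # Counters may be plain numbers or abbreviated (e.g. 1.01K); anything
    # other than the literal "0" is treated as an error count.
    in_cfg && ($3 != "0" || $4 != "0" || $5 != "0") { print $1, $3, $4, $5 }
  '
}
```

Usage: zpool status -v rpool | zfs_error_vdevs. On the status above it lists rpool, mirror-0, replacing-1 and both wwn-... partitions, and skips the OFFLINE old disk because all its counters are 0.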

jim.bond.9862 · Dec 10, 2018

First of all, since this is your main pool, you shouldn't take it offline.
Second, if I remember correctly, you should do steps 1 and 2, then the replace, and only once it has resilvered do the GRUB install.
Also, in your case, try running a scrub.

chalan · Dec 12, 2018

After I ran a scrub:

Code:

root@pve-klenova:~# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 14h43m with 2 errors on Mon Dec 10 23:04:43 2018
config:

        NAME                                STATE     READ WRITE CKSUM
        rpool                               DEGRADED     0     0    14
          mirror-0                          DEGRADED     0     0    28
            wwn-0x5000cca25cc933fe-part2    ONLINE       0     0    28
            replacing-1                     DEGRADED 1.01K     0     0
              12706416511818272176          OFFLINE      0     0     0  was /dev/disk/by-id/wwn-0x5000cca269c4bd82-part2
              wwn-0x5000cca269e871c7-part2  ONLINE       0     0 1.01K

errors: Permanent errors have been detected in the following files:

        //var/lib/vz/images/200/vm-200-disk-2.qcow2

So what can I do now? Should I somehow remove /dev/sdb and the missing offline drive, and after that add /sdb again? Or do I have to reinstall? I'm helpless...
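Two things in this output show that the replacement has not finished: the scan line still reports 2 errors, and the config table still contains the replacing-1 vdev holding both the old (OFFLINE) disk and the new one. Below is a minimal sketch that extracts both facts from zpool status text, so the output before and after the next scrub can be compared. The function name zfs_replace_state is hypothetical, and the parsing assumes the output format shown in this thread.

```shell
# Hypothetical helper (not part of ZFS): summarises `zpool status` text on stdin.
# Prints the error count from the "scan:" line and the name of every
# replacing-N vdev that is still present in the config table.
zfs_replace_state() {
  awk '
    # e.g. "scan: scrub repaired 0B in 14h43m with 2 errors on ..."
    $1 == "scan:" {
      for (i = 2; i < NF; i++)
        if ($i == "with" && $(i + 2) ~ /^errors?$/) print "scan_errors", $(i + 1)
    }
    # A replacing-N vdev exists while a replace has not completed yet.
    $1 ~ /^replacing-[0-9]+$/ { print "pending", $1 }
  '
}
```

Usage: zpool status -v rpool | zfs_replace_state. For the status above it prints scan_errors 2 followed by pending replacing-1.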

chalan · Dec 12, 2018

I don't understand the error. I checked the qcow2 file:

Code:

root@pve-klenova:~# qemu-img check /var/lib/vz/images/200/vm-200-disk-2.qcow2
No errors were found on the image.
16777216/16777216 = 100.00% allocated, 0.04% fragmented, 0.00% compressed clusters
Image end offset: 1102665351168

No errors were found, so why does ZFS show an error? What can I do now?
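qemu-img check validates the qcow2 image's own metadata (its cluster mapping and reference-count tables), not the guest data stored in the clusters. So it cannot notice a ZFS checksum error in a data block. ZFS reports such a block by failing the read with an I/O error, so one way to locate the damage is to read the whole file and note where reads fail. Below is a minimal sketch in POSIX shell, assuming dd and wc are available. The function name find_unreadable_chunks is hypothetical. On a 1 TB image, a larger chunk size (e.g. 64 MiB) keeps the number of dd calls down. The chunk size can then be reduced around any offset that fails.

```shell
# Hypothetical helper: read FILE in fixed-size chunks and print every chunk
# whose read fails (e.g. with EIO because ZFS has no good copy of a block).
# Usage: find_unreadable_chunks FILE [CHUNK_MIB]
find_unreadable_chunks() (
  file=$1
  mib=${2:-1}
  bytes=$((mib * 1048576))
  size=$(wc -c < "$file") || exit 2
  chunks=$(( (size + bytes - 1) / bytes ))
  i=0
  while [ "$i" -lt "$chunks" ]; do
    # dd exits non-zero if the read of this chunk fails.
    if ! dd if="$file" of=/dev/null bs="$bytes" skip="$i" count=1 2>/dev/null; then
      echo "chunk $i unreadable (byte offset $((i * bytes)))"
    fi
    i=$((i + 1))
  done
)
```

Usage: find_unreadable_chunks /var/lib/vz/images/200/vm-200-disk-2.qcow2 64. No output means every chunk could be read.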

jim.bond.9862 · Dec 12, 2018

Not sure if it makes sense, but if it were me, I would do several things ASAP:
1. Do a full backup of each VM.
2. Back up all config files.
3. Back up all system config.
4. Move all of the backups off the machine, to an external drive, or plug in an extra internal drive if possible, and copy all of it off.
5. Then make an extra copy of the file in question.
6. Delete the file.
7. Run a scrub again.
   If all is well, it should finish the drive replacement and bring the pool back online.
   Test booting from each drive.
   I'm still not sure how to do this safely, so do your research. Maybe someone here can chime in with help.
   Once you're up, copy the image back and try starting the VM.
8. Run a scrub again.
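For step 5, a plain cp of the damaged image will most likely abort at the first block ZFS cannot read. A common way to copy a partly unreadable file is dd with conv=noerror,sync. This keeps going after read errors and writes zeros in place of each unreadable block. Because sync also pads the final short block, the copy is truncated back to the source size afterwards. Below is a minimal sketch, assuming GNU dd and coreutils truncate. The name salvage_copy is hypothetical, and the zero-filled regions are data the guest will simply not have.

```shell
# Hypothetical helper: copy SRC to DST, substituting zero bytes for any block
# that cannot be read, then trim DST back to SRC's size.
# Usage: salvage_copy SRC DST [BLOCK_BYTES]
salvage_copy() (
  src=$1
  dst=$2
  bs=${3:-131072}   # 128 KiB, the default ZFS recordsize
  size=$(wc -c < "$src") || exit 2
  # noerror: continue after read errors; sync: pad failed/short blocks with NULs.
  # dd reports each read error on stderr; its exit status is passed through.
  if dd if="$src" of="$dst" bs="$bs" conv=noerror,sync; then status=0; else status=$?; fi
  # sync also padded the last partial block, so cut the copy back to size.
  truncate -s "$size" "$dst" || exit 2
  exit "$status"
)
```

Matching the block size to the dataset's recordsize is meant to keep each zero-filled hole to roughly one damaged record rather than a larger span around it.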

chalan · Dec 30, 2018

Things got more complicated... I didn't make the backups (I ordered a new 4TB USB drive but still don't have it). I needed to reboot the server and ended up with the GRUB error "unknown filesystem"... I tried to boot from both disks, but no luck. I removed the new one and tried to boot from the old one, which had been working well, but also no luck... What can I do now?

I booted the PVE installer from USB and chose rescue, but it ended with an error... so I tried the debug mode, but the zfs and zpool commands are not available there...

I NEED to rescue the data, and if possible boot again and make backups to an external drive... How can I fix the GRUB problem?

PLEASE HELP ME...

sb-jw · Dec 30, 2018

Can you give us more information or screenshots so we can see what you mean / what happened?

chalan · Dec 30, 2018

OK, after Ctrl+D and aborting the installation I was able to get to a prompt where the zpool command is available, BUT

Code:

zpool import -a
cannot import 'rpool': no such pool or dataset
Destroy and re-create the pool from
a backup source.

Is this serious? How could I have damaged the rpool?

What can I do NOW?

chalan · Jan 11, 2019

OK, so I tried zpool import, but it complains that there is a damaged file... I don't care about the file. I just need to import the pool again, install GRUB and boot so I can rescue as much data as possible... How can I do this?
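When the goal is only to get the data off, importing the pool read-only keeps ZFS from writing anything to the damaged disks while you copy. Running zpool import with no pool name only lists the pools it can find, which is a safe first check. Below is a minimal sketch of the read-only attempt. It assumes a rescue shell where the zpool command is available and /mnt is free to use as the alternate root. The function zfs_rescue_import and the ZPOOL variable, which lets you print the command instead of running it, are hypothetical. The options themselves are standard zpool import options:
- -f forces the import of a pool last used by another system.
- -o readonly=on imports without writing.
- -R sets an alternate root, so the datasets mount under /mnt instead of over the rescue system.

```shell
# Hypothetical helper: try a read-only import of POOL below ALTROOT.
# Set ZPOOL=echo to print the command instead of running it.
# Usage: zfs_rescue_import [POOL] [ALTROOT]
zfs_rescue_import() (
  pool=${1:-rpool}
  altroot=${2:-/mnt}
  # -f: import even though the installed PVE system last used the pool
  # -o readonly=on: do not write anything to the pool's disks
  # -R: mount the pool's datasets below $altroot instead of /
  ${ZPOOL:-zpool} import -f -o readonly=on -R "$altroot" "$pool"
)
```

With the default Proxmox ZFS layout, where / lives on rpool/ROOT/pve-1, the VM images should then appear under /mnt/var/lib/vz/images. They can be copied to the external drive before attempting any GRUB repair.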
