degraded pool after disk failure and replacement

chalan

After a disk failure I did these steps:

1.) zpool offline rpool /dev/disk/by-id/wwn-0x5000cca269c4bd82-part2
2.) From the WebUI, Servername -> Disks -> Initialize Disk with GPT (/dev/sdb)
3.) sgdisk --replicate=/dev/sdb /dev/sda
4.) sgdisk --randomize-guids /dev/sdb
5.) grub-install /dev/sdb
6.) zpool replace rpool 12706416511818272176 /dev/disk/by-id/wwn-0x5000cca269e871c7-part

But it ended with an error and my rpool is still degraded, see:

Code:
root@pve-klenova:~# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 1,09T in 7h33m with 2 errors on Wed Dec  5 19:51:37 2018
config:

    NAME                                STATE     READ WRITE CKSUM
    rpool                               DEGRADED     0     0     2
      mirror-0                          DEGRADED     0     0     4
        wwn-0x5000cca25cc933fe-part2    ONLINE       0     0     4
        replacing-1                     DEGRADED  1008     0     0
          12706416511818272176          OFFLINE      0     0     0  was /dev/disk/by-id/wwn-0x5000cca269c4bd82-part2
          wwn-0x5000cca269e871c7-part2  ONLINE       0     0  1008

errors: Permanent errors have been detected in the following files:

        //var/lib/vz/images/200/vm-200-disk-2.qcow2

Can somebody PLEASE help me?
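
For anyone scripting around this, here is a small sketch of how the list of damaged files could be pulled out of `zpool status -v` output. The function name `list_damaged_files` is made up for this example, and it assumes the output layout shown above (paths indented under the "Permanent errors" line).

```shell
#!/bin/sh
# Hypothetical helper: print the paths listed under
# "errors: Permanent errors have been detected in the following files:".
# Reads `zpool status -v` output on stdin.
list_damaged_files() {
    awk '
        /^[ \t]*pool:/              { in_list = 0 }           # next pool starts
        /^errors: Permanent errors/ { in_list = 1; next }     # list header
        in_list && NF > 0           { sub(/^[ \t]+/, ""); print }
    '
}

# Usage on a live system (assumes the pool is named rpool):
#   zpool status -v rpool | list_damaged_files
```

The result is one path per line, which can be fed to a backup or copy script.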
 
First of all, since this is your main pool, you shouldn't take it offline.
Second, if I remember correctly, you should do steps 1 and 2, then the replace. Once it has resilvered, then do the grub install.
Also, in your case, try running a scrub.
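
One reading of the order suggested above, as a minimal sketch and not a tested procedure: prepare the new disk, replace, wait for the resilver to finish, and only then install GRUB and scrub. The pool name, GUID and by-id path are taken from the `zpool status` output in this thread. That `/dev/sda` is the healthy disk and `/dev/sdb` the new one is an assumption, and the `run` wrapper and `DRY_RUN` variable are hypothetical helpers that print the commands instead of running them unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and run() are hypothetical helpers for this example:
# commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

POOL=rpool
GOOD_DISK=/dev/sda   # assumption: the surviving mirror member
NEW_DISK=/dev/sdb    # assumption: the replacement disk
OLD_GUID=12706416511818272176
NEW_PART=/dev/disk/by-id/wwn-0x5000cca269e871c7-part2

replace_mirror_member() {
    # Copy the partition table from the good disk to the new one,
    # then give the new disk its own GUIDs.
    run sgdisk --replicate="$NEW_DISK" "$GOOD_DISK"
    run sgdisk --randomize-guids "$NEW_DISK"

    # Start the replacement; ZFS resilvers onto the new partition.
    run zpool replace "$POOL" "$OLD_GUID" "$NEW_PART"

    # Wait until the resilver is no longer reported as running
    # (skipped in dry-run mode, where nothing was started).
    if [ "$DRY_RUN" != "1" ]; then
        while zpool status "$POOL" | grep -q 'resilver in progress'; do
            sleep 60
        done
    fi

    # Only now make the new disk bootable, then verify the pool.
    run grub-install "$NEW_DISK"
    run zpool scrub "$POOL"
}

replace_mirror_member
```

Run as is, it only prints the five commands in order, which makes it easy to check device names before anything touches a disk.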
 
After I ran a scrub:

Code:
root@pve-klenova:~# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 14h43m with 2 errors on Mon Dec 10 23:04:43 2018
config:

        NAME                                STATE     READ WRITE CKSUM
        rpool                               DEGRADED     0     0    14
          mirror-0                          DEGRADED     0     0    28
            wwn-0x5000cca25cc933fe-part2    ONLINE       0     0    28
            replacing-1                     DEGRADED 1.01K     0     0
              12706416511818272176          OFFLINE      0     0     0  was /dev/disk/by-id/wwn-0x5000cca269c4bd82-part2
              wwn-0x5000cca269e871c7-part2  ONLINE       0     0 1.01K

errors: Permanent errors have been detected in the following files:

        //var/lib/vz/images/200/vm-200-disk-2.qcow2

So what can I do now? Should I somehow remove /dev/sdb and the missing offline drive, and after that add /dev/sdb again? Or do I have to reinstall? I'm helpless...
 
I don't understand the error. I have checked the qcow2 file:

Code:
root@pve-klenova:~# qemu-img check /var/lib/vz/images/200/vm-200-disk-2.qcow2
No errors were found on the image.
16777216/16777216 = 100.00% allocated, 0.04% fragmented, 0.00% compressed clusters
Image end offset: 1102665351168

No errors found, so why does ZFS show an error? What can I do now?
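
One likely explanation, offered as an assumption rather than a diagnosis: `qemu-img check` validates the qcow2 metadata (refcounts and cluster tables), not every data block, while ZFS checksums each block of the file. Reading the whole file from start to end is a rough way to see whether ZFS still refuses some blocks, because a block that fails its checksum and has no good copy is normally returned as an I/O error. The function name `read_whole_file` is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: read a file from start to end and report whether
# every block could be read. dd stops at the first read error, so a
# non-zero exit status means some part of the file was not readable.
read_whole_file() {
    if dd if="$1" of=/dev/null bs=1048576 2>/dev/null; then
        echo "readable: $1"
    else
        echo "read error: $1"
        return 1
    fi
}

# Usage (path taken from the zpool status output above):
#   read_whole_file /var/lib/vz/images/200/vm-200-disk-2.qcow2
```

If this reports a read error while `qemu-img check` is clean, the damage is probably in guest data rather than in the image metadata.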
 
Not sure if it makes sense, but if it were me, I would do several things ASAP.
1. Do a full backup of each VM.
2. Back up all config files.
3. Back up all system config.
4. Move all of the backups off the machine, to an external drive, or plug in an extra internal drive if possible.
Copy all of it off.
5. Then make an extra copy of the file in question.

6. Delete the file.
7. Run a scrub again.
If all is well, it should finish the drive replacement and bring the pool online.
Test booting from each drive.
I'm still not sure how to do this safely, so do your research. Maybe someone here can chime in with help.
Once you are up, copy the image back and try starting the VM.
8. Do a scrub again.
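
Steps 5 to 7 could look roughly like this. It is a sketch, not a tested recovery procedure: `salvage_copy` is a made-up helper name, `/mnt/usb` in the usage comment is a placeholder, and the 128 KiB block size is an assumption chosen to match the default ZFS recordsize. `conv=noerror,sync` tells `dd` to keep going past unreadable blocks and pad short or failed reads with zeros, so the copy can contain zeroed regions and may be slightly larger than the original if the last block is partial.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST, continuing past read errors.
# Unreadable blocks are replaced with zeros (conv=noerror,sync), so the
# copy may still be damaged inside; treat it as a best-effort rescue.
salvage_copy() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=131072 conv=noerror,sync 2>/dev/null
}

# Usage, with /mnt/usb as a placeholder for the external drive:
#   salvage_copy /var/lib/vz/images/200/vm-200-disk-2.qcow2 /mnt/usb/vm-200-disk-2.qcow2
#   rm /var/lib/vz/images/200/vm-200-disk-2.qcow2
#   zpool scrub rpool
#   zpool status -v rpool
```

Deleting the original only makes sense once the copy and the other backups are safely off the machine.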
 
Things got more complicated... I didn't make the backups (I ordered a new 4TB USB drive, but still don't have it). I needed to reboot the server and ended up with a GRUB "unknown filesystem" error. I tried to boot from both disks but no luck. I removed the new one and tried to boot with the old one, which had been working well, but also no luck. What can I do now?

I booted the PVE installer from USB and chose rescue, but it ended with an error. So I tried debug mode, but the zfs and zpool commands are not known there...

I NEED to rescue the data, and eventually boot again and make backups to an external drive. How can I fix the GRUB problem?

PLEASE HELP ME...
 
Can you give us more information or screenshots so we can see what you mean and what happened?
 
OK, after Ctrl+D and aborting the installation I was able to get to a prompt with the zpool command, BUT

zpool import -a

cannot import 'rpool': no such pool or dataset
Destroy and re-create the pool from
a backup source.

Is this serious? How could I have damaged the rpool?

What can I do NOW?
 
OK, so I tried zpool import, but it complains that there is a damaged file. I don't care about the file, I just need to import the pool again, install GRUB and boot so I can rescue as much data as possible. How can I do this?

proxmox_rescue.png
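
A sketch of what an import from the rescue shell might look like, assuming the pool is still named `rpool` and that `/mnt` is free to use as an alternate root. Importing read-only avoids writing to the damaged pool while data is copied off; whether the import succeeds on this particular pool is not something the sketch can promise. The `run` wrapper and `DRY_RUN` are hypothetical helpers that print the commands instead of executing them unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and run() are hypothetical helpers for this example.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

rescue_import() {
    pool=$1
    altroot=$2
    # With no pool name, zpool import only lists the pools it can find.
    run zpool import
    # -f: import even though the pool was last in use by another system
    # -o readonly=on: do not write to the pool
    # -R: mount datasets under an alternate root instead of /
    run zpool import -f -o readonly=on -R "$altroot" "$pool"
    # Show pool health and the list of damaged files after the import.
    run zpool status -v "$pool"
}

rescue_import rpool /mnt
```

With the pool mounted read-only under `/mnt`, the VM images can be copied to an external drive before any repair of GRUB is attempted.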
 