ZFS Issue

em3034

Member
Nov 11, 2021
Hello,
I have the following issue: I tried to replace an old 2TB NVMe disk which is hosting all my VMs.
I had a 4TB SATA drive, which I added to my single-disk ZFS pool. This worked fine, but the pool remained in DEGRADED status.

I then decided to get another 4TB drive and use it as a replacement for the faulty NVMe, so that I would end up with a mirror of 2 x 4TB disks.

I ran the following:
Step 1: Attach the new disk to the ZFS pool:
Bash:
zpool attach storage nvme1n1 ata-CT4000MX500SSD1_2245E6836E78

Step 2: Replace the faulty NVMe disk:
Bash:
zpool replace storage nvme1n1 ata-CT4000MX500SSD1_2247E689BA0E

After the resilvering, my Proxmox system wouldn't start.
I ended up unplugging the two new disks and my system came back online again.
My pool now looks as follows:

Bash:
zpool status -v storage
  pool: storage
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 0B in 00:27:02 with 5 errors on Fri Sep 29 19:52:30 2023
config:

    NAME                          STATE     READ WRITE CKSUM
    storage                       DEGRADED     0     0     0
      mirror-0                    DEGRADED    17     0     0
        replacing-0               DEGRADED    17     0     0
          nvme1n1                 DEGRADED    18     0     5  too many errors
          13263678074942122905    UNAVAIL      0     0     0  was /dev/disk/by-id/ata-CT4000MX500SSD1_2247E689BA0E-part1
        6174270007834866697       UNAVAIL      0     0     0  was /dev/disk/by-id/ata-CT4000MX500SSD1_2245E6836E78-part1

errors: Permanent errors have been detected in the following files:

        storage/vms/vm-102-disk-1@__replicate_102-0_1670976008__:<0x1>
        storage/vms/vm-705-disk-0:<0x1>
        storage/vms/vm-504-disk-0@StartDemo:<0x1>
        storage/vms/vm-701-disk-0:<0x1>
root@pve:~#

What shall I do now?
- Try to add the disks to the pool again? How can I clean up the pool to start all over again?
- Create a new pool and copy the data from the degraded pool? How would I do that?

Thanks a lot for your help,

Eric
 
errors: Permanent errors have been detected in the following files:

        storage/vms/vm-102-disk-1@__replicate_102-0_1670976008__:<0x1>
        storage/vms/vm-705-disk-0:<0x1>
        storage/vms/vm-504-disk-0@StartDemo:<0x1>
        storage/vms/vm-701-disk-0:<0x1>
Those virtual disks are already corrupted, and ZFS can't repair them because the pool was a single disk at the time of the corruption, so there was no parity or mirrored data available to heal from (use a mirror in the future so this won't happen again). You can rescue the still-healthy virtual disks, but I would restore VMs 102, 504, 701 and 705 from backup anyway.

And it looks like you removed the disk while it was still resilvering.
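To clean up the half-finished replace, one option is to detach the UNAVAIL devices by the GUIDs shown in your `zpool status` output and then clear the error counters. A sketch, assuming the GUIDs from your output above (run a fresh `zpool status` first and use whatever GUIDs it actually shows):

```shell
# Detach the UNAVAIL replacement target; this cancels the stuck replace
# and collapses the "replacing-0" vdev back to the original disk.
zpool detach storage 13263678074942122905

# Detach the other UNAVAIL mirror member that was unplugged mid-resilver.
zpool detach storage 6174270007834866697

# Reset the READ/WRITE/CKSUM error counters on the remaining devices.
zpool clear storage

# Verify the resulting layout before attaching any disk again.
zpool status -v storage
```

Note that this only tidies up the pool topology; it does not and cannot undo the permanent errors already listed.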
create a new pool and copy the data from the degraded pool? How would I do that?
See "zfs send" and "zfs receive": https://docs.oracle.com/cd/E18752_01/html/819-5461/gbchx.html
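A minimal sketch of that migration, assuming you have created a new pool on the two 4TB disks (the pool name `storage2` and the snapshot name `migrate` are placeholders, not something from your system):

```shell
# Take a recursive snapshot of every dataset on the old pool.
zfs snapshot -r storage@migrate

# Replicate the whole hierarchy, including properties and snapshots,
# to the new pool. -R sends the full tree below "storage";
# -F on the receive side rolls back/overwrites the target as needed.
zfs send -R storage@migrate | zfs receive -F storage2
```

Be aware that sending a dataset that contains permanent errors can fail with an I/O error partway through, so restore the four corrupted VMs from backup and consider excluding their datasets (sending each healthy dataset individually) rather than relying on one recursive stream.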
 