ZFS Issue

em3034

Hello,
I have the following issue: I tried to replace an old 2TB NVMe disk which is hosting all my VMs.
I had a 4TB SATA drive which I added to my single ZFS pool. This worked fine, but the pool was still in DEGRADED status.

I then decided to get another 4TB drive and add it as a replacement for the faulty NVMe, so that I would end up with a mirror of 2 x 4TB disks.

I ran the following:
Step 1: Attach a new disk to the ZFS pool:
Bash:
zpool attach storage nvme1n1 ata-CT4000MX500SSD1_2245E6836E78

Step 2: Replace the faulty NVMe disk:
Bash:
zpool replace storage nvme1n1 ata-CT4000MX500SSD1_2247E689BA0E

After the resilvering, my Proxmox system wouldn't start.
I ended up unplugging the two new disks, and my system came back online.
My pool now looks as follows:

Bash:
zpool status -v storage
  pool: storage
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 0B in 00:27:02 with 5 errors on Fri Sep 29 19:52:30 2023
config:

    NAME                        STATE     READ WRITE CKSUM
    storage                     DEGRADED     0     0     0
      mirror-0                  DEGRADED    17     0     0
        replacing-0             DEGRADED    17     0     0
          nvme1n1               DEGRADED    18     0     5  too many errors
          13263678074942122905  UNAVAIL      0     0     0  was /dev/disk/by-id/ata-CT4000MX500SSD1_2247E689BA0E-part1
        6174270007834866697     UNAVAIL      0     0     0  was /dev/disk/by-id/ata-CT4000MX500SSD1_2245E6836E78-part1

errors: Permanent errors have been detected in the following files:

        storage/vms/vm-102-disk-1@__replicate_102-0_1670976008__:<0x1>
        storage/vms/vm-705-disk-0:<0x1>
        storage/vms/vm-504-disk-0@StartDemo:<0x1>
        storage/vms/vm-701-disk-0:<0x1>
root@pve:~#

What should I do now?
- Try to add the disks to the pool again? How can I clean up the pool to start all over again?
- Create a new pool and copy the data from the degraded pool? How would I do that?

Thanks a lot for your help,

Eric
 
errors: Permanent errors have been detected in the following files:

        storage/vms/vm-102-disk-1@__replicate_102-0_1670976008__:<0x1>
        storage/vms/vm-705-disk-0:<0x1>
        storage/vms/vm-504-disk-0@StartDemo:<0x1>
        storage/vms/vm-701-disk-0:<0x1>
You already have corrupted virtual disks, and ZFS can't repair them because the pool was a single disk at the time of the corruption, so there was no parity or mirrored data to repair from. Use a mirror in the future so this won't happen again. You can rescue the virtual disks that are still healthy, but I would restore VMs 102, 504, 701 and 705 from backup anyway.
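
If the error list gets longer, the affected VM IDs can be pulled out of the status text instead of being read off by hand. A minimal sketch, assuming the Proxmox-style volume naming vm-<ID>-disk-<N> seen above; the function name affected_vmids is made up for this example:

```shell
#!/bin/sh
# Extract the unique VM IDs from the "Permanent errors" list of `zpool status -v`.
# Assumes volume names of the form vm-<ID>-disk-<N>, as in this thread.
affected_vmids() {
    # Reads zpool status text on stdin, prints one VM ID per line, sorted.
    grep -o 'vm-[0-9][0-9]*-disk-[0-9][0-9]*' \
        | sed 's/^vm-\([0-9][0-9]*\)-disk-.*$/\1/' \
        | sort -un
}

# Sample input copied from the status output in this thread.
sample_errors='storage/vms/vm-102-disk-1@__replicate_102-0_1670976008__:<0x1>
storage/vms/vm-705-disk-0:<0x1>
storage/vms/vm-504-disk-0@StartDemo:<0x1>
storage/vms/vm-701-disk-0:<0x1>'

# Prints 102, 504, 701 and 705, one per line.
printf '%s\n' "$sample_errors" | affected_vmids
```

On the host itself the same function could be fed directly: zpool status -v storage | affected_vmids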

It also looks like you removed the disks while the pool was still resilvering.
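
Before pulling any disk, it is worth checking the scan: line of zpool status. A rough sketch of such a check follows; the function name resilver_running is invented here, and the "in progress" sample line is illustrative rather than taken from the poster's system:

```shell
#!/bin/sh
# Report whether a resilver is still running, based on `zpool status` text.
# OpenZFS prints "resilver in progress" on the scan: line while one is active.
resilver_running() {
    # Reads zpool status text on stdin; exit status 0 means still resilvering.
    grep -q 'resilver in progress'
}

# Two sample scan: lines. The first is copied from this thread; the second is
# an illustrative in-progress line, not real output from the poster's system.
finished='  scan: resilvered 0B in 00:27:02 with 5 errors on Fri Sep 29 19:52:30 2023'
running='  scan: resilver in progress since Fri Sep 29 19:25:28 2023'

for line in "$finished" "$running"; do
    if printf '%s\n' "$line" | resilver_running; then
        echo "still resilvering: leave all disks connected"
    else
        echo "no resilver running"
    fi
done
```

On a live system the check would read from the pool instead of the sample strings: zpool status storage | resilver_running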
"create a new pool and copy the data from the degraded pool? How would I do that?"
See "zfs send" and "zfs receive": https://docs.oracle.com/cd/E18752_01/html/819-5461/gbchx.html
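
As a rough sketch of what that could look like for one healthy virtual disk: the script below only prints the commands, it does not run them. The pool name newpool, the snapshot name migrate and the volume vm-100-disk-0 are placeholders for this example and do not come from the thread. Sending one of the datasets listed under "Permanent errors" will most likely abort, which is why those are better restored from backup.

```shell
#!/bin/sh
# Print (not run) the commands to copy one dataset to a new pool with
# zfs send / zfs receive. All names passed in below are placeholders.
plan_copy() {
    src=$1       # source dataset, e.g. storage/vms/vm-100-disk-0
    dst_pool=$2  # destination pool (must already exist)
    snap=$3      # snapshot name to create for the transfer
    # Swap the leading pool name for the destination pool.
    dst="$dst_pool/${src#*/}"
    # zfs receive needs the parent dataset to exist on the destination.
    echo "zfs create -p ${dst%/*}"
    echo "zfs snapshot $src@$snap"
    echo "zfs send $src@$snap | zfs receive $dst"
}

# Hypothetical healthy volume; review the printed commands before running them.
plan_copy storage/vms/vm-100-disk-0 newpool migrate
```

Printing the commands first makes it easy to check the dataset names before anything is written to the new pool.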
 