ZFS Pool Broken after Resilver

d4nnyb

Hello,

My ZFS pool is online and mounted, but if I try to access the mount point my system hangs indefinitely.

Yesterday I realised that my pool was in a degraded state because one of my 2x 8TB HDDs (mirrored) was offline.
The drives are in a caddy, and the caddy just needed turning back on for this HDD.

I have tried to unmount and remount the datasets, which hangs, and I have started a scrub of the pool, which is currently around 50% complete.


Code:
  pool: deadpool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub in progress since Thu Oct  3 07:49:33 2024
        3.22T / 6.52T scanned at 211M/s, 3.11T / 6.52T issued at 205M/s
        0B repaired, 47.79% done, 04:50:20 to go
config:

        NAME                                 STATE     READ WRITE CKSUM
        deadpool                             ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            ata-ST8000VN004-3CP101_WWZ3NZCZ  ONLINE       0     0     0
            ata-ST8000VN004-3CP101_WWZ3M08P  ONLINE       0     0     0

Any help you can give would be appreciated. If anything further is needed, please ask. Thanks in advance!
 
Hi,

I would wait for the scrub to finish, since it is about 50% done with an estimated 4+ hours remaining. During this process, ZFS checks the data and attempts to repair any issues it finds.

In the meantime, you can check dmesg or journalctl for anything interesting.
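
While the scrub runs, the READ/WRITE/CKSUM columns of `zpool status` are worth watching as well. The following is only a minimal sketch, assuming the column layout shown in the output above; `nonzero_error_rows` is a hypothetical helper name, not a ZFS command. It reads `zpool status` text on stdin and prints any row whose counters are not all zero.

```shell
#!/bin/sh
# Sketch: print config rows from `zpool status` whose READ, WRITE or CKSUM
# counter is not zero. Assumes the NAME/STATE/READ/WRITE/CKSUM layout shown
# in this thread. `nonzero_error_rows` is a hypothetical helper name.
nonzero_error_rows() {
    awk '
        BEGIN { num = "^[0-9.]+[KMGTP]?$" }                     # counters may be abbreviated, e.g. 1.2K
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }   # header row of the config table
        in_config && $3 ~ num && $4 ~ num && $5 ~ num &&
            ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage on the affected host (needs the ZFS tools installed):
#   zpool status deadpool | nonzero_error_rows
```

For the status output posted above, where every counter is 0, this would print nothing.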
 
Hello Moayad,

Thank you for the reply.

The node also hangs during reboot. It is currently online, but everything on it shows question marks in my cluster (see attached).

Is there anything in journalctl or dmesg I can grep for?

Code:
Nov 23 21:58:41 pve3 kernel: DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000bd8000>
Nov 23 21:58:41 pve3 kernel: DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000bd800000-0x00000000bffff>
                             BIOS vendor: LENOVO; Ver: FWKT63A  ; Product Version: ThinkCentre M700
 

Attachment: Screenshot_20241003_124433_edit_3258196069797.jpg

You can check the log entries from the time the system hangs and shortly before it.
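
As a rough starting point for what to grep for, here is a small sketch. The pattern list is an assumption based on messages the Linux kernel commonly logs around stuck storage I/O (hung-task warnings, block I/O errors, SATA link resets, ZFS zio errors); it is not exhaustive. `storage_log_filter` is a hypothetical helper name, and the timestamps in the usage comments are placeholders.

```shell
#!/bin/sh
# Sketch: keep only kernel log lines that often accompany stuck storage I/O.
# The patterns are an assumption, not an official or complete list.
# `storage_log_filter` is a hypothetical helper name.
storage_log_filter() {
    grep -Ei 'blocked for more than|I/O error|hard resetting link|zio pool='
}

# Usage: kernel messages from the current boot
#   journalctl -k -b --no-pager | storage_log_filter
# or a window around the hang (placeholder timestamps):
#   journalctl -k --since "2024-10-03 07:30" --until "2024-10-03 08:00" --no-pager | storage_log_filter
# or with dmesg:
#   dmesg -T | storage_log_filter
```

Piping through `--no-pager` also avoids the `>` truncation seen when journalctl output is copied out of the pager.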
Thank you. I found this in another post, which said that the txg_sync process being in a D (uninterruptible sleep) state is a clear sign that ZFS is stuck waiting on some I/O operation.

Am I being daft? Is this just due to the scrub, or is it related to the hang?

Code:
$ ps -aux | grep txg_sync
root         656  0.3  0.0      0     0 ?        D    07:37   1:25 [txg_sync]
danny      72000  0.0  0.0   6332  2176 pts/12   S+   13:52   0:00 grep txg_sync
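
To see whether txg_sync is the only task in that state, a sketch like the one below may help. It assumes the procps `ps` found on Proxmox/Debian; `d_state_only` is a hypothetical helper name. It reads `ps -eo pid,stat,wchan:32,comm` output on stdin and keeps the rows whose state starts with D.

```shell
#!/bin/sh
# Sketch: from `ps -eo pid,stat,wchan:32,comm` output on stdin, print the
# processes whose state starts with D (uninterruptible sleep).
# `d_state_only` is a hypothetical helper name.
d_state_only() {
    awk 'NR > 1 && $2 ~ /^D/ { print }'   # NR > 1 skips the header row
}

# Usage:
#   ps -eo pid,stat,wchan:32,comm | d_state_only
# As root, the kernel stack of a stuck task (PID 656 in the output above)
# may be readable, depending on the kernel configuration:
#   cat /proc/656/stack
```

The WCHAN column shows the kernel function the task is sleeping in, which can hint at whether it is waiting on disk I/O.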
 
