Disk failed?

Jordan67

New Member
Feb 16, 2024
Hello

I have a ZFS pool on my Proxmox server. This morning, during my visual inspection of the servers, two drives showed red LEDs in their bays.

code_language.shell:
root@PVE-1:~# zpool status
  pool: klustersynced_zfs
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 00:37:32 with 0 errors on Sun Feb 11 01:01:33 2024
config:


    NAME                      STATE     READ WRITE CKSUM
    klustersynced_zfs         DEGRADED     0     0     0
      mirror-0                DEGRADED     0     0     0
        sdb                   ONLINE       0     0     0
        12882363701823177512  FAULTED      0     0     0  was /dev/sdc1
      mirror-1                DEGRADED     0     0     0
        16188092585002326977  FAULTED      0     0     0  was /dev/sdd1
        sde                   ONLINE       0     0     0
      sdg                     ONLINE       0     0     0
      mirror-3                ONLINE       0     0     0
        sdh                   ONLINE       0     0     0
        sdi                   ONLINE       0     0     0
      sdf                     ONLINE       0     0     0



ZFS does indeed report the two disks as out of service, but... when I run a short SMART self-test on the disks, I get no errors. The full smartctl output is below.



code_language.shell:
root@PVE-1:~# smartctl -a /dev/sdc1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.11-8-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 860 PRO 1TB
Serial Number:    S42NNF0KC00967A
LU WWN Device Id: 5 002538 e40ad6c84
Firmware Version: RVM01B6Q
User Capacity:    1 024 209 543 168 bytes [1,02 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Feb 16 10:23:32 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x53) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  85) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.


SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       43100
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       10
177 Wear_Leveling_Count     0x0013   063   063   000    Pre-fail  Always       -       814
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   079   059   000    Old_age   Always       -       21
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       8
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       356855281888


SMART Error Log Version: 1
No Errors Logged


SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     43100         -
# 2  Short offline       Completed without error       00%     43100         -


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


code_language.shell:
root@PVE-1:~# smartctl -a /dev/sdd
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.11-8-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 860 PRO 1TB
Serial Number:    S42NNF0KC00937P
LU WWN Device Id: 5 002538 e40ad6c0c
Firmware Version: RVM01B6Q
User Capacity:    1 024 209 543 168 bytes [1,02 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Feb 16 10:29:32 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x53) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  85) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.


SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       43100
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       10
177 Wear_Leveling_Count     0x0013   061   061   000    Pre-fail  Always       -       862
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   075   059   000    Old_age   Always       -       25
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       8
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       378554995800


SMART Error Log Version: 1
No Errors Logged


SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     43100         -
# 2  Short offline       Completed without error       00%     43100         -


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


I downloaded Samsung's diagnostic tool for their drives, but it also tells me that the drives are in good condition...
code_language.shell:
root@PVE-1:~# ./Samsung_SSD_DC_Toolkit_for_Linux_V2.1 -L
================================================================================================
Samsung DC Toolkit Version 2.1.L.Q.0
Copyright (C) 2017 SAMSUNG Electronics Co. Ltd. All rights reserved.
================================================================================================


----------------------------------------------------------------------------------------------------------------------------------------
| Disk   | Path     | Model                     | Serial          | Firmware | Optionrom | Capacity | Drive  | Total Bytes | NVMe Driver |
| Number |          |                           | Number          |          | Version   |          | Health | Written     |             |
----------------------------------------------------------------------------------------------------------------------------------------
| 0      | /dev/sda | Samsung SSD 860 PRO 256GB | S42VNF0M208207J | RVM01B6Q | N/A       |   238 GB | GOOD   | 7.54 TB     | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 1      | /dev/sdb | Samsung SSD 860 PRO 1TB   | S42NNF0KC00975M | RVM01B6Q | N/A       |   953 GB | GOOD   | 176.32 TB   | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 2      | /dev/sdc | Samsung SSD 860 PRO 1TB   | S42NNF0KC00967A | RVM01B6Q | N/A       |   953 GB | GOOD   | 166.17 TB   | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 3      | /dev/sdd | Samsung SSD 860 PRO 1TB   | S42NNF0KC00937P | RVM01B6Q | N/A       |   953 GB | GOOD   | 176.28 TB   | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 4      | /dev/sde | Samsung SSD 860 PRO 1TB   | S42NNF0KC00972K | RVM01B6Q | N/A       |   953 GB | GOOD   | 166.21 TB   | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 5      | /dev/sdf | Samsung SSD 860 PRO 1TB   | S42NNF0KC00961V | RVM01B6Q | N/A       |   953 GB | GOOD   | 173.01 TB   | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 6      | /dev/sdg | Samsung SSD 860 PRO 1TB   | S42NNF0KC00944L | RVM01B6Q | N/A       |   953 GB | GOOD   | 172.03 TB   | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 7      | /dev/sdh | Samsung SSD 860 PRO 1TB   | S42NNF0KC00948E | RVM01B6Q | N/A       |   953 GB | GOOD   | 187.82 TB   | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 8      | /dev/sdi | Samsung SSD 860 PRO 1TB   | S42NNF0KC00980R | RVM01B6Q | N/A       |   953 GB | GOOD   | 187.82 TB   | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 9      | /dev/sdj | Samsung SSD 860 PRO 1TB   | S42NNF0KC00977F | RVM01B6Q | N/A       |   953 GB | GOOD   | 41.02 TB    | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 10     | /dev/sdk | Samsung SSD 860 PRO 1TB   | S42NNF0KC00950H | RVM01B6Q | N/A       |   953 GB | GOOD   | 41.96 TB    | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------
| 11     | /dev/sdl | Samsung SSD 860 PRO 256GB | S42VNF0M208196W | RVM01B6Q | N/A       |   238 GB | GOOD   | 7.52 TB     | N/A         |
----------------------------------------------------------------------------------------------------------------------------------------


Do you have any advice for investigating this problem?

THANKS
 
I hope you are aware that this isn't a RAID10? It's effectively a RAID0, as sdf and sdg aren't mirrored. If sdf or sdg fails, all data in the pool will be lost!
 
I only joined the company a few days ago, in firefighting mode, and the closer I look at this cluster the more it scares me. Luckily I have my backups.

What is the best option? Completely destroy the pool and start again from my backups, or can I combine sdf and sdg into a RAID1?

For the two drives showing red, is this a false alarm?
 
Completely destroy the pool and start again from my backups, or can I combine sdf and sdg into a RAID1?
It would be possible to remove sdg and then use it to mirror sdf, provided there is enough free space in the pool. See:
https://openzfs.github.io/openzfs-docs/man/master/8/zpool-remove.8.html
https://openzfs.github.io/openzfs-docs/man/master/8/zpool-attach.8.html

By the way... it's highly recommended to use only enterprise/datacenter-grade SSDs with power-loss protection with ZFS, as wear will be high and performance terrible whenever doing sync writes, since those can't be cached by the SSD's DRAM cache. And those Samsung PROs are not enterprise drives...
 
Hello all :)

This weekend I completely rebooted the machine, and since then the drives that were shown as failed are working again (and their bay LEDs are no longer red)... Strange...

I took the opportunity to add two SSDs as a new mirror vdev in the pool.


code_language.shell:
root@PVE-1:~# zpool status klustersynced_zfs
  pool: klustersynced_zfs
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 4.04G in 00:01:50 with 0 errors on Fri Feb 16 16:15:15 2024
config:


    NAME               STATE     READ WRITE CKSUM
    klustersynced_zfs  ONLINE       0     0     0
      mirror-0         ONLINE       0     0     0
        sdb            ONLINE       0     0     0
        sdc            ONLINE       0     0     3
      mirror-1         ONLINE       0     0     0
        sdd            ONLINE       0     0     0
        sde            ONLINE       0     0     0
      sdg              ONLINE       0     0     0
      mirror-3         ONLINE       0     0     0
        sdh            ONLINE       0     0     0
        sdi            ONLINE       0     0     0
      sdf              ONLINE       0     0     0
      mirror-5         ONLINE       0     0     0
        sdn            ONLINE       0     0     0
        sdm            ONLINE       0     0     0



Could you confirm that I can remove the sdf and sdg disks with the commands:

code_language.shell:
zpool remove klustersynced_zfs /dev/sdg
zpool remove klustersynced_zfs /dev/sdf

Can I do this without destroying my pool and without losing my data?

According to the docs, the zpool remove command copies the device's data to the remaining disks in the pool and then detaches the device. Is that right?

Thanks all :)
 
I tried the command with several syntaxes, but it does not work.

code_language.shell:
root@PVE-1:~# zpool remove klustersynced_zfs /dev/sdg
cannot remove /dev/sdg: operation not supported on this type of pool
root@PVE-1:~# zpool remove klustersynced_zfs sdg
cannot remove sdg: operation not supported on this type of pool
root@PVE-1:~# zpool remove klustersynced_zfs /dev/sdg/
cannot remove /dev/sdg/: no such device in pool
 
Did you...
- check the sector size?
- check the encryption status?
- check that enough free space is available?
- try to fix the data corruption (3 checksum errors) by running a scrub first, in case ZFS doesn't allow such an operation on an already damaged pool?
And could you post the output of zfs list -o name,keystatus as well as zpool list -v and fdisk -l?
 
Hi

code_language.shell:
root@PVE-1:~# zfs list -o name,keystatus
NAME                                                          KEYSTATUS
klustersynced_zfs                                             -
klustersynced_zfs/vm-100-disk-0                               -
klustersynced_zfs/vm-101-disk-0                               -
klustersynced_zfs/vm-101-state-deb9                           -
klustersynced_zfs/vm-102-disk-0                               -
klustersynced_zfs/vm-104-disk-0                               -
klustersynced_zfs/vm-106-disk-0                               -
klustersynced_zfs/vm-107-disk-0                               -
klustersynced_zfs/vm-108-disk-0                               -
klustersynced_zfs/vm-108-state-av_correction_pb_sophie_so125  -
klustersynced_zfs/vm-109-disk-0                               -
klustersynced_zfs/vm-110-disk-0                               -
klustersynced_zfs/vm-110-state-configok_fresh                 -
klustersynced_zfs/vm-111-disk-0                               -
klustersynced_zfs/vm-112-disk-0                               -
klustersynced_zfs/vm-114-disk-0                               -
klustersynced_zfs/vm-114-disk-1                               -
klustersynced_zfs/vm-117-disk-0                               -
klustersynced_zfs/vm-117-state-avant_upgrade                  -
klustersynced_zfs/vm-119-disk-0                               -
klustersynced_zfs/vm-119-state-av_up_deb11                    -
klustersynced_zfs/vm-120-disk-0                               -
klustersynced_zfs/vm-120-state-fresh                          -
klustersynced_zfs/vm-121-disk-0                               -
klustersynced_zfs/vm-122-disk-0                               -
klustersynced_zfs/vm-122-state-fresh_updated_install          -
local_zfs                                                     -
local_zfs/vm-105-disk-0                                       -


code_language.shell:
root@PVE-1:~# zpool list -v
NAME                SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
klustersynced_zfs  5.55T  2.07T  3.48T        -         -    42%    37%  1.00x    ONLINE  -
  mirror-0          952G   505G   447G        -         -    58%  53.0%      -    ONLINE
    sdb             954G      -      -        -         -      -      -      -    ONLINE
    sdc             954G      -      -        -         -      -      -      -    ONLINE
  mirror-1          952G   511G   441G        -         -    57%  53.6%      -    ONLINE
    sdd             954G      -      -        -         -      -      -      -    ONLINE
    sde             954G      -      -        -         -      -      -      -    ONLINE
  sdg               954G   520G   432G        -         -    59%  54.6%      -    ONLINE
  mirror-3          952G   495G   457G        -         -    59%  52.0%      -    ONLINE
    sdh             954G      -      -        -         -      -      -      -    ONLINE
    sdi             954G      -      -        -         -      -      -      -    ONLINE
  sdf               954G  91.1G   861G        -         -    19%  9.57%      -    ONLINE
  mirror-5          928G  3.31G   925G        -         -     0%  0.35%      -    ONLINE
    sdn             932G      -      -        -         -      -      -      -    ONLINE
    sdm             932G      -      -        -         -      -      -      -    ONLINE
local_zfs           952G  35.3G   917G        -         -    32%     3%  1.00x    ONLINE  -
  mirror-0          952G  35.3G   917G        -         -    32%  3.70%      -    ONLINE
    sdj             954G      -      -        -         -      -      -      -    ONLINE
    sdk             954G      -      -        -         -      -      -      -    ONLINE
 
code_language.shell:
root@PVE-1:~# fdisk -l
Disk /dev/sdl: 238,47 GiB, 256060514304 bytes, 500118192 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 17A6C066-4CBE-45AC-84BB-B637FF7E31DC

Device       Start       End   Sectors  Size Type
/dev/sdl1       34      2047      2014 1007K BIOS boot
/dev/sdl2     2048   1050623   1048576  512M EFI System
/dev/sdl3  1050624 500118158 499067535  238G Linux RAID


Disk /dev/sda: 238,47 GiB, 256060514304 bytes, 500118192 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 17A6C066-4CBE-45AC-84BB-B637FF7E31DC

Device       Start       End   Sectors  Size Type
/dev/sda1       34      2047      2014 1007K BIOS boot
/dev/sda2     2048   1050623   1048576  512M EFI System
/dev/sda3  1050624 500118158 499067535  238G Linux RAID


Disk /dev/sdb: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 2F14AB6D-A534-E14B-9EF4-C9E818E548E3

Device          Start        End    Sectors   Size Type
/dev/sdb1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sdb9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/sdc: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: C6415FED-9756-C34A-BDF0-564A67074907

Device          Start        End    Sectors   Size Type
/dev/sdc1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sdc9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/sdd: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B0F962D5-0BCA-5545-805F-E8FC4F2157F1

Device          Start        End    Sectors   Size Type
/dev/sdd1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sdd9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/sde: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B9ABE7D5-FA83-7843-81B0-C32E4EDFCEEE

Device          Start        End    Sectors   Size Type
/dev/sde1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sde9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/sdf: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: D24DC012-0555-624E-A76E-28C979F87E70

Device          Start        End    Sectors   Size Type
/dev/sdf1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sdf9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/sdg: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 6BCE90CD-1453-7D43-A422-EFC5B8310D84

Device          Start        End    Sectors   Size Type
/dev/sdg1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sdg9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/sdh: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 250FC80E-0A04-554B-9628-D3CFE332EEF2

Device          Start        End    Sectors   Size Type
/dev/sdh1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sdh9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/sdi: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 015E283C-D0E1-374F-9195-C1C0F13C0F3D

Device          Start        End    Sectors   Size Type
/dev/sdi1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sdi9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/sdk: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: DFA10441-127E-5046-B871-A0D762F477C8

Device          Start        End    Sectors   Size Type
/dev/sdk1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sdk9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/md0: 237,85 GiB, 255388352512 bytes, 498805376 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-root: 59,25 GiB, 63619203072 bytes, 124256256 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/zd0: 7 GiB, 7516192768 bytes, 14680064 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/zd16: 1001 GiB, 1074815565824 bytes, 2099249152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/zd32: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/zd48: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/zd64: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: gpt
Disk identifier: A4A66241-1AA7-4604-859D-43B81E752484

Device      Start      End  Sectors Size Type
/dev/zd64p1  2048     4095     2048   1M BIOS boot
/dev/zd64p2  4096 67106815 67102720  32G Linux filesystem


Disk /dev/zd80: 16 GiB, 17179869184 bytes, 33554432 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: gpt
Disk identifier: 1F084A0E-F566-4D70-A37E-1E0B1356CBE9

Device      Start      End  Sectors Size Type
/dev/zd80p1  2048     4095     2048   1M BIOS boot
/dev/zd80p2  4096 33552383 33548288  16G Linux filesystem


Disk /dev/zd96: 5 GiB, 5368709120 bytes, 10485760 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/zd112: 2,49 GiB, 2671771648 bytes, 5218304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes


Disk /dev/zd128: 15 GiB, 16106127360 bytes, 31457280 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x09678c97

Device       Boot Start      End  Sectors Size Id Type
/dev/zd128p1 *     2048 31457279 31455232  15G 83 Linux


Disk /dev/zd144: 16 GiB, 17179869184 bytes, 33554432 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x703819f2

Device       Boot    Start      End  Sectors  Size Id Type
/dev/zd144p1 *        2048 26386431 26384384 12,6G 83 Linux
/dev/zd144p2      26388478 33552383  7163906  3,4G  5 Extended
/dev/zd144p5      26388480 33552383  7163904  3,4G 82 Linux swap / Solaris

Partition 2 does not start on physical sector boundary.


Disk /dev/zd160: 64 GiB, 68719476736 bytes, 134217728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0xb3334e87

Device       Boot     Start       End   Sectors  Size Id Type
/dev/zd160p1 *         2048 128906250 128904203 61,5G 83 Linux
/dev/zd160p2      128907264 134217727   5310464  2,5G 82 Linux swap / Solaris


Disk /dev/zd176: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/zd192: 1,49 GiB, 1598029824 bytes, 3121152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes


Disk /dev/zd208: 80 GiB, 85899345920 bytes, 167772160 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x25df02be

Device       Boot     Start       End   Sectors  Size Id Type
/dev/zd208p1 *         2048 164062500 164060453 78,2G 83 Linux
/dev/zd208p2      164063232 167772159   3708928  1,8G 83 Linux


Disk /dev/zd224: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: gpt
Disk identifier: 7C574187-E4AB-4F64-ACEF-29877C347B0C

Device       Start      End  Sectors Size Type
/dev/zd224p1  2048     4095     2048   1M BIOS boot
/dev/zd224p2  4096 67106815 67102720  32G Linux filesystem


Disk /dev/zd240: 7,32 GiB, 7864320000 bytes, 15360000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes


Disk /dev/zd256: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: gpt
Disk identifier: C6F21D13-CC74-4F41-80D9-EBE8F4F788A0

Device       Start      End  Sectors Size Type
/dev/zd256p1  2048     4095     2048   1M BIOS boot
/dev/zd256p2  4096 67106815 67102720  32G Linux filesystem


Disk /dev/zd272: 1,01 TiB, 1105954078720 bytes, 2160066560 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x4c6c5068

Device       Boot Start        End    Sectors Size Id Type
/dev/zd272p1       2048 2160064511 2160062464   1T 83 Linux


Disk /dev/zd288: 15 GiB, 16106127360 bytes, 31457280 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/zd304: 16,49 GiB, 17704157184 bytes, 34578432 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes


Disk /dev/zd320: 4,49 GiB, 4819255296 bytes, 9412608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes


Disk /dev/zd336: 2,49 GiB, 2671771648 bytes, 5218304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes


Disk /dev/zd368: 2,49 GiB, 2671771648 bytes, 5218304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes


Disk /dev/zd384: 42 GiB, 45097156608 bytes, 88080384 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/zd400: 50 GiB, 53687091200 bytes, 104857600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/zd352: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 16384 bytes
I/O size (minimum/optimal): 16384 bytes / 16384 bytes
Disklabel type: dos
Disk identifier: 0x00000000


Disk /dev/sdj: 953,87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 63A164E3-31D1-6E40-B9B4-A7FF08FF751B

Device          Start        End    Sectors   Size Type
/dev/sdj1        2048 2000392191 2000390144 953,9G Solaris /usr & Apple ZFS
/dev/sdj9  2000392192 2000408575      16384     8M Solaris reserved 1


Disk /dev/sdm: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000BX500SSD1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E274F4A2-731B-8246-A366-04AB3DAC865A

Device          Start        End    Sectors   Size Type
/dev/sdm1        2048 1953507327 1953505280 931,5G Solaris /usr & Apple ZFS
/dev/sdm9  1953507328 1953523711      16384     8M Solaris reserved 1


Disk /dev/sdn: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000BX500SSD1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 5D0AE96F-FCD5-934F-9AE5-12B6A4A05403

Device          Start        End    Sectors   Size Type
/dev/sdn1        2048 1953507327 1953505280 931,5G Solaris /usr & Apple ZFS
/dev/sdn9  1953507328 1953523711      16384     8M Solaris reserved 1

Thank you all :)
 
All disks use 512B/512B sectors, no raidz1/2/3 is in use, and there's no encryption... so no idea why this shouldn't work, except that maybe your pool is damaged. Did you run a scrub to try to fix the corruption (the 3 checksum errors) on sdc?
 
code_language.shell:
root@PVE-1:~# zpool status
  pool: klustersynced_zfs
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:33:03 with 0 errors on Tue Feb 20 23:54:01 2024
config:


    NAME               STATE     READ WRITE CKSUM
    klustersynced_zfs  ONLINE       0     0     0
      mirror-0         ONLINE       0     0     0
        sdb            ONLINE       0     0     0
        sdc            ONLINE       0     0     3
      mirror-1         ONLINE       0     0     0
        sdd            ONLINE       0     0     0
        sde            ONLINE       0     0     0
      sdg              ONLINE       0     0     0
      mirror-3         ONLINE       0     0     0
        sdh            ONLINE       0     0     0
        sdi            ONLINE       0     0     0
      sdf              ONLINE       0     0     0
      mirror-5         ONLINE       0     0     0
        sdn            ONLINE       0     0     0
        sdm            ONLINE       0     0     0

No changes after a scrub.

Do you think I should migrate my data to another node and completely destroy the pool?
 
Then those 3 corruptions are unfixable. ZFS can only fix them when there is another copy or parity data, and they probably happened at a time when the other disk of that mirror had already failed. But usually "zpool status" would tell you what data is affected by the corruption, so you know what you have to restore from backup.
 
' zpool clear klustersynced_zfs ' should clear up the 3 CKSUM errors.

> Do you think I should migrate my data to another node and completely destroy the pool?

Yes. You need to rebuild the pool with proper mirrors and no "outlier" disks (sdf and sdg are striped in outside the mirrors).
Actually, since you have 10 disks (and if you don't need maximum speed + free space), you could consider rebuilding it as a raidz2, which would survive ANY two disks failing with no data loss.

PROTIP: plan out what you're going to do and put the ' zpool create ' command in a bash script, and TRY IT IN A VM FIRST (or create a file-backed pool) to make sure it will do what you intend.

You can create the pool with short disk names no problem, but then you should immediately export the pool and reimport it with ' zpool import -a -f -d /dev/disk/by-id ' or whatever long-name format suits you. Sometimes a disk will not show up in by-id, but it should always be in by-path.

For convenience, I have a drivemap script that enumerates all disk paths to short device names:
https://github.com/kneutron/ansitest/tree/master

See ' drivemap.sh ' -- you can call it from /etc/rc.local to run at every boot, and rerun it whenever a disk changes.
 
