[SOLVED] Boot-SSD ZFS-Raid1 Wearout already 1% after 5 days?

Machtl

Active Member
Jan 9, 2021
Hi,

I recently built a new Proxmox server that I am about to ship to the datacenter. In the past I just used a single ext4 boot drive and that was it.
This time I am using 2x Samsung PM893 Datacenter 480 GB SATA SSDs in a ZFS RAID1 mirror (/dev/sda & /dev/sdc). The PM893 is rated for 876 TB TBW.

There are no VMs running off that storage, no backups, nothing. It only holds the boot partitions and the system.

Today I noticed that the wearout level is already shown as 1%!?

What's going on here?

Can someone help me check whether this is normal or something is terribly wrong?

Here is the current state of one of the disks:
Code:
smartctl --all /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.74-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZ7L3480HCHQ-00A07
LU WWN Device Id: 5 002538 f0282a445
Firmware Version: JXTC304Q
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jan 11 19:54:17 2023 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  35) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       150
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       19
177 Wear_Leveling_Count     0x0013   099   099   005    Pre-fail  Always       -       1
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   100   100   010    Pre-fail  Always       -       445
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   072   062   000    Old_age   Always       -       28
194 Temperature_Celsius     0x0022   072   060   000    Old_age   Always       -       28 (Min/Max 23/40)
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0033   100   100   010    Pre-fail  Always       -       0
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       9
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       215023089
242 Total_LBAs_Read         0x0032   099   099   000    Old_age   Always       -       31388177
243 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
244 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
245 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       65535
246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       65535
247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       65535
251 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       225454848

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  256        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:03 with 0 errors on Sun Jan  8 00:24:04 2023
config:

        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    ONLINE       0     0     0
          mirror-0                                               ONLINE       0     0     0
            ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T80xxxx-part3  ONLINE       0     0     0
            ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T80xxxx-part3  ONLINE       0     0     0

errors: No known data errors

I have Intel SSDs in my other Proxmox servers that have been running for over 2 years and still show a 0% wearout level.

Best regards,
Martin
 
I am a noob, can you show/expand on where this:
'''
wearout level is already shown as 1%
'''
information is displayed?

Thx!
 
It's normal for Samsung; they jump to 1% early and stay there for a long time. I would only worry and start calculating if they climb to 2% within a short time.
 
It's normal for Samsung; they jump to 1% early and stay there for a long time. I would only worry and start calculating if they climb to 2% within a short time.
Yeah, thanks @mr44er, it looks like Samsung SSDs skip the 0% threshold pretty quickly and show 1%. I rechecked today: the data written on the ZFS boot mirror was 103 GB at the 150 power-on hours shown above and is now 134 GB at 214 hours. That's a rate of ~12 GB per day; with the rated TBW of 876 TB these drives would last for over 200 years, so I think I'm good. :D

I will mark it as solved, but I still wonder about the 12 GB per day.
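
For anyone who wants to double-check the math, here is a rough sketch with smartctl and awk. It assumes that attribute 241 (Total_LBAs_Written) counts 512-byte sectors, which seems to fit the numbers here (215023089 × 512 B ≈ 103 GiB, the figure quoted above):
Code:
# data written so far, assuming Total_LBAs_Written is in 512-byte units
smartctl -A /dev/sda | awk '/Total_LBAs_Written/ { printf "%.1f GiB written\n", $10 * 512 / 2^30 }'

# projected lifetime against the 876 TB TBW rating at the observed ~12 GB/day
echo "876 * 1000 / 12 / 365" | bc -l    # ~200 years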
 
I am a noob, can you show/expand on where this:
'''
wearout level is already shown as 1%
'''
information is displayed?

Thx!
You can check the Wear_Leveling_Count value as a percentage or raw value with smartctl, but PVE will also show it in the 'Disks' tab when you select your node in the web GUI.
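
For example, something like this on the SMART side (the normalized value of attribute 177 starts at 100 and counts down, so the 99 shown above corresponds to the 1% wearout that PVE displays):
Code:
smartctl -A /dev/sda | grep -i Wear_Leveling_Count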
 
It's normal for Samsung; they jump to 1% early and stay there for a long time. I would only worry and start calculating if they climb to 2% within a short time.

I found this thread and can confirm that our PM893 1.92 TB variant in a ZFS RAID1 also hit a wearout of 1% pretty quickly. Hopefully you're right ;)