Intel 530 SSD - problem with reclaiming space

blackpaw

Renowned Member
Nov 1, 2013
I have two Intel 530s in two separate nodes, each used as the journal device for three Ceph OSDs (three journal partitions per SSD). The SSDs have been in use for 18 months; for the past two months I've also been using them as a slog/cache device for a ZFS pool.

Partition layout is as follows:
ceph journal 1: 10GB
ceph journal 2: 10GB
ceph journal 3: 10GB
zfs slog : 1GB
zfs cache: 10GB
Free: 79GB



Last night I got a smartctl warning: "Device: /dev/sdb [SAT], Failed SMART usage Attribute: 170 Available_Reservd_Space." on one node.

smartctl -a shows:


ID# ATTRIBUTE_NAME           FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
  5 Reallocated_Sector_Ct    0x0032  100    100    000     Old_age   Always   -            2
  9 Power_On_Hours_and_Msec  0x0032  100    100    000     Old_age   Always   -            3177h+17m+14.350s
 12 Power_Cycle_Count        0x0032  100    100    000     Old_age   Always   -            11

170 Available_Reservd_Space  0x0033  010    010    010     Pre-fail  Always   FAILING_NOW  0
171 Program_Fail_Count       0x0032  100    100    000     Old_age   Always   -            1
172 Erase_Fail_Count         0x0032  100    100    000     Old_age   Always   -            0
174 Unexpect_Power_Loss_Ct   0x0032  100    100    000     Old_age   Always   -            5
183 SATA_Downshift_Count     0x0032  100    100    000     Old_age   Always   -            8
184 End-to-End_Error         0x0033  100    100    090     Pre-fail  Always   -            0
187 Uncorrectable_Error_Cnt  0x0032  100    100    000     Old_age   Always   -            0
190 Airflow_Temperature_Cel  0x0032  033    042    000     Old_age   Always   -            33 (Min/Max 25/42)
192 Power-Off_Retract_Count  0x0032  100    100    000     Old_age   Always   -            5
199 UDMA_CRC_Error_Count     0x0032  100    100    000     Old_age   Always   -            0
225 Host_Writes_32MiB        0x0032  100    100    000     Old_age   Always   -            1498365
226 Workld_Media_Wear_Indic  0x0032  100    100    000     Old_age   Always   -            65535
227 Workld_Host_Reads_Perc   0x0032  100    100    000     Old_age   Always   -            1
228 Workload_Minutes         0x0032  100    100    000     Old_age   Always   -            65535
232 Available_Reservd_Space  0x0033  010    010    010     Pre-fail  Always   FAILING_NOW  0
233 Media_Wearout_Indicator  0x0032  064    064    000     Old_age   Always   -            0
241 Host_Writes_32MiB        0x0032  100    100    000     Old_age   Always   -            1498365
242 Host_Reads_32MiB         0x0032  100    100    000     Old_age   Always   -            16144
249 NAND_Writes_1GiB         0x0032  100    100    000     Old_age   Always   -            195427
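
For anyone reading a table like this, a rough sketch of how the FAILING_NOW flag relates to the columns: an attribute counts as failed when its normalized VALUE is at or below a non-zero THRESH. The sample rows are copied from the output above; the function names `parse_attributes` and `at_or_below_threshold` are made up for this illustration and are not part of smartmontools.

```python
# Rows copied from the smartctl -a output above (whitespace-separated).
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 2
170 Available_Reservd_Space 0x0033 010 010 010 Pre-fail Always FAILING_NOW 0
184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always - 0
190 Airflow_Temperature_Cel 0x0032 033 042 000 Old_age Always - 33 (Min/Max 25/42)
232 Available_Reservd_Space 0x0033 010 010 010 Pre-fail Always FAILING_NOW 0
233 Media_Wearout_Indicator 0x0032 064 064 000 Old_age Always - 0
"""


def parse_attributes(text):
    """Parse attribute rows into dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        # Split into at most 10 fields so a raw value with spaces stays whole.
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # skip headers, blank lines, anything that is not a row
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "when_failed": parts[8],
            "raw": parts[9],
        })
    return rows


def at_or_below_threshold(rows):
    """Return rows whose normalized value has reached a non-zero threshold."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


failing = at_or_below_threshold(parse_attributes(SAMPLE))
for row in failing:
    print(row["id"], row["name"], row["value"], "<=", row["thresh"])
```

On these rows only attributes 170 and 232 qualify: both sit at 010 against a threshold of 010, while Media_Wearout_Indicator (233) is still at 064.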



The other node has Available_Reservd_Space at 14%.

Given the SSD has 79GB out of 120GB free, I find this weird in the extreme. All the partitions are used directly with no filesystem, so I can't run fstrim. What I did do was:
  • stop ceph
  • flush the journals
  • blkdiscard on the raw device (/dev/sdb), which erased the partition table as well.
  • recreate the partitions
  • recreate the ceph journals
  • run a short and long smartctl test
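
The steps above could be sketched as a dry-run planner that only prints the commands and executes nothing. Several things here are assumptions and not from the post: the OSD ids 0, 1 and 2 are placeholders, `systemctl stop ceph-osd@<id>` assumes a systemd-based Ceph install (older releases use a different init script), and `sgdisk` is just one possible tool for recreating the partitions. Treat it as an outline, since `blkdiscard` on the whole device is destructive.

```python
import shlex


def build_plan(device, osd_ids, journal_size="10G"):
    """Build the command list for the procedure; nothing is executed."""
    plan = []
    # 1. Stop each OSD and flush its journal to the data disk.
    for osd in osd_ids:
        plan.append(["systemctl", "stop", f"ceph-osd@{osd}"])  # assumed unit name
        plan.append(["ceph-osd", "-i", str(osd), "--flush-journal"])
    # 2. Discard every block on the raw device (this wipes the partition table).
    plan.append(["blkdiscard", device])
    # 3. Recreate one journal partition per OSD (sgdisk is an assumption).
    for part, _osd in enumerate(osd_ids, start=1):
        plan.append(["sgdisk", f"--new={part}:0:+{journal_size}", device])
    # 4. Recreate the journals. Each OSD's journal path must point at its
    #    new partition before this step; that part is not shown here.
    for osd in osd_ids:
        plan.append(["ceph-osd", "-i", str(osd), "--mkjournal"])
    # 5. SMART self-tests. Let the short test finish before starting the
    #    long one, since a drive runs one self-test at a time.
    plan.append(["smartctl", "-t", "short", device])
    plan.append(["smartctl", "-t", "long", device])
    return plan


plan = build_plan("/dev/sdb", [0, 1, 2])  # OSD ids are placeholders
for cmd in plan:
    print(shlex.join(cmd))
```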

Unfortunately, Available_Reservd_Space is unchanged. This doesn't make sense to me: with 65% free space I thought I should get a lot longer than an 18-month lifespan out of the two SSDs.

Am I missing something? Short of a "Hail Mary", I'll be replacing both SSDs on Monday.
 
ZFS does not support trim yet, so it'll destroy your SSD eventually.

Attribute 241 shows that you wrote A LOT to the disk (roughly 100 times more than you read):

32 MiB * 1498365 units / 1024 / 1024 ≈ 45.73 TiB

Maybe you're at the end of life for the 530. How many DWPD are supported? You're at a little less than 1 DWPD if my math is correct.
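
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the counters come from the smartctl output in the first post, while the 120 GB capacity and the 18 months of use (taken here as about 548 days) are read off the first post and are approximations, not measured values.

```python
MIB = 2**20
TIB = 2**40

host_write_units = 1498365  # attribute 241, Host_Writes_32MiB
host_read_units = 16144     # attribute 242, Host_Reads_32MiB

capacity_bytes = 120e9      # 120 GB drive, as stated in the first post
days_in_use = 548           # assumption: 18 months of service

host_write_bytes = host_write_units * 32 * MIB
host_writes_tib = host_write_bytes / TIB

# How lopsided the workload is towards writes.
write_read_ratio = host_write_units / host_read_units

# Drive writes per day: full-capacity writes divided by days in service.
dwpd = host_write_bytes / capacity_bytes / days_in_use

print(f"host writes: {host_writes_tib:.2f} TiB")
print(f"write/read ratio: {write_read_ratio:.1f}")
print(f"approx. DWPD: {dwpd:.2f}")
```

With these inputs the write/read ratio lands in the low 90s and the DWPD estimate somewhat under 1, in line with the figures quoted above.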
 
Thanks LnxBill

I removed it from the ZFS pool, erased all the partitions and ran blkdiscard on it, so that should have released any free blocks.

45 TB is a lot, though pretty normal for a ceph journal, and it's my understanding that these devices can get into the petabyte range before failing in practice.

The official TBW is 34TB.

I don't understand how Available_Reservd_Space can be so low while every other indicator is fine: no write errors etc.
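
One more comparison that can be read off the SMART counters, as a rough sketch only: host writes (attribute 241) against NAND writes (attribute 249) give an estimate of write amplification, and host writes against the 34TB endurance figure quoted above show how far past the rating the drive is. This assumes the units are as the attribute names suggest (32 MiB and 1 GiB) and that the 34TB figure is in decimal terabytes; it does not by itself explain the reserved-space reading.

```python
MIB = 2**20
GIB = 2**30

host_write_units = 1498365  # attribute 241, Host_Writes_32MiB
nand_writes_gib = 195427    # attribute 249, NAND_Writes_1GiB
rated_tbw_tb = 34           # endurance figure quoted in this thread (assumed decimal TB)

host_bytes = host_write_units * 32 * MIB
nand_bytes = nand_writes_gib * GIB

# Host writes in decimal terabytes, for comparison with the TBW rating.
host_tb = host_bytes / 1e12

# Bytes written to flash per byte written by the host.
write_amplification = nand_bytes / host_bytes

# Fraction of the rated endurance already consumed by host writes.
rated_fraction = host_tb / rated_tbw_tb

print(f"host writes: {host_tb:.1f} TB")
print(f"estimated write amplification: {write_amplification:.2f}")
print(f"host writes vs rated TBW: {rated_fraction:.2f}x")
```

By these counters the flash has absorbed about four times what the host wrote, and host writes alone are already well past the quoted rating.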

Nevertheless, I guess I'd better get new drives ...
 
