Wear levels on SSDs increasing daily.

300cpilot

Anyone else seeing this amount of wear on SSDs?

I have been running this node for about two years with the drives that are in it. I was watching to see whether they would withstand being used this way, because they are desktop drives, so I expected limits, but something has changed. About a year ago we checked the disk health and the drives were not showing any wear. I had not looked again until yesterday: the boot drive is now at 37% wearout and the other three drives were at 7%. Today, however, one drive has increased to 9%, a 2% increase overnight.

The OS is on the latest patches/updates, Proxmox VE 8.3.2.

[Screenshot: 1735839821342.png — drive wearout values and configuration]
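
For tracking this over time instead of spot-checking the GUI, something along these lines could log the wear values once a day. This is only a rough sketch: it assumes smartmontools (version 7 or newer, for the JSON output) is installed, and the device names and attribute names below are guesses that need adjusting per box and per vendor.

```python
#!/usr/bin/env python3
# Rough sketch: log SSD wear values via smartctl's JSON output so a sudden
# 2% jump can be seen in context. Assumes smartmontools >= 7 (for -j).
# DEVICES and the attribute names are assumptions -- adjust per vendor
# (Wear_Leveling_Count on Samsung, Percent_Lifetime_Remain on Crucial, etc.).
import json
import subprocess
from datetime import datetime

DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]   # guesses for this box
WEAR_NAMES = {"Wear_Leveling_Count", "Percent_Lifetime_Remain",
              "Media_Wearout_Indicator"}

def wear_values(dev):
    out = subprocess.run(["smartctl", "-j", "-A", dev],
                         capture_output=True, text=True).stdout
    data = json.loads(out)
    nvme = data.get("nvme_smart_health_information_log")
    if nvme:  # NVMe drives report a single percentage_used field instead
        return {"percentage_used": nvme.get("percentage_used")}
    table = data.get("ata_smart_attributes", {}).get("table", [])
    return {a["name"]: a["value"] for a in table if a["name"] in WEAR_NAMES}

if __name__ == "__main__":
    stamp = datetime.now().isoformat(timespec="minutes")
    for dev in DEVICES:
        print(stamp, dev, wear_values(dev))
```

Run it from cron once a day and you get a trend line instead of a single snapshot.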
 
You should probably proactively replace the 240GB before it gets to ~80%, but what do you expect with consumer-level SSDs? The Proxmox docs explicitly say they are not recommended; even for homelab use it's chancy. Plus, all three are the same make/model, so they are likely all wearing out around the same time.

If you are doing zfs-on-zfs in-guest, you probably have crazy write amplification.

You don't give any details on how your VMs and backing storage are configured, so it's basically impossible to do anything but theorize here. For all I know, you have your VMs hosted on a 3-disk raidz1, and that's definitely not ideal.
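
If you want a number instead of a theory, one rough way to gauge the amplification is to compare what the zvols (the guest disks) write against what actually hits the physical SSDs over the same window. A minimal sketch, assuming the pool members are sdb/sdc/sdd and the VM disks are zvols; both are assumptions, adjust to the real layout.

```python
#!/usr/bin/env python3
# Rough write-amplification estimate: compare bytes written to the zvols
# (what the guests ask for) with bytes written to the physical SSDs
# (what ZFS actually commits, including parity, metadata and padding).
# Device names below are assumptions -- adjust to your pool layout.
import re
import time

SSDS = {"sdb", "sdc", "sdd"}      # physical raidz1 members (assumed names)
ZVOL = re.compile(r"^zd\d+$")     # whole zvol devices, skip their partitions
SECTOR = 512                      # /proc/diskstats counts 512-byte sectors
WINDOW = 300                      # sample length in seconds

def written_bytes():
    guest = host = 0
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            name, sectors = fields[2], int(fields[9])   # field 10 = sectors written
            if ZVOL.match(name):
                guest += sectors * SECTOR
            elif name in SSDS:
                host += sectors * SECTOR
    return guest, host

if __name__ == "__main__":
    g1, h1 = written_bytes()
    time.sleep(WINDOW)
    g2, h2 = written_bytes()
    guest, host = g2 - g1, h2 - h1
    print(f"guest writes: {guest/1e6:.1f} MB, host writes: {host/1e6:.1f} MB")
    if guest:
        print(f"apparent amplification: {host/guest:.1f}x")
```

On a 3-disk raidz1 roughly 1.5x is expected from parity alone; a much larger ratio usually points at small sync writes, a volblocksize/ashift mismatch, or metadata churn. Note that container datasets don't go through zd* devices, so their writes only show up on the host side here.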

https://howtoaskfor.help/
 
I think I was not clear: I am looking to see if anyone else has noticed anything similar. If they have, then it's a warning; if not, my drives simply did not last as long as I thought they would, and that's on me. Three years isn't bad; they were moved from a different server two years ago. I thought I was clear in the description that I knew the drives were not recommended, sorry. New drives have been ordered and should be here in a couple of days.

This is a small setup: a single Dell 610 server with 192 GB of RAM, 12 cores, an HBA adapter, a single drive for the OS (bought 6/2020), and three 2 TB drives (bought 6/2022) in a single ZFS raidz1 pool. It runs 2 CTs and 5 VMs, and I use PBS on a separate machine for backups. The screenshot shows how the drives are configured.

What is running on it:
A postfix DB server that records environmental readings from temperature, humidity, radon, CO2 and CO sensors every 10 minutes (a rough write estimate follows this list).
A second postfix server, only used when I am developing, which I have not done for about 6 months.
A git server, which is rarely written to.
A Home Assistant VM.
A web server for a local Grafana instance that displays the environmental data on a remote Raspberry Pi; it reads the data from the DB once an hour.
An OPNsense firewall, serving a single user's office.
Pi-hole DNS.
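
For scale, the sensor logging itself is a trivial amount of data. A quick back-of-the-envelope, where the per-row size is only a guess:

```python
# Back-of-the-envelope: nominal write volume of the sensor logging alone.
# The 200-byte record size is a guess; the point is the order of magnitude.
sensors = 5                      # temp, humidity, radon, CO2, CO
interval_min = 10
record_bytes = 200               # assumed size of one row incl. indexes

rows_per_day = sensors * (24 * 60 // interval_min)          # 720 rows/day
raw_per_day = rows_per_day * record_bytes                   # ~141 KiB/day
print(f"{rows_per_day} rows/day, ~{raw_per_day/1024:.0f} KiB/day of raw data")
# Even at 100x amplification (WAL + fsync + ZFS metadata + raidz parity),
# that is only ~14 MB/day, so if the drives really are burning percent-level
# wearout, the heavy writes are probably coming from somewhere else.
```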
 
I think I was not clear: I am looking to see if anyone else has noticed anything similar.
No problem, and yes, this is normal. I saw a fresh Samsung go from 0% to 1% overnight; the change from 1% to 2% took longer.
The wearout percentage is not updated live, so a 2% jump is not unusual. It comes from SMART and it is a prediction, not an accurate state to trust fully.
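
If you want a rough feel for where such a percentage comes from, you can compare the drive's raw host-write counter (Total_LBAs_Written) against its rated endurance. The TBW figure below is only an example, check your datasheet; and the firmware usually bases the attribute on NAND program/erase cycles rather than host writes, so the two will not match exactly.

```python
# What the wearout % roughly encodes: total host writes vs. rated endurance.
# Both numbers below are examples/assumptions -- use your drive's SMART raw
# value and the TBW rating from its datasheet.
lbas_written = 123_456_789_012      # SMART Total_LBAs_Written (raw value), example
sector = 512
tbw_rating_tb = 300                 # example rating for a 1 TB consumer SATA SSD

written_tb = lbas_written * sector / 1e12
print(f"{written_tb:.1f} TB written, "
      f"~{100 * written_tb / tbw_rating_tb:.1f}% of rated endurance")
```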

Three years isn't bad
Yes, indeed. You were lucky with them.
 
  • Like
Reactions: Kingneutron
Yes, I see the same thing in my home lab on consumer SSDs when using Ceph/ZFS. Those are busy (CoW) file systems compared to EXT4/XFS. I have a few 5+ year old SSDs running non-critical VMs that are at 90%+ wearout and still going; I plan on running them into the ground to see how far they can go. I'm getting 'spurious read errors' in Ceph lately, so maybe that's an early warning sign?
 
I'm getting 'spurious read errors' in Ceph lately, so maybe that's an early warning sign?
Yes, either the disk returns wrong data (not the data that was written to it), or the disk's internal error correction takes too long (it retries the read until it succeeds, retries forever, or gives up after roughly the 10th to 20th attempt, depending on the firmware).
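
If you want to see whether the disks themselves are admitting to trouble, the usual early-warning counters are the sector and uncorrectable-error attributes. A minimal sketch along the lines of the earlier one, assuming smartmontools with JSON output and SATA drives; attribute names differ on NVMe, and the device names are placeholders.

```python
#!/usr/bin/env python3
# Quick check of the classic "drive is starting to die" SMART counters.
# Raw values above zero on the sector-related attributes are worth taking
# seriously; a rising UDMA_CRC_Error_Count usually points at cabling or
# the backplane rather than the flash itself.
import json
import subprocess

WATCH = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable", "Reported_Uncorrect", "UDMA_CRC_Error_Count"}

def error_counters(dev):
    out = subprocess.run(["smartctl", "-j", "-A", dev],
                         capture_output=True, text=True).stdout
    table = json.loads(out).get("ata_smart_attributes", {}).get("table", [])
    return {a["name"]: a["raw"]["value"] for a in table if a["name"] in WATCH}

if __name__ == "__main__":
    for dev in ("/dev/sda", "/dev/sdb"):    # adjust to your OSD devices
        print(dev, error_counters(dev))
```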
 
I am using this one: https://www.ebay.com/itm/163142320696

Dell H200 Integrated 6Gbps SAS HBA w/ LSI 9210-8i P20 IT Mode ZFS FreeNAS unRAID​

The seller has quite a business reflashing RAID controllers, plus a YouTube channel. This was a simple RAID card reflashed to an HBA, and I have been really happy with it. However, I am pretty certain it does not have TRIM support; it's pretty old (there's a quick check below).
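
Rather than guessing from the card's age, the kernel will tell you whether discard actually makes it through the controller for each disk. A small sketch; the device names are assumptions.

```python
#!/usr/bin/env python3
# Check whether the kernel thinks each disk can accept discard/TRIM through
# the HBA: a discard_max_bytes of zero means no TRIM via this path.
from pathlib import Path

for dev in ("sda", "sdb", "sdc", "sdd"):          # adjust to your disks
    p = Path(f"/sys/block/{dev}/queue/discard_max_bytes")
    if not p.exists():
        print(f"{dev}: no such device")
        continue
    max_bytes = int(p.read_text())
    print(f"{dev}: discard {'supported' if max_bytes > 0 else 'NOT supported'} "
          f"(discard_max_bytes={max_bytes})")
```

Even if discard does pass through, ZFS only trims when asked, via `zpool trim <pool>` or the pool's autotrim property.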

I tried cloning the bad drive, but the new drive will not boot. I can't remember how to make the partition bootable after the clone, and Google has been lacking.
 
Personally, I only use SSDs for OS drives. For live data, HDDs are still the best long term. I get acceptable performance with RAID 10, and most of my drives get 10+ years of life. They still work after that; I just tend to replace them anyway as part of upgrades, and they get repurposed for backups and other misc uses. About a year ago I replaced a bunch of 1 TB and 2 TB drives, which were at around 9 years of continuous run time, with 10 TB drives.

I'd only use SSDs for VMs that are performance critical, and accept having to replace them more often. You still want some form of RAID, though, so you can replace them one by one without interruption.
 
Sorry for the delay, I've been traveling. I have decided to back up the VMs to the backup server, then replace the drives and reinstall from scratch. Once it is updated to the same version, I will import the VMs back. Thanks everyone!

Side note: the boot drive's SMART data says it has been powered on for 41,595 hours; the newer Crucial drives, only 13,377.
 