Disk health

Immortelle

New Member
May 20, 2024
Hello!
Yesterday I did my routine check of the disk tab, and both of my NVMe drives are declared dead:
[Screenshots: 16.png, 92.png]
Are there other ways to get proof that they are about to die? Right now neither NVMe has read/write errors, and both still show the full available spare, as seen in the screenshots. I also don't want to put off buying new ones, since the replacement would be painless: resilver the mirror one NVMe after the other.
 
Hi,

what are the models of your disks? The manufacturer always gives some figure for TBW (Total Bytes Written), which is basically the endurance of a drive with respect to writes. As you have 413/414 TB written, it might be good to compare that figure with the limit given by the manufacturer.

Also, has the SMART data been reliable until now, i.e. did it report sensible values? I'm asking to get a feeling for whether the drives might just be reporting bogus data.
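
To make the TBW comparison concrete, here is a minimal Python sketch. It assumes you read the raw "Data Units Written" counter from the NVMe SMART log (the NVMe specification defines one data unit as 1000 blocks of 512 bytes) and that you look up the rated TBW in your drive's datasheet. The `rated_tbw` value below is a made-up placeholder, not a figure for any real model, and the function names are only illustrative.

```python
# One NVMe "data unit" is 1000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 512 * 1000


def data_units_to_tb(data_units: int) -> float:
    """Convert the NVMe 'Data Units Written' counter to decimal terabytes."""
    return data_units * BYTES_PER_DATA_UNIT / 1e12


def endurance_used_percent(written_tb: float, rated_tbw: float) -> float:
    """Share of the manufacturer's TBW rating already consumed, in percent."""
    if rated_tbw <= 0:
        raise ValueError("rated_tbw must be positive")
    return 100.0 * written_tb / rated_tbw


written_tb = 413.0  # figure reported in this thread
rated_tbw = 600.0   # HYPOTHETICAL placeholder: take the real value from the datasheet
print(f"{endurance_used_percent(written_tb, rated_tbw):.1f}% of rated TBW used")
```

A result above 100% means the drive has written more than the manufacturer rated it for, which is a warranty and wear indicator rather than proof of imminent failure.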
 
Looks like they have processed 11% more writes than they were designed for, but they look fine, as they still have all of their spare memory cells. I would replace only one of them, as they seem to be lasting longer than expected and are not about to die at all. That said, I have had drives die without any (SMART) warning at all. By replacing only one, at least you don't have two drives that are at the same age and wear level.
 
Geez man, you have 413TB written in a little over a year on one, and 414TB in JUST over a year on the other?

You're going to need Enterprise-class SSDs if you don't want to keep replacing them that frequently. And probably larger sizes, leaving some unpartitioned space at the end of the disk so the controller has extra spare cells to work with. You want something with a high TBW rating and at least a 5-year warranty, or budget for replacements + downtime -- and have very good backups.
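
As a rough way to size the TBW rating you'd need, here is a small Python sketch that extrapolates from the writes so far. It assumes the 413 TB were written over exactly 365 days (the thread only says "a little over a year") and that the write rate stays constant; both are simplifications, and the function names are just for illustration.

```python
def daily_writes_tb(written_tb: float, days_in_service: float) -> float:
    """Average terabytes written per day over the drive's service time."""
    if days_in_service <= 0:
        raise ValueError("days_in_service must be positive")
    return written_tb / days_in_service


def required_tbw(written_tb: float, days_in_service: float, target_years: float) -> float:
    """TBW rating needed to sustain the same write rate for target_years."""
    return daily_writes_tb(written_tb, days_in_service) * 365 * target_years


written_tb = 413.0      # figure reported in this thread
days_in_service = 365   # ASSUMPTION: the exact service time is not stated
print(f"{daily_writes_tb(written_tb, days_in_service):.2f} TB/day")
print(f"{required_tbw(written_tb, days_in_service, 5):.0f} TBW needed for 5 years")
```

Under those assumptions that is a bit over 1 TB per day and roughly 2 PB over a 5-year warranty period, so compare datasheets against that ballpark rather than against the current total.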

I would also replace both ASAP - once a system tells you a disk is dying, it's not worth waiting around for the actual failure. Now is the time for full backups, and active monitoring of system health.

Also look into turning off atime everywhere (including in VM guests!), disabling cluster services if this is a single node, and using zram and log2ram - and maybe folder2ram.
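
To find filesystems that still update access times, something like the Python sketch below could help. It only parses text in the `/proc/mounts` format and lists mount points whose options do not include `noatime`. The sample text is made up for illustration, and the list of pseudo-filesystems to skip is an assumption you may need to extend. Run it inside VM guests too, not just on the host.

```python
from typing import List

# Illustrative, incomplete set of pseudo-filesystems that are not worth checking.
PSEUDO_FS = {"proc", "sysfs", "devtmpfs", "devpts", "tmpfs", "cgroup2", "securityfs"}


def mounts_without_noatime(mounts_text: str) -> List[str]:
    """Return mount points whose mount options lack 'noatime'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in PSEUDO_FS:
            continue
        if "noatime" not in options.split(","):
            result.append(mount_point)
    return result


# Made-up sample in /proc/mounts format, not output from a real system.
SAMPLE = """\
rpool/ROOT/pve-1 / zfs rw,relatime,xattr,noacl 0 0
rpool/data /rpool/data zfs rw,noatime,xattr,noacl 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
"""
print(mounts_without_noatime(SAMPLE))
```

On a real system you would pass in the contents of `/proc/mounts` instead of `SAMPLE`.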
 
