[SOLVED] SSD faulty after 3 months?

GazdaJezda

Hi!

I just noticed that zpool status gives me:

[screenshot: zpool status output]

I see that there are no data errors, but do I need to replace the disk? How can I determine that? I just started a pool check (zpool scrub); is there anything else I need to do?
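For reference, starting and monitoring a scrub looks roughly like this (the pool name "tank" is just a placeholder for the actual pool):

Code:
zpool scrub tank        # start a scrub of the pool
zpool status -v tank    # shows scrub progress and per-device READ/WRITE/CKSUM counters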
Wearout is only at 1% (the disks are practically idle most of the time):

[screenshot: disk overview showing 1% wearout]
I'm a little bit surprised, I must admit. Problems after only 3 months of very light work? Is this serious (do I need to replace the disk)? Everything has worked flawlessly so far, and I intend to keep it that way. The server is mainly used for serving video content (cartoons and movies), hosting a personal webpage, and running a Transmission client. Nothing stressful, really.

Thanks for any answer, really!
 
I had a Samsung 860 QVO, and as soon as there are any I/O operations on it, Proxmox reports issues, although it works fine and without any problems.
 
Issues like that? I just need to know if the disk is seriously damaged (judging by the write errors) or if it is some kind of false alert (hopefully). Is your disk standalone or in a mirror / raidz configuration?
 
Aha, good for you, but I'm still not sure the same applies to me. Don't get me wrong, I just don't want to have any problems later :)
 
Are there other drives in the server?
On one server I fixed the read/write errors by changing the SATA cable.
On another server, the pool with the SSDs got read/write errors because an HDD in another pool used SMR, and that somehow caused problems on the SSD pool (so the HDD caused problems with the SATA controller or something like that).

Did you increase the volblocksize? If you are running zvols on a raidz1 with the default 8K volblocksize, you are wasting around 25% of your total capacity due to padding overhead. If your pool is using an ashift of 12, you should use a volblocksize of 16K or 64K.
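For reference, something along these lines sets a larger block size (the storage and zvol names are placeholders; existing zvols have to be recreated, since volblocksize is fixed at creation time):

Code:
# Proxmox: default block size for new disks on a ZFS storage
# (GUI: Datacenter -> Storage -> <zfs storage> -> Block Size)
pvesm set local-zfs --blocksize 16k

# plain ZFS: volblocksize is chosen when the zvol is created
zfs create -V 32G -o volblocksize=16k rpool/data/vm-100-disk-1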
 
@Dunuin: this server has 6 SSDs:
- NVMe (boot device)
- 4× SATA 860 EVO SSDs (connected with the original Supermicro cables that came with the motherboard / case), in a raidz1 pool
- 1× 860 EVO (connected via a USB-to-SATA adapter), used for PVE VM backups.

It has all worked fine (and still does, as far as I can see); only today I saw that error in the pool status. I will see what the result of the scrub is. If it finishes without data errors, can I assume everything is OK?

P.S. - adding a picture of the server; the external (backup) disk is visible on a USB cable. But that disk is not the problem anyway.
 

Attachments

  • IMG_20210417_214238-picsay.jpg (97.4 KB)
The scrub just finished. Same result as in the original message I saw today: no known data errors.
But how can I determine if the disk is going to die?

P.S. - according to https://zfsonlinux.org/msg/ZFS-8000-9P/ I will treat this as a minor error. I hope it will still work without problems...
 

Attachments

  • Screenshot_20210417-215822.png (322.9 KB)
You're probably right about that.
Still, we could dig a bit deeper into it. What is the output of smartctl -a /dev/sdX for every disk?
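For reference, collecting those outputs could be done with a small loop like this (device names are assumptions):

Code:
# dump full SMART data for each disk into a separate log file
for d in sda sdb sdc sdd sde; do
    smartctl -a "/dev/$d" > "/root/$d.log"
done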
 
OK, I've attached the output of smartctl:
  • sda.log - ZFS pool member
  • sdb.log - ZFS pool member
  • sdc.log - ZFS pool member
  • sdd.log - ZFS pool member
  • sde.log - USB attached
I do not see any trouble here, or do I?
 

Attachments

  • sde.log (4.3 KB)
  • sdd.log (4.2 KB)
  • sdc.log (4.2 KB)
  • sdb.log (4.2 KB)
  • sda.log (4.2 KB)
No self-tests have been logged; you should consider setting up S.M.A.R.T. self-tests through smartmontools (/etc/smartd.conf).

You have one CRC error on /dev/sdc, which is probably the reason for ZFS complaining. If more of those errors reoccur after clearing the pool, you should consider replacing the disk, or maybe RMA it if that's still possible.
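For reference, clearing and re-checking would look roughly like this (the pool name is a placeholder):

Code:
zpool clear tank                      # reset the READ/WRITE/CKSUM error counters
zpool status -v tank                  # confirm the counters stay at zero afterwards
smartctl -A /dev/sdc | grep -i crc    # keep an eye on the UDMA_CRC_Error_Count attribute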
 
@ph0x: Thank you, mate. I didn't know that I could schedule regular disk checks.

I have just cleared the zpool error and set this in /etc/smartd.conf:

Code:
# monitor all SMART attributes on the four pool members (SAT pass-through)
/dev/sda -a -d sat
/dev/sdb -a -d sat
/dev/sdc -a -d sat
/dev/sdd -a -d sat

# run a long self-test on one disk per weekday (Mon-Thu) at 01:00 and mail reports to root
/dev/sda -d scsi -s L/../../1/01 -m root
/dev/sdb -d scsi -s L/../../2/01 -m root
/dev/sdc -d scsi -s L/../../3/01 -m root
/dev/sdd -d scsi -s L/../../4/01 -m root

The test email (#/dev/sdb -m root -M test) works (I get it in my mailbox), so now I just need to wait and see if anything new pops up about that disk. I hope it will keep working, really :)
Thank you again!
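For reference, smartd has to re-read its configuration after edits; on Proxmox/Debian that is roughly the following (the unit may be named differently on other distributions):

Code:
systemctl restart smartmontools     # smartd re-reads /etc/smartd.conf on restart
journalctl -u smartmontools -n 50   # verify that all four disks were registered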
 
Hi, I'm baaack :)

After 2 months I can confirm that everything is now running great (OK, I get some strange notices on the console, but they are only notices).
After I set up a regular check/scrub and raised the fan speed a little (better cooling), the SSDs are still where they were in April (at 1% wearout):

[screenshot: disk overview, wearout still at 1%]
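For reference, the regular scrub mentioned above can be as simple as a cron entry like this sketch (the pool name is a placeholder; Debian's zfsutils package already ships a similar monthly job in /etc/cron.d/zfsutils-linux):

Code:
# /etc/cron.d/zfs-scrub - scrub the pool every Sunday at 02:00
0 2 * * 0   root   /usr/sbin/zpool scrub tank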

I can say that so far they work very reliably. Also, the ZFS pools are OK now:

[screenshot: zpool status, no known data errors]

The major difference is the SSD temperature, which is now in the range of 32 to 34 degrees, as opposed to 40-something in the first 4 months, when I had a silent system, which was probably not too good for the SSDs (my guess).

For now I'm very happy with Proxmox! If only the mobile app were a little more capable, it would be perfect! :)

And "problem" (which so far is not a problem, since it all works ok) are this kernel messages on console:

[screenshot: kernel messages on the console]

Dazed and confused, indeed ;)

Best regards to all!
 
