[SOLVED] SSD faulty after 3 months?

GazdaJezda

Hi!

I just noticed that zpool status gives me:

[screenshot: zpool status output]

I see that there are no data errors, but do I need to replace the disk? How can I determine that? I just started a pool check (zpool scrub); is there anything else I need to do?
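For reference, starting and monitoring a scrub looks roughly like this (the pool name "tank" is just a placeholder for the actual pool):

Code:
zpool scrub tank        # start a scrub of the pool
zpool status -v tank    # shows scrub progress and per-device READ/WRITE/CKSUM counters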
Wearout is only at 1% (the disks are practically idle most of the time):

[screenshot: disk overview showing 1% wearout]
I'm a little bit surprised, I must admit. Problems after only 3 months of very light work? Is this serious (do I need to replace the disk)? Everything has worked flawlessly so far, and I intend to keep it that way. The server is mainly used for serving video content (cartoons and movies), hosting a personal webpage, and running a Transmission client. Nothing stressful, really.

Thanks for any answer, really!
 
I had a Samsung 860 QVO, and as soon as there are any I/O operations on it, Proxmox reports issues, although it works fine and without any problems.
 
Issues like that? I just need to know if the disk is seriously damaged (judging by the write errors) or if it is some kind of false alert (hopefully). Is your disk standalone or in a mirror / raidz configuration?
 
Aha, good for you, but I'm still not sure the same applies to me. Don't get me wrong, I just don't want to have any problems later :)
 
Are there other drives in the server?
On one server I fixed the read/write errors by changing the SATA cable.
On another server, the pool with the SSDs got read/write errors because an HDD in another pool used SMR, and that somehow caused problems on the SSD pool (so the HDD caused problems with the SATA controller or something like that).

Did you increase the volblocksize? If you are running zvols on a raidz1 with the default 8K volblocksize, you are wasting around 25% of your total capacity due to padding overhead. If your pool is using an ashift of 12, you should use a volblocksize of 16K or 64K.
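For reference, something along these lines sets a larger block size (the storage and zvol names are placeholders; existing zvols have to be recreated, since volblocksize is fixed at creation time):

Code:
# Proxmox: default block size for new disks on a ZFS storage
# (GUI: Datacenter -> Storage -> <zfs storage> -> Block Size)
pvesm set local-zfs --blocksize 16k

# plain ZFS: volblocksize is chosen when the zvol is created
zfs create -V 32G -o volblocksize=16k rpool/data/vm-100-disk-1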
 
@Dunuin: this server has 6 SSDs:
- NVMe (boot device)
- 4× SATA 860 EVO SSDs (connected with the original Supermicro cables that came with the motherboard / case), in a raidz1 pool
- 1× 860 EVO (connected via a USB-to-SATA adapter), used for PVE VM backups.

It has all worked fine (and still does, as far as I can see); only today I saw that error in the pool status. I will see what the result of the scrub is. If it finishes without data errors, can I assume everything is OK?

P.S. - adding a picture of the server; the external (backup) disk is visible on a USB cable. But that disk is not the problem anyway.
 

Attachments

  • IMG_20210417_214238-picsay.jpg (97.4 KB)
The scrub just finished. Same result as in the original message I saw today: no known data errors.
But how can I determine if the disk is going to die?

P.S. - according to https://zfsonlinux.org/msg/ZFS-8000-9P/ I will treat this as a minor error. I hope it will still work without problems...
 

Attachments

  • Screenshot_20210417-215822.png (322.9 KB)
You're probably right about that.
Still, we could dig a bit deeper into it. What is the output of smartctl -a /dev/sdX for every disk?
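For reference, collecting those outputs could be done with a small loop like this (device names are assumptions):

Code:
# dump full SMART data for each disk into a separate log file
for d in sda sdb sdc sdd sde; do
    smartctl -a "/dev/$d" > "/root/$d.log"
done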
 
OK, I've attached the output of smartctl:
  • sda.log - ZFS pool member
  • sdb.log - ZFS pool member
  • sdc.log - ZFS pool member
  • sdd.log - ZFS pool member
  • sde.log - USB attached
I do not see any trouble here, or do I?
 

Attachments

  • sde.log (4.3 KB)
  • sdd.log (4.2 KB)
  • sdc.log (4.2 KB)
  • sdb.log (4.2 KB)
  • sda.log (4.2 KB)
No self-tests have been logged; you should consider setting up S.M.A.R.T. self-tests through smartmontools (/etc/smartd.conf).

You have one CRC error on /dev/sdc, which is probably the reason for ZFS complaining. If more of those errors reoccur after clearing the pool, you should consider replacing the disk, or maybe RMA it if that's still possible.
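For reference, clearing and re-checking would look roughly like this (the pool name is a placeholder):

Code:
zpool clear tank                      # reset the READ/WRITE/CKSUM error counters
zpool status -v tank                  # confirm the counters stay at zero afterwards
smartctl -A /dev/sdc | grep -i crc    # keep an eye on the UDMA_CRC_Error_Count attribute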
 
@ph0x: Thank you, mate. I didn't know that I could schedule regular disk checks.

I have just cleared the zpool error and set this in /etc/smartd.conf:

Code:
# monitor all SMART attributes on the four pool members (SAT pass-through)
/dev/sda -a -d sat
/dev/sdb -a -d sat
/dev/sdc -a -d sat
/dev/sdd -a -d sat

# run a long self-test on one disk per weekday (Mon-Thu) at 01:00 and mail reports to root
/dev/sda -d scsi -s L/../../1/01 -m root
/dev/sdb -d scsi -s L/../../2/01 -m root
/dev/sdc -d scsi -s L/../../3/01 -m root
/dev/sdd -d scsi -s L/../../4/01 -m root

The test email (#/dev/sdb -m root -M test) works (I get it in my mailbox), so now I just need to wait and see if anything new pops up about that disk. I hope it will keep working, really :)
Thank you again!
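For reference, smartd has to re-read its configuration after edits; on Proxmox/Debian that is roughly the following (the unit may be named differently on other distributions):

Code:
systemctl restart smartmontools     # smartd re-reads /etc/smartd.conf on restart
journalctl -u smartmontools -n 50   # verify that all four disks were registered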
 
Hi, I'm baaack :)

After 2 months I can confirm that everything is now running great (OK, I get some strange notices on the console, but they are only notices).
After I set up a regular check/scrub and raised the fan speed a little (better cooling), the SSDs are still where they were in April (at 1% wearout):

[screenshot: disk overview, wearout still at 1%]
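For reference, the regular scrub mentioned above can be as simple as a cron entry like this sketch (the pool name is a placeholder; Debian's zfsutils package already ships a similar monthly job in /etc/cron.d/zfsutils-linux):

Code:
# /etc/cron.d/zfs-scrub - scrub the pool every Sunday at 02:00
0 2 * * 0   root   /usr/sbin/zpool scrub tank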

I can say that so far they work very reliably. Also, the ZFS pools are OK now:

[screenshot: zpool status, no known data errors]

The major difference is the SSD temperature, which is now in the range of 32 to 34 degrees, as opposed to 40-something in the first 4 months, when I had a silent system, which was probably not too good for the SSDs (my guess).

For now I'm very happy with Proxmox! If only the mobile app were a little more capable, it would be perfect! :)

And "problem" (which so far is not a problem, since it all works ok) are this kernel messages on console:

[screenshot: kernel messages on the console]

Dazed and confused, indeed ;)

Best regards to all!
 
