Failing drive not reported as failing

antipiot

Well-Known Member
Jan 26, 2019
Hello!

One of my drives in a Synology was reported and marked as failing after a self-test.
I wanted to see how Proxmox would handle it, and this is what happened:

In the GUI, the disk is reported as SMART PASSED:
1607932428341.png

But the self-test reports the drive as failed:
1607932485299.png

Here's the full SMART output of the drive:
1607932695032.png

Why is this disk not marked as failing? Which result should I trust?

Thanks for your help

Regards - JS
 
hi,

could you post the output of smartctl -a /dev/sde ?
 
you should back up all your important data and replace the drive.

smartctl says PASSED, but if you look in the log you will see the errors
 
There is no data on this drive :)
I was interested to know why the SMART status says PASSED while the Synology reported this drive as failing and Proxmox did not, which it should have if I'm not mistaken.
 
we parse the SMART output from the CLI response, so if the command line tool reports passed it will show up as passed.
what do you get if you run the same command in synology?
however the drive is problematic regardless of the passed status.
 
Absolutely. That was my guess, but this may be very problematic in some cases.

No idea why, but smartctl on the Synology reports that the drive has no SMART support, whichever drive I try, even though the web UI shows that information. Yet the Synology does manage to report the right status, "failing", even though the SMART status returns PASSED, which is very strange to me.
 
It could be an improvement to check for this kind of error on top of the smartctl exit status, with something like:

Code:
if [ "$smartctlstatus" -eq 0 ]; then
    # Exit status looks clean, so scan the full report for failures.
    smart_status=$("$SMARTCTL" -a "$DEVICE")
    if echo "$smart_status" | grep -Ewqi 'failure'; then
        disk_status=failing
    fi
fi
 
EDIT: found something. @oguz, please check this:

smartctl -H /dev/sde returns exit code 0
smartctl -a /dev/sde returns exit code 192

My guess is that Proxmox uses -H to check the exit code, which does not seem to catch this case, and that it should be replaced with -a.
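
To reproduce the comparison on another disk, something like the sketch below could be used. The function name compare_exit_codes and the overridable SMARTCTL variable are my own choices for illustration, not anything Proxmox ships; the only smartctl options used are -H and -a.

```shell
# Path to smartctl; can be overridden, defaults to the binary on PATH.
SMARTCTL=${SMARTCTL:-smartctl}

# Hypothetical helper: run smartctl with -H and with -a on one device
# and print both exit codes side by side.
compare_exit_codes() {
    dev=$1

    health_rc=0
    "$SMARTCTL" -H "$dev" >/dev/null 2>&1 || health_rc=$?

    all_rc=0
    "$SMARTCTL" -a "$dev" >/dev/null 2>&1 || all_rc=$?

    echo "-H exit code: $health_rc"
    echo "-a exit code: $all_rc"

    # -a also reads the error and self-test logs, so it can set bits
    # that a health-only check never looks at.
    if [ "$health_rc" -eq 0 ] && [ "$all_rc" -ne 0 ]; then
        echo "-a reports problems that -H does not"
    fi
}

# Usage (needs root and a real device):
#   compare_exit_codes /dev/sde
```

On the drive from this thread, this should print 0 for -H and 192 for -a, matching the numbers above.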

ORIGINAL POST:
Still digging into this, and I think I may have a textbook case that needs to be checked and eventually corrected:

So, this drive, /dev/sde, shows as SMART PASSED but fails the extended self-test at 90%.

I know how the exit codes are used and how they work using 8 bits.
I use this code to check the bit values described in the smartctl man page:

Code:
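#!/bin/bash
# Run smartctl and keep only its exit status (a bitmask).
smartctl -a /dev/sde > /dev/null
varstatus=$?
# Print each of the 8 status bits as 0 or 1.
for ((i=0; i<8; i++)); do
  echo "Bit $i: $((varstatus & 2**i && 1))"
done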


Here's the output for a drive, /dev/sdc, which got a bit hot once. The exit code, the bit value and the SMART explanation for that bit all match.
1608021661695.png1608021680585.png

Correct me if I'm mistaken, but each set bit contributes its value to the exit code, so these values can occur and can be combined:
Bit:   0 1 2 3 4  5  6  7
Value: 1 2 4 8 16 32 64 128

The same test for the failing drive reports bits 6 and 7 set (64 + 128 = 192).
This matches the description of the smartctl exit code bits, which are:
bit 6: Device error log contains records of errors
bit 7: Device self-test log contains records of errors ...
1608042577715.png1608022071919.png
1608022801550.png
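
For anyone who wants the meanings printed next to the bits, here is a small sketch. The function name describe_smartctl_status is made up for this post, and the messages are my paraphrase of the exit status section of the smartctl man page, so check the man page of your version for the exact wording.

```shell
# Hypothetical helper: print one line per bit set in a smartctl exit status.
# Messages are paraphrased from the smartctl man page.
describe_smartctl_status() {
    status=$1
    bit=0
    while [ "$bit" -lt 8 ]; do
        if [ $(( status & (1 << bit) )) -ne 0 ]; then
            case $bit in
                0) msg="command line did not parse" ;;
                1) msg="device open failed or device did not identify itself" ;;
                2) msg="a SMART or ATA command failed, or a checksum error occurred" ;;
                3) msg="SMART status check returned DISK FAILING" ;;
                4) msg="prefail attributes at or below threshold" ;;
                5) msg="attributes were at or below threshold at some time in the past" ;;
                6) msg="device error log contains records of errors" ;;
                7) msg="device self-test log contains records of errors" ;;
            esac
            echo "bit $bit (value $((1 << bit))): $msg"
        fi
        bit=$((bit + 1))
    done
}

# Exit code seen with 'smartctl -a /dev/sde' in this thread.
describe_smartctl_status 192
```

For 192 this prints only the lines for bit 6 and bit 7, since 192 = 64 + 128.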

What do you think about this? This drive should be reported as failing but is not. Why?

Regards - JS
 

There's this code in

pve-storage/PVE/Diskmanage.pm

Code:
my $cmd = [$SMARTCTL, '-H'];
which should be replaced with
Code:
my $cmd = [$SMARTCTL, '-a'];
if I've understood how it works.
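
Outside of Proxmox, the same idea can be sketched as a check on the exit status of smartctl -a. This is only an illustration under my own assumptions: the function name classify_smartctl_status and the choice of which bits count as "failing" or "warning" are mine, not what Diskmanage.pm does.

```shell
# Hypothetical policy, not Proxmox's: map a 'smartctl -a' exit status
# to a coarse health label.
#   bits 0-2 (mask 7):          smartctl itself had a problem -> unknown
#   bits 3, 4, 7 (mask 152):    failing status, prefail attribute, or
#                               self-test log errors          -> failing
#   bits 5, 6 (mask 96):        past threshold hit or error log
#                               entries                       -> warning
classify_smartctl_status() {
    rc=$1
    if [ $(( rc & 7 )) -ne 0 ]; then
        echo "unknown"
    elif [ $(( rc & 152 )) -ne 0 ]; then
        echo "failing"
    elif [ $(( rc & 96 )) -ne 0 ]; then
        echo "warning"
    else
        echo "ok"
    fi
}

# Usage with a real device (needs root):
#   smartctl -a /dev/sde >/dev/null; classify_smartctl_status $?
```

With this mapping, the exit code 192 from /dev/sde is classified as failing because bit 7 is set, while a drive that only has bit 5 set (like one that ran hot once) gets a warning.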
 
hi, thank you for looking into this. it would be very helpful if you could submit a bug report at https://bugzilla.proxmox.com and link this discussion here. thanks!
 
It's now April 2024 and I have duplicated the above test. Proxmox is passing failing hard drives. Proxmox is better than Spinrite at repairing failed drives. ;-) Marvelous software indeed!
 
Hello!
I think you'd better bump this on Bugzilla -> https://bugzilla.proxmox.com/show_bug.cgi?id=3203

On my side, I've developed my own SMART monitoring tool, so I'm not looking into this anymore.

Regards - JS
 
