smartctl fails to get SMART values

ozdjh

Well-Known Member
Oct 8, 2019
Hi
I've just run up PBS 2.2-1 in our lab to test it out. Installation was onto an existing Debian 11 server. Everything is going pretty well except one issue I'm seeing is that it can't run smartctl on the drives. Trying to view the SMART values for a disk shows the attached (failed - status code: 4 - no error message). I can run the command it's indicating from the shell and it works fine. I tried adding a symlink in /bin to the binary in /usr/sbin in case it was a path issue but that hasn't helped. Any ideas?
 

Attachments

  • Screen Shot 2022-06-08 at 1.58.09 pm.png
hi,

can you post the output of
Code:
smartctl -H -A -j /dev/sda
echo $?
?
 
Hi Dominik

As a test I just wrapped smartctl with a shell script that runs smartctl and exits with 0. I'm getting all the expected data in the PBS Web UI now. Clearly not a good solution but it appears the exit status doesn't impact on gathering the data.
 
It produces 600 lines of JSON formatted SMART information and an exit status of 4. Do you want to see the JSON output?
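
For anyone reading along: the exit status of smartctl is a bitmask, documented in the EXIT STATUS section of smartctl(8). A minimal sketch of decoding it is below. The helper name decode_smartctl_status is made up for illustration and is not part of smartmontools, and the messages are paraphrased from the man page.

```shell
#!/bin/sh
# Decode a smartctl exit status into the bits described in smartctl(8).
# decode_smartctl_status is a hypothetical helper, not part of smartmontools.
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no bits set"
        return 0
    fi
    bit=0
    while [ "$bit" -le 7 ]; do
        # Test each of the eight bits in turn.
        if [ $(( ($status >> $bit) & 1 )) -eq 1 ]; then
            case $bit in
                0) msg="command line did not parse" ;;
                1) msg="device open failed or device is in a low-power mode" ;;
                2) msg="a SMART or other ATA command failed, or a checksum error occurred" ;;
                3) msg="SMART status check returned DISK FAILING" ;;
                4) msg="prefail attributes are at or below threshold" ;;
                5) msg="some attributes were at or below threshold in the past" ;;
                6) msg="the device error log contains error records" ;;
                7) msg="the device self-test log contains error records" ;;
            esac
            echo "bit $bit (value $((1 << $bit))): $msg"
        fi
        bit=$((bit + 1))
    done
}

# The status reported in this thread:
decode_smartctl_status 4
```

So a status of 4 means only bit 2 is set: the data was read, but at least one SMART/ATA command did not complete cleanly.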
yes i'd like to see the output so that i can see what might produce that error so that we can fix it

As a test I just wrapped smartctl with a shell script that runs smartctl and exits with 0. I'm getting all the expected data in the PBS Web UI now. Clearly not a good solution but it appears the exit status doesn't impact on gathering the data.
might be an ok workaround, but to really fix it, i'd need to see the json output
 
thanks, sadly the warning is not really helpful ;)
can you also post the output of
Code:
smartctl -a /dev/sda

? maybe there is more info there...
 
ok it says:
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
so basically there is some data missing from the ATA response, and it evaluated only the attributes, which may be unreliable.

do you have the disk behind a hba/raid controller/usb bridge ?
anything that could mangle the ata protocol in between? (broken sata cable, backplane, etc)?
also drive firmware could be an issue maybe?

in any case, there is something wrong here; what exactly, i cannot say...
(dmesg output may indicate something)
 
There are no errors in dmesg. The lab kit is from our past generation production virtualisation (OnApp) cluster and ran with those SSDs and those controllers. The gear is IBM System X and the standard controller is RAID capable but running those drives as JBOD.

There are 3 other boxes of the exact same spec that run PVE 7.1 as our lab. I've just checked and they are all reporting the same output from smartctl so it's not a cable or something. The controller *may* be mangling the ATA response but like I said, the gear was in production for several years and it's never skipped a beat.
 
mhmm ok, then i'd say this is some 'hardware quirk', and your workaround seems sufficient. i don't want to remove the exit code check though, since normally it indicates some underlying error, even if some attributes may be readable.
 
Hi,
I'm facing the same problem of smartctl failing to get SMART values, but with an external USB RAID bay on a Raspberry Pi. To get the values I also have to add '-d sat' to the smartctl parameters. Could you please share your solution?

Screenshot-20220929062330-1020x735.png
 
It's not an ideal solution, but all I did was :

Code:
# mv /usr/sbin/smartctl /usr/sbin/smartctl.orig

and then I forced it to return a 0 exit status by putting the following in /usr/sbin/smartctl

Code:
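#!/bin/sh
# Run the real binary, passing every original argument through intact.
/usr/sbin/smartctl.orig "$@"
# Always report success, whatever smartctl actually returned.
exit 0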
 
@dcsapak I can reproduce this behaviour on a Supermicro storage server, without any USB adapter or similar hardware, on 3 of the 4 attached SSDs. They are directly connected to the onboard SATA ports.
The problematic SSDs are Samsung SM883. It works with a Samsung EVO 850.

I attached the smartctl output from one of the problematic drives.
 

Attachments

  • smartctl_output.txt
mhm.. yeah we'll probably have to do some changes regarding this
in pve we simply ignore bit 2 (the third bit, value 4, which is what produces exit status 4), but we actually want to check the response to see what exactly triggered it... (for both pve/pbs)
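
As a rough illustration of that kind of policy (not the actual Proxmox code), a caller could treat only the two lowest bits as fatal and still parse the output when exit status 4 is returned, which is bit 2 in the zero-based numbering of smartctl(8). The helper name is_fatal_status and the choice of which bits count as fatal are assumptions made for this sketch.

```shell
#!/bin/sh
# is_fatal_status is a hypothetical helper, not Proxmox code.
# Assumption: only bit 0 (command line did not parse) and bit 1 (device open
# failed) mean there is no usable data; higher bits describe disk status.
is_fatal_status() {
    [ $(( $1 & 3 )) -ne 0 ]
}

for status in 0 2 4 68; do
    if is_fatal_status "$status"; then
        echo "$status: fatal, no usable SMART data"
    else
        echo "$status: tolerated, parse the output"
    fi
done
```

Checking which message actually set bit 2 would still have to happen on the parsed output; this sketch only covers the exit status side.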

would you mind opening a bug report: https://bugzilla.proxmox.com
 
any update would be appreciated. this is getting critical for a lot of people who have been in the same situation since 2020-2021

and this issue is present in PVE and PBS
 
@DC-CA1 if you look at the bug report, there was a quick fix that is already available in the repositories. I installed it and it solved the problem. There are still plans to completely rework the SMART handling in Proxmox VE and Proxmox Backup Server.
 
