Disk smart status no longer working

joarc · New Member · Jan 20, 2021
Hi.

A few months ago we upgraded to Proxmox 6.2-10 (from 6.1, I think). Before the upgrade, SMART and disk statuses worked. Since upgrading, all disks except the boot drives have stopped showing SMART PASSED and Wearout, and their disk types are shown as unknown. We had to prioritize other things, but we are about to add a few more nodes to this cluster and want to resolve this so we can monitor them correctly.

This is currently a 6-node cluster: 3 older Supermicro nodes (with LSI SAS2008-based cards in HBA mode, working fine and displaying SMART status correctly) and 3 HP DL380p Gen8 nodes with HP P420i cards in HBA mode. The three HP machines each have 25x 900GB drives.

The drives can be queried with smart using the following command: smartctl -a -d cciss,3 /dev/sdX
Did something change between these versions that affected how proxmox smart-checks disks?
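
Since `-d cciss,N` selects a disk by index behind the controller, one way to get an overview from the command line is to loop over the indexes and print only the health line. This is a minimal sketch, not a Proxmox feature: the function name `cciss_health` and the `SMARTCTL` override variable are made up for illustration, and the last index to probe (24 for 25 drives, counting from 0) is an assumption that your smartctl build and controller may not accept.

```shell
# Path or name of the smartctl binary; overridable for testing (hypothetical variable).
SMARTCTL="${SMARTCTL:-smartctl}"

# cciss_health DEVICE LAST_INDEX
# Prints one line per disk index 0..LAST_INDEX with the health line smartctl reported.
# The body runs in a subshell, so its variables do not leak into the caller.
cciss_health() (
    device="$1"
    last="$2"
    n=0
    while [ "$n" -le "$last" ]; do
        # SAS disks print "SMART Health Status:", SATA disks print
        # "SMART overall-health self-assessment test result:".
        line="$("$SMARTCTL" -H -d "cciss,$n" "$device" 2>/dev/null \
            | grep -E 'SMART Health Status:|SMART overall-health' \
            | head -n 1 || true)"
        printf 'cciss,%s: %s\n' "$n" "${line:-no health line returned}"
        n=$((n + 1))
    done
)
```

For example, `cciss_health /dev/sda 24` would probe indexes 0 to 24 through the controller that `/dev/sda` sits behind.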
 

Attachment: proxmox-disk-status-issue.jpg
hi,

could you please post the output from your command smartctl -a -d cciss,3 /dev/sdX ?

Regarding "So a few months ago, we updated to proxmox 6.2-10 (from 6.1 i think)": what is your current pveversion -v output?
 
Smartctl:
Code:
root@node-04:~# smartctl -a -d cciss,3 /dev/sdd
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.44-2-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               NETAPP
Product:              X423_HCOBE900A10
Revision:             NA01
Compliance:           SPC-4
User Capacity:        900,185,481,216 bytes [900 GB]
Logical block size:   512 bytes
Rotation Rate:        10020 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000cca016d4a76c
Serial number:        KPKSYJ2N
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Jan 20 11:50:04 2021 CET
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     31 C
Drive Trip Temperature:        85 C

Manufactured in week 35 of year 2013
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  57
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2187
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   239425        17         1   12804860     252286.190           0
write:         0  6792772         2         0    1749845     240213.149           0
verify:        0    10657         0         0    5700654      45585.749           0

Non-medium error count:     2025

No Self-tests have been logged

root@node-04:~#

pveversion:
Code:
root@node-04:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-10 (running version: 6.2-10/a20769ed)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-10-pve: 4.15.18-32
ceph: 14.2.9-pve1
ceph-fuse: 14.2.9-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
ksmtuned: 4.20150325+b1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-1
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-9
pve-cluster: 6.1-8
pve-container: 3.1-12
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-11
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-11
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
root@node-04:~#
 
I have the same problem with the same hardware (HP DL360p Gen8 with P420i RAID Controller).

Can someone help please?

Thanks
 
Code:
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   239425        17         1   12804860     252286.190           0
write:         0  6792772         2         0    1749845     240213.149           0
verify:        0    10657         0         0    5700654      45585.749           0

Non-medium error count:     2025

No Self-tests have been logged

it seems there are some errors on the disk?

there was a recent report about a similar problem [0]

so i think it might be related. what do you get if you run the smartctl command like: smartctl -H /dev/sdd ?

[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=3203
 
When using just smartctl -H /dev/sdd (or any disk, for that matter), it doesn't work; it wants the -d option.
Code:
root@node-04:~# smartctl -H /dev/sdd
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.44-2-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sdd: requires option '-d cciss,N'
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
root@node-04:~#

But when I add -d cciss,3, it works and prints this:
Code:
root@node-04:~# smartctl -H /dev/sdd -d cciss,3
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.44-2-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

root@node-04:~#

We don't see these disks as broken or anything; they just appear as unknown and don't report SMART values in the GUI.
 
For the record, if these were failing drives, that would mean that upgrading to a newer Proxmox version made 75 drives start "failing" while reporting an unknown status, even though they have been working perfectly fine for the several months since the upgrade.
 
what do you get from lsblk? is it possibly a problem with the raid controller?

could you check the journal and syslog? there might be some clues as to what's going wrong.
 
Again, it stopped working on all three nodes when we updated to Proxmox 6.2, so something changed either in Proxmox or in a package Proxmox uses. Since it works when using smartctl commands directly, I suspect the problem is in how Proxmox detects or checks the disks.

All drives are visible and working; only the SMART checks in the Proxmox GUI don't work properly. The output of lsblk and lsblk --scsi is below:
Code:
root@node-04:~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 838.4G  0 disk
├─sda1        8:1    0 833.4G  0 part /var/lib/ceph/osd/ceph-30
└─sda2        8:2    0     5G  0 part
sdb           8:16   0 838.4G  0 disk
├─sdb1        8:17   0 833.4G  0 part /var/lib/ceph/osd/ceph-33
└─sdb2        8:18   0     5G  0 part
sdc           8:32   0 838.4G  0 disk
├─sdc1        8:33   0 833.4G  0 part /var/lib/ceph/osd/ceph-36
└─sdc2        8:34   0     5G  0 part
sdd           8:48   0 838.4G  0 disk
├─sdd1        8:49   0 833.4G  0 part /var/lib/ceph/osd/ceph-39
└─sdd2        8:50   0     5G  0 part
sde           8:64   0 838.4G  0 disk
├─sde1        8:65   0 833.4G  0 part /var/lib/ceph/osd/ceph-18
└─sde2        8:66   0     5G  0 part
sdf           8:80   0 838.4G  0 disk
├─sdf1        8:81   0 833.4G  0 part /var/lib/ceph/osd/ceph-21
└─sdf2        8:82   0     5G  0 part
sdg           8:96   0 838.4G  0 disk
├─sdg1        8:97   0 833.4G  0 part /var/lib/ceph/osd/ceph-24
└─sdg2        8:98   0     5G  0 part
sdh           8:112  0 838.4G  0 disk
├─sdh1        8:113  0 833.4G  0 part /var/lib/ceph/osd/ceph-27
└─sdh2        8:114  0     5G  0 part
sdi           8:128  0 838.4G  0 disk
├─sdi1        8:129  0 833.4G  0 part /var/lib/ceph/osd/ceph-41
└─sdi2        8:130  0     5G  0 part
sdj           8:144  0 838.4G  0 disk
├─sdj1        8:145  0 833.4G  0 part /var/lib/ceph/osd/ceph-52
└─sdj2        8:146  0     5G  0 part
sdk           8:160  0 838.4G  0 disk
├─sdk1        8:161  0 833.4G  0 part /var/lib/ceph/osd/ceph-44
└─sdk2        8:162  0     5G  0 part
sdl           8:176  0 838.4G  0 disk
├─sdl1        8:177  0 833.4G  0 part /var/lib/ceph/osd/ceph-45
└─sdl2        8:178  0     5G  0 part
sdm           8:192  0 838.4G  0 disk
├─sdm1        8:193  0 833.4G  0 part /var/lib/ceph/osd/ceph-46
└─sdm2        8:194  0     5G  0 part
sdn           8:208  0 838.4G  0 disk
├─sdn1        8:209  0 833.4G  0 part /var/lib/ceph/osd/ceph-47
└─sdn2        8:210  0     5G  0 part
sdo           8:224  0 838.4G  0 disk
├─sdo1        8:225  0 833.4G  0 part /var/lib/ceph/osd/ceph-51
└─sdo2        8:226  0     5G  0 part
sdp           8:240  0 838.4G  0 disk
├─sdp1        8:241  0 833.4G  0 part /var/lib/ceph/osd/ceph-53
└─sdp2        8:242  0     5G  0 part
sdq          65:0    0 838.4G  0 disk
├─sdq1       65:1    0 833.4G  0 part /var/lib/ceph/osd/ceph-54
└─sdq2       65:2    0     5G  0 part
sdr          65:16   0 838.4G  0 disk
├─sdr1       65:17   0 833.4G  0 part /var/lib/ceph/osd/ceph-58
└─sdr2       65:18   0     5G  0 part
sds          65:32   0 838.4G  0 disk
├─sds1       65:33   0 833.4G  0 part /var/lib/ceph/osd/ceph-62
└─sds2       65:34   0     5G  0 part
sdt          65:48   0 838.4G  0 disk
├─sdt1       65:49   0 833.4G  0 part /var/lib/ceph/osd/ceph-65
└─sdt2       65:50   0     5G  0 part
sdu          65:64   0 838.4G  0 disk
├─sdu1       65:65   0 833.4G  0 part /var/lib/ceph/osd/ceph-68
└─sdu2       65:66   0     5G  0 part
sdv          65:80   0 838.4G  0 disk
├─sdv1       65:81   0 833.4G  0 part /var/lib/ceph/osd/ceph-70
└─sdv2       65:82   0     5G  0 part
sdw          65:96   0 838.4G  0 disk
├─sdw1       65:97   0 833.4G  0 part /var/lib/ceph/osd/ceph-75
└─sdw2       65:98   0     5G  0 part
sdx          65:112  0 838.4G  0 disk
├─sdx1       65:113  0 833.4G  0 part /var/lib/ceph/osd/ceph-78
└─sdx2       65:114  0     5G  0 part
sdy          65:128  0 838.4G  0 disk
├─sdy1       65:129  0 833.4G  0 part /var/lib/ceph/osd/ceph-82
└─sdy2       65:130  0     5G  0 part
sdz          65:144  0  59.5G  0 disk
└─sdz1       65:145  0   3.7G  0 part /boot
nvme0n1     259:0    0 238.5G  0 disk
├─nvme0n1p1 259:1    0 186.3G  0 part /
├─nvme0n1p2 259:2    0     1K  0 part
└─nvme0n1p5 259:3    0  18.6G  0 part [SWAP]


root@node-04:~# lsblk --scsi
NAME HCTL       TYPE VENDOR   MODEL             REV TRAN
sda  2:0:1:0    disk NETAPP   X423_HCOBE900A10 NA01 sas
sdb  2:0:2:0    disk NETAPP   X423_HCOBE900A10 NA01 sas
sdc  2:0:3:0    disk NETAPP   X423_HCOBE900A10 NA01 sas
sdd  2:0:4:0    disk NETAPP   X423_HCOBE900A10 NA01 sas
sde  2:0:5:0    disk NETAPP   X423_HCOBE900A10 NA01 sas
sdf  2:0:6:0    disk NETAPP   X423_HCOBE900A10 NA01 sas
sdg  2:0:7:0    disk NETAPP   X423_SLTNG900A10 NA02 sas
sdh  2:0:8:0    disk NETAPP   X423_HCOBE900A10 NA01 sas
sdi  2:0:9:0    disk NETAPP   X423_SLTNG900A10 NA02 sas
sdj  2:0:10:0   disk IBM-SSG  S0VL900          E039 sas
sdk  2:0:11:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdl  2:0:12:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdm  2:0:13:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdn  2:0:14:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdo  2:0:15:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdp  2:0:16:0   disk NETAPP   X423_HCOBE900A10 NA01 sas
sdq  2:0:17:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdr  2:0:18:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sds  2:0:19:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdt  2:0:20:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdu  2:0:21:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdv  2:0:22:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdw  2:0:23:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdx  2:0:24:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdy  2:0:25:0   disk NETAPP   X423_SLTNG900A10 NA02 sas
sdz  3:0:0:0    disk HP iLO   Internal_SD-CARD 2.10 usb

I have also included lspci for the HBA controller:
Code:
root@node-04:~# lspci -v -s 02:00.0
02:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Gen8 Controllers (rev 01)
        Subsystem: Hewlett-Packard Company P420i
        Flags: bus master, fast devsel, latency 0, IRQ 58, NUMA node 0
        Memory at f7c00000 (64-bit, non-prefetchable) [size=1M]
        Memory at f7bf0000 (64-bit, non-prefetchable) [size=1K]
        I/O ports at 5000 [size=256]
        [virtual] Expansion ROM at f7b00000 [disabled] [size=512K]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
        Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [c0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [300] #19
        Kernel driver in use: hpsa
        Kernel modules: hpsa

I have dug through the logs as well as I could to find anything related to the HBA or drives, but I couldn't find anything other than the lines below that could be useful:
Code:
[    4.033528] hpsa 0000:02:00.0: can't disable ASPM; OS doesn't have ASPM control
[    4.034683] hpsa 0000:02:00.0: Logical aborts not supported
[    4.034685] hpsa 0000:02:00.0: HP SSD Smart Path aborts not supported
[    4.122426] scsi host2: hpsa
[    4.123068] hpsa can't handle SMP requests
[    4.161470] hpsa 0000:02:00.0: scsi 2:0:0:0: added RAID              HP       P420i            controller SSDSmartPathCap- En- Exp=1
[    4.161476] hpsa 0000:02:00.0: scsi 2:0:1:0: added Direct-Access     NETAPP   X423_HCOBE900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161480] hpsa 0000:02:00.0: scsi 2:0:2:0: added Direct-Access     NETAPP   X423_HCOBE900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161484] hpsa 0000:02:00.0: scsi 2:0:3:0: added Direct-Access     NETAPP   X423_HCOBE900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161487] hpsa 0000:02:00.0: scsi 2:0:4:0: added Direct-Access     NETAPP   X423_HCOBE900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161491] hpsa 0000:02:00.0: scsi 2:0:5:0: added Direct-Access     NETAPP   X423_HCOBE900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161494] hpsa 0000:02:00.0: scsi 2:0:6:0: added Direct-Access     NETAPP   X423_HCOBE900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161498] hpsa 0000:02:00.0: scsi 2:0:7:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161502] hpsa 0000:02:00.0: scsi 2:0:8:0: added Direct-Access     NETAPP   X423_HCOBE900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161505] hpsa 0000:02:00.0: scsi 2:0:9:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161509] hpsa 0000:02:00.0: scsi 2:0:10:0: added Direct-Access     IBM-SSG  S0VL900          PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161513] hpsa 0000:02:00.0: scsi 2:0:11:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161516] hpsa 0000:02:00.0: scsi 2:0:12:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161520] hpsa 0000:02:00.0: scsi 2:0:13:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161523] hpsa 0000:02:00.0: scsi 2:0:14:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161527] hpsa 0000:02:00.0: scsi 2:0:15:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161531] hpsa 0000:02:00.0: scsi 2:0:16:0: added Direct-Access     NETAPP   X423_HCOBE900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161535] hpsa 0000:02:00.0: scsi 2:0:17:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161539] hpsa 0000:02:00.0: scsi 2:0:18:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161543] hpsa 0000:02:00.0: scsi 2:0:19:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161547] hpsa 0000:02:00.0: scsi 2:0:20:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161551] hpsa 0000:02:00.0: scsi 2:0:21:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161554] hpsa 0000:02:00.0: scsi 2:0:22:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161558] hpsa 0000:02:00.0: scsi 2:0:23:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161562] hpsa 0000:02:00.0: scsi 2:0:24:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161566] hpsa 0000:02:00.0: scsi 2:0:25:0: added Direct-Access     NETAPP   X423_SLTNG900A10 PHYS DRV SSDSmartPathCap- En- Exp=1
[    4.161570] hpsa 0000:02:00.0: scsi 2:0:26:0: masked Enclosure         HP       Gen8 ServBP 25+2 enclosure SSDSmartPathCap- En- Exp=0
[    4.161574] hpsa 0000:02:00.0: scsi 2:0:27:0: masked Enclosure         HP       Gen8 ServBP 25+2 enclosure SSDSmartPathCap- En- Exp=0
[    4.161578] hpsa 0000:02:00.0: scsi 2:0:28:0: masked Enclosure         PMCSIERA SRCv8x6G         enclosure SSDSmartPathCap- En- Exp=0
[    4.161721] hpsa can't handle SMP requests
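
The lsblk --scsi listing above puts the 25 SAS disks at targets 1 to 25 on SCSI host 2, and the earlier smartctl call queried /dev/sdd (2:0:4:0) with cciss,3. A minimal sketch for turning such a listing into candidate `-d cciss,N` options follows. It assumes that the cciss index is the SCSI target minus one, which nothing in this thread confirms, so check the result against drive serial numbers before relying on it. The function names `hctl_to_cciss_index` and `map_devices` are made up for illustration.

```shell
# hctl_to_cciss_index H:C:T:L
# Prints T-1 (ASSUMPTION: cciss index = SCSI target - 1).
# Fails for malformed input or target 0 (the controller itself at 2:0:0:0).
hctl_to_cciss_index() (
    target="$(printf '%s\n' "$1" | cut -s -d: -f3)"
    case "$target" in
        '' | *[!0-9]*) exit 1 ;;
    esac
    [ "$target" -ge 1 ] || exit 1
    echo $((target - 1))
)

# map_devices HOST
# Reads "NAME HCTL" lines on stdin and prints a candidate option for every
# disk on SCSI host HOST; disks on other hosts (USB, NVMe) are skipped.
map_devices() (
    while read -r name hctl rest; do
        case "$hctl" in
            "$1":*) ;;
            *) continue ;;
        esac
        idx="$(hctl_to_cciss_index "$hctl")" || continue
        printf '/dev/%s -d cciss,%s\n' "$name" "$idx"
    done
)
```

On the node shown above this could be fed with `lsblk --scsi -n -o NAME,HCTL | map_devices 2`, where 2 is the host number of the hpsa controller from the kernel log.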
 
Not to bump an old thread, but I'm just wondering if there's any plan to fix this, or if there is a solution?
 
Just to chime in: we have the same issue with all of our HP G9 servers using HBA 240ar controllers.

Not sure if it's the same issue, as this is a fresh Proxmox install of 6.3 upgraded to 6.4.

We also have a Dell we are about to test; something tells me it's an HP-specific issue.

I'll update once we get the Dell online and start testing.

Here is the command-line output from smartctl (the drive shows as unknown in the GUI):

Bash:
# smartctl -a -d cciss,3 /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.4.106-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Intel S4510/S4610/S4500/S4600 Series SSDs
Device Model:     INTEL SSDSC2KB038T8
Serial Number:    PHYF930205Y43P8EGN
LU WWN Device Id: 5 5cd2e4 151018e23
Firmware Version: XCV10110
User Capacity:    3,840,755,982,336 bytes [3.84 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue May  4 20:06:23 2021 AEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x79) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       9470
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       34
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       34
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       2367 (34 65535)
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error_Count  0x0033   100   100   090    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Drive_Temperature       0x0022   081   081   000    Old_age   Always       -       19 (Min/Max 18/21)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       34
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       19
197 Pending_Sector_Count    0x0012   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       0
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       10
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       100
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       567887
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
234 Thermal_Throttle_Status 0x0032   100   100   000    Old_age   Always       -       0/0
235 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       2367 (34 65535)
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       0
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       3
243 NAND_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       75282

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Cheers
G
 
Quick update: I've checked the Dell R630 and can confirm that drive SMART status is reported correctly in the GUI.

Let me know how else I can help with this issue.

I wonder if the HP HBA card has some specific driver requirements?

Just throwing it out there.

Cheers
G
 
Any further updates on this?

All my drives also report UNKNOWN in the GUI, but from the command line they all report fine with smartctl.
For example:
Bash:
root@pve-1:~# smartctl -H -d scsi /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.22-4-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

I'm also running HP DL360p G8 servers with P420i RAID controllers in HBA mode.
Three servers, all with the same problem.

Thanks.
 
I'm just another guy asking if there is any update on this issue. I have a DL80 Gen9 with an H240 HBA, and all drives report as unknown in the GUI, but smartctl reports SMART data as expected. When using the built-in B140i Smart Array controller, SMART data is reported in the GUI as expected.

I'll help get this resolved any way I can.
 
Same for me on DL380 Gen6 and Gen7. Output from one Gen7 running Proxmox 7.1-8:
Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.19-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HP
Product:              EG0300FAWHV
Revision:             HPDF
Compliance:           SPC-3
User Capacity:        300,000,000,000 bytes [300 GB]
Logical block size:   512 bytes
Rotation Rate:        10000 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c5002328cce3
Serial number:        3SE1XXM000009040SS6X
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Thu Jan 13 08:21:44 2022 CET
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     37 C
Drive Trip Temperature:        65 C

Accumulated power on time, hours:minutes 39506:57
Elements in grown defect list: 1

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0  2984919706          0     137852.228           0
write:         0        0         0         2          0      10638.192           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   19952                 - [-   -    -]

Long (extended) Self-test duration: 3070 seconds [51.2 minutes]
Output of smartctl -H -d scsi /dev/sdg:
Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.19-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
 
Same here: HP DL360 Gen9 with P840ar.
Code:
# smartctl -H -d scsi /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.19-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

...

# smartctl -H -d cciss,3 /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.19-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
 
I have the same issue with my DL80 G9 and H240 card. I got the info to report by using a smartctl wrapper written by Udo Rader. Honestly, I only understand about half of how it works, but it basically replaces smartctl, and you can get at least one HBA working. For some reason it only works with my first H240; the other one still shows unknown. If someone is smarter or has more time/motivation to modify this to make it work better for us, please do and repost here! I found it deep in some forum post. Hope it helps someone! Credit goes to Udo Rader; the instructions to implement it are in the script.

Code:
#!/bin/bash
 
# -----------------------------------------------------------------
# Copyright (c) 2021 BestSolution.at EDV Systemhaus GmbH
# All Rights Reserved.
#
# This software is released under the terms of the
#
#            "GNU Affero General Public License"
#
# and may only be distributed and used under the terms of the
# mentioned license. You should have received a copy of the license
# along with this software product, if not you can download it from
# https://www.gnu.org/licenses/agpl-3.0.en.html
#
# Author: udo.rader@bestsolution.at
# -----------------------------------------------------------------
#
#  smartctl_cciss.sh
#
#  Wrapper that converts
#
#    '$ smartctl [...] /dev/sda' to '$ smartctl -d cciss,0 [...] /dev/sda'
#    '$ smartctl [...] /dev/sdb' to '$ smartctl -d cciss,1 [...] /dev/sdb'
#    '$ smartctl [...] /dev/sdc' to '$ smartctl -d cciss,2 [...] /dev/sdc'
#    ...
#    '$ smartctl [...] /dev/sdp' to '$ smartctl -d cciss,15 [...] /dev/sdp'
#
#  Per definition (see man smartctl(8)), the maximum number of devices
#  supported by the cciss driver is 15, so the /dev/sdp is the "highest"
#  device accepted (p=15).
#
#  This is useful for certain HP RAID/HBA controllers that expose the block
#  devices they control as /dev/sdX, but still require '-d cciss,N' to be
#  present when used with smartctl.
#
#  At the bottom line, this saves you the extra commandline switch  plus at
#  the same time allows other tools to read out the SMART values without any
#  further configuration on their side (eg. the proxmox admin interface
#  showing SMART values).
#
#  To wrap the original smartctl binary using this script, rename the original
#  binary to /usr/sbin/smartctl.orig and use this script as a replacement, eg like
#  this:
#
#  $ mv /usr/sbin/smartctl /usr/sbin/smartctl.orig
#  $ cp /path/of/the/downloaded_wrapper/smartctl_cciss.sh /usr/sbin/smartctl
#  $ chmod 755 /usr/sbin/smartctl
#
#  Later updates of the smartmontools package will probably overwrite the
#  wrapper, so what you can do to prevent this is to make the in place
#  wrapper immutable like this:
#
#  $ chattr +i /usr/sbin/smartctl
#
#  ... but this may have some side effects afterwards (eg. updates might
#  complain that they cannot update the now immutable file).
#
#  This is a little bit hackish, but it does the job well enough for me :)
 
SMARTCTL=/usr/sbin/smartctl.orig
OPTIONS=("$@")
 
# build up map
char_index=({a..p})
declare -A num_map
for((i=0; i < ${#char_index[*]}; ++i)); do
    num_map[${char_index[i]}]=$i
done
 
for((i=1; i<$#; ++i)); do
    device_letter="${OPTIONS[i]#/dev/sd}"
    # only proceed if the given device ends with [a-p]
    if [[ ! -z "${num_map[$device_letter]:-}" ]]; then
        cciss_device="-d cciss,${num_map[$device_letter]}"
        # add the "-d cciss,X" option to the list of options
        OPTIONS=($cciss_device "${OPTIONS[@]}")
    fi
done
 
exec $SMARTCTL "${OPTIONS[@]}"

I messed around and thought I had it working for all drives, but I didn't. It looked like it worked, but that was just my tired confirmation bias.
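The wrapper's header comment suggests `chattr +i` to keep package updates from overwriting the wrapper. On Debian-based systems such as Proxmox VE, a dpkg diversion is another option. The following is only a sketch under assumptions: the `run` and `install_wrapper` helpers and the `DRY_RUN` and `WRAPPER` variables are made up for illustration, and by default nothing is executed, the commands are only printed.

```bash
#!/bin/bash
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to apply.
DRY_RUN=${DRY_RUN:-1}
# Hypothetical location of the downloaded wrapper; adjust to your system.
WRAPPER=${WRAPPER:-/path/of/the/downloaded_wrapper/smartctl_cciss.sh}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [[ $DRY_RUN == 1 ]]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

install_wrapper() {
    # Move the packaged binary aside and register the diversion with dpkg,
    # so package upgrades write to smartctl.orig instead of smartctl.
    run dpkg-divert --local --rename --divert /usr/sbin/smartctl.orig --add /usr/sbin/smartctl
    # Put the wrapper in place of the diverted binary.
    run install -m 755 "$WRAPPER" /usr/sbin/smartctl
}
```

Calling `install_wrapper` prints the two commands; with `DRY_RUN=0` and root privileges it would run them. To undo it, delete the wrapper and run `dpkg-divert --rename --remove /usr/sbin/smartctl`.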
 
I have the same issue with my DL80 G9 and H240 card. I got the info to report by using Udo Rader's smartctl wrapper. Honestly, I only understand about half of how it works, but it basically just replaces smartctl, and you can get at least one HBA working. For some reason it only works with my first H240; the other one still shows unknown. If someone is smarter or has more time/motivation to modify this to make it work better for us, please do and repost here! I found it deep in some forum post. Hope it helps someone! Obviously credit goes to Udo Rader; the instructions to implement it are in there.

Code:
#!/bin/bash
 
# -----------------------------------------------------------------
# Copyright (c) 2021 BestSolution.at EDV Systemhaus GmbH
# All Rights Reserved.
#
# This software is released under the terms of the
#
#            "GNU Affero General Public License"
#
# and may only be distributed and used under the terms of the
# mentioned license. You should have received a copy of the license
# along with this software product, if not you can download it from
# https://www.gnu.org/licenses/agpl-3.0.en.html
#
# Author: udo.rader@bestsolution.at
# -----------------------------------------------------------------
#
#  smartctl_cciss.sh
#
#  Wrapper that converts
#
#    '$ smartctl [...] /dev/sda' to '$ smartctl -d cciss,0 [...] /dev/sda'
#    '$ smartctl [...] /dev/sdb' to '$ smartctl -d cciss,1 [...] /dev/sdb'
#    '$ smartctl [...] /dev/sdc' to '$ smartctl -d cciss,2 [...] /dev/sdc'
#    ...
#    '$ smartctl [...] /dev/sdp' to '$ smartctl -d cciss,15 [...] /dev/sdp'
#
#  Per definition (see man smartctl(8)), the maximum number of devices
#  supported by the cciss driver is 15, so the /dev/sdp is the "highest"
#  device accepted (p=15).
#
#  This is useful for certain HP RAID/HBA controllers that expose the block
#  devices they control as /dev/sdX, but still require '-d cciss,N' to be
#  present when used with smartctl.
#
#  At the bottom line, this saves you the extra commandline switch  plus at
#  the same time allows other tools to read out the SMART values without any
#  further configuration on their side (eg. the proxmox admin interface
#  showing SMART values).
#
#  To wrap the original smartctl binary using this script, rename the original
#  binary to /usr/sbin/smartctl.orig and use this script as a replacement, eg
#  like this:
#
#  $ mv /usr/sbin/smartctl /usr/sbin/smartctl.orig
#  $ cp /path/of/the/downloaded_wrapper/smartctl_cciss.sh /usr/sbin/smartctl
#  $ chmod 755 /usr/sbin/smartctl
#
#  Later updates of the smartmontools package will probably overwrite the
#  wrapper, so what you can do to prevent this is to make the in place
#  wrapper immutable like this:
#
#  $ chattr +i /usr/sbin/smartctl
#
#  ... but this may have some side effects afterwards (eg. updates might
#  complain that they cannot update the now immutable file).
#
#  This is a little bit hackish, but it does the job well enough for me :)
 
SMARTCTL=/usr/sbin/smartctl.orig
OPTIONS=("$@")
 
# build up map
char_index=({a..p})
declare -A num_map
for((i=0; i < ${#char_index[*]}; ++i)); do
    num_map[${char_index[i]}]=$i
done
 
for((i=1; i<$#; ++i)); do
    device_letter="${OPTIONS[i]#/dev/sd}"
    # only proceed if the given device ends with [a-p]
    if [[ ! -z "${num_map[$device_letter]:-}" ]]; then
        cciss_device="-d cciss,${num_map[$device_letter]}"
        # add the "-d cciss,X" option to the list of options
        OPTIONS=($cciss_device "${OPTIONS[@]}")
    fi
done
 
exec $SMARTCTL "${OPTIONS[@]}"

Edit: I just realized if all of them are "-d cciss,3" then just make the line
Code:
cciss_device="-d cciss,${num_map[$device_letter]}"
look like this
Code:
cciss_device="-d cciss,3"
And now all my drives on both of my HBAs work!
Thanks for this! It worked perfectly. I only have a single HBA hooked up currently, so I don't know if I'll have the same problem you did, but at least I'll know how to fix it.

It seems this would be easy for the proxmox devs to implement, so hopefully they do.
 
I just installed my second H240 and I seem to be having the same problem you did; however, changing the line of the script the way you did doesn't help. How did you determine the cciss numbers for all of your drives?

Edit: I don't think your fix should work. It might show all the drives as working, but it should be showing the same SMART data for every drive, because you set them all to use cciss,3. I don't know what the ultimate fix is, but I don't see why the current script doesn't work as is.
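One way to test the concern raised here is to query several devices with the same fixed index and compare the serial numbers that come back. This is a rough sketch, not a verified diagnostic: the function names are made up for illustration, the `SMARTCTL` path assumes the wrapper layout described above, and the parsing assumes the `Serial number:` line format shown in the smartctl output earlier in this thread.

```bash
#!/bin/bash
# Assumption: the real binary was renamed as described in the wrapper's header.
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl.orig}

# Print "<device> <serial>" for each device, always using the same cciss index.
# Usage: serials_for_index INDEX DEVICE...
serials_for_index() {
    local idx=$1; shift
    local dev serial
    for dev in "$@"; do
        serial=$("$SMARTCTL" -i -d "cciss,$idx" "$dev" 2>/dev/null |
                 awk -F': *' '/^Serial number:/ {print $2}')
        printf '%s %s\n' "$dev" "${serial:-unknown}"
    done
}

# Exit status 0 when at least two devices reported the same serial number.
# Devices whose serial could not be read are ignored.
has_duplicate_serials() {
    serials_for_index "$@" | awk '$2 != "unknown" {print $2}' |
        sort | uniq -d | grep -q .
}
```

For example, `serials_for_index 3 /dev/sd{a..p}` lists what each device reports for index 3. If several lines show the same serial, the hardcoded index is returning one disk's data for multiple devices.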

Edit2: So it seems that each HBA starts at cciss 0. Devices cciss 0-7 on HBA 1 correspond to sda-sdh, and cciss 0-7 on HBA 2 correspond to sdi-sdp.

I'm not sure how the script could be altered to make this work.

Edit3: Here's my solution. It's ugly and probably a hack job, but it works. Replace everything after Udo Rader's intro with the code below if you have more than one HBA and aren't able to see SMART details for drives sdi-sdp.

Code:
SMARTCTL=/usr/sbin/smartctl.orig
OPTIONS=("$@")
 
# build up map for a-h
char_index=({a..h})
declare -A num_map
for((i=0; i < ${#char_index[*]}; ++i)); do
    num_map[${char_index[i]}]=$i
done
 
for((i=1; i<$#; ++i)); do
    device_letter="${OPTIONS[i]#/dev/sd}"
    # only proceed if the given device ends with [a-h]
    if [[ ! -z "${num_map[$device_letter]:-}" ]]; then
        cciss_device="-d cciss,${num_map[$device_letter]}"
        # add the "-d cciss,X" option to the list of options
        OPTIONS=($cciss_device "${OPTIONS[@]}")
    fi

done

# rebuild map for i-p
# this is certainly bad practice

char_index=({i..p})
# clear the a-h entries first; a bare "declare -A" does not reset an existing array
unset num_map
declare -A num_map
for((i=0; i < ${#char_index[*]}; ++i)); do
    num_map[${char_index[i]}]=$i
done
 
for((i=1; i<$#; ++i)); do
    device_letter="${OPTIONS[i]#/dev/sd}"
    # only proceed if the given device ends with [i-p]
    if [[ ! -z "${num_map[$device_letter]:-}" ]]; then
        cciss_device="-d cciss,${num_map[$device_letter]}"
        # add the "-d cciss,X" option to the list of options
        OPTIONS=($cciss_device "${OPTIONS[@]}")
    fi

done
 
exec $SMARTCTL "${OPTIONS[@]}"
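The two loops above can probably be folded into one pass with a modulo. The sketch below is an untested-on-hardware illustration, not a drop-in replacement: `build_args`, `NEW_ARGS` and `DRIVES_PER_HBA` are made-up names, and it assumes every HBA exposes exactly 8 drives in sda, sdb, ... order, as described in Edit2. It also checks the first argument, which the loops above skip because they start at i=1.

```bash
#!/bin/bash
# Assumption: each HBA exposes this many drives, numbered from cciss 0.
DRIVES_PER_HBA=${DRIVES_PER_HBA:-8}

# Fill the global array NEW_ARGS with the given arguments, inserting
# "-d cciss,N" in front of every /dev/sdX argument (single letter only).
build_args() {
    NEW_ARGS=()
    local arg ord
    for arg in "$@"; do
        if [[ $arg =~ ^/dev/sd([a-z])$ ]]; then
            # ASCII code of the drive letter: a=97, b=98, ...
            printf -v ord '%d' "'${BASH_REMATCH[1]}"
            NEW_ARGS+=(-d "cciss,$(( (ord - 97) % DRIVES_PER_HBA ))")
        fi
        NEW_ARGS+=("$arg")
    done
}
```

In a wrapper this would be followed by `build_args "$@"` and `exec "$SMARTCTL" "${NEW_ARGS[@]}"`. Devices such as /dev/sdaa or /dev/nvme0n1 are passed through unchanged.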
 