While running an apt-get dist-upgrade, I am getting I/O errors. The kernel always complains about the same sector, 76540200. The SMART information looks pretty good, aside from the huge number of errors logged against that single sector. I'm surprised by the idea that this is a drive failure, but I can accept it if it is.
The machine still boots and seems okay. I just can't install anything without getting errors.
Below is some of the info I have dug up:
At the console for the system I am seeing:
blk_update_request: I/O error, dev nvme0n1, sector 76540200
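To put that sector number in context, here is a minimal sketch that converts it to a byte offset on the device. It assumes the kernel message counts sectors in 512-byte units, which matches the 512-byte formatted LBA size in the smartctl output. The function name and constants are illustrative only.

```python
# Assumption: sectors in the kernel message are 512-byte units.
SECTOR_SIZE = 512
# Namespace capacity in bytes, copied from the smartctl output.
NAMESPACE_BYTES = 512_110_190_592


def sector_to_byte_offset(sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Return the byte offset at which the given sector starts."""
    return sector * sector_size


if __name__ == "__main__":
    offset = sector_to_byte_offset(76540200)
    print(f"byte offset: {offset:,}")
    print(f"position in namespace: {offset / NAMESPACE_BYTES:.1%}")
```

The offset falls well inside the namespace, so the failing sector is a real location on the drive and not an out-of-range request.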
Running smartctl reports:
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.4.67-1-pve] (local build)
=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPEKKW512G7
Serial Number: BTPY65110PKZ512F
Firmware Version: PSF109C
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Mon Nov 20 00:41:14 2017 MST
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0006): Format Frmw_DL
Optional NVM Commands (0x001e): Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 70 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 5 5
1 + 4.60W - - 1 1 1 1 30 30
2 + 3.80W - - 2 2 2 2 30 30
3 - 0.0700W - - 3 3 3 3 10000 300
4 - 0.0050W - - 4 4 4 4 2000 10000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning: 0x00
Temperature: 55 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 388,058 [198 GB]
Data Units Written: 12,693,111 [6.49 TB]
Host Read Commands: 117,766,680
Host Write Commands: 377,240,502
Controller Busy Time: 2,555
Power Cycles: 4
Power On Hours: 4,472
Unsafe Shutdowns: 1
Media and Data Integrity Errors: 974,015
Error Information Log Entries: 974,015
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, max 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 974016 4 0x000c 0x0281 - 76540200 1 -
1 974015 4 0x000c 0x0281 - 76540200 1 -
2 974014 4 0x000c 0x0281 - 76540200 1 -
3 974013 4 0x000c 0x0281 - 76540200 1 -
4 974012 4 0x000c 0x0281 - 76540200 1 -
5 974011 4 0x000c 0x0281 - 76540200 1 -
6 974010 4 0x000c 0x0281 - 76540200 1 -
7 974009 4 0x000c 0x0281 - 76540200 1 -
8 974008 4 0x000c 0x0281 - 76540200 1 -
9 974007 4 0x000c 0x0281 - 76540200 1 -
10 974006 4 0x000c 0x0281 - 76540200 1 -
11 974005 4 0x000c 0x0281 - 76540200 1 -
12 974004 4 0x000c 0x0281 - 76540200 1 -
13 974003 4 0x000c 0x0281 - 76540200 1 -
14 974002 4 0x000c 0x0281 - 76540200 1 -
15 974001 4 0x000c 0x0281 - 76540200 1 -
... (48 entries not shown)
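For reference, a minimal sketch of how the Status value 0x0281 in the error log might be decoded. It assumes smartctl prints the NVMe status field with the phase bit already stripped, so that bits 7:0 hold the status code and bits 10:8 hold the status code type; that layout is my assumption and should be checked against the NVMe specification. The lookup tables are deliberately partial and the function name is hypothetical.

```python
# Partial lookup tables, limited to the values relevant here.
STATUS_CODE_TYPES = {
    0x0: "Generic Command Status",
    0x1: "Command Specific Status",
    0x2: "Media and Data Integrity Errors",
}

MEDIA_ERROR_CODES = {
    0x80: "Write Fault",
    0x81: "Unrecovered Read Error",
    0x85: "Compare Failure",
}


def decode_nvme_status(status: int) -> dict:
    """Split a status value into code type and code (assumed bit layout)."""
    sc = status & 0xFF          # bits 7:0, status code
    sct = (status >> 8) & 0x7   # bits 10:8, status code type
    names = MEDIA_ERROR_CODES if sct == 0x2 else {}
    return {
        "sct": sct,
        "sc": sc,
        "sct_name": STATUS_CODE_TYPES.get(sct, "unknown"),
        "sc_name": names.get(sc, "unknown"),
    }


if __name__ == "__main__":
    print(decode_nvme_status(0x0281))
```

Under that assumption, every entry in the log decodes to an unrecovered read error on LBA 76540200, which would be consistent with the Media and Data Integrity Errors counter matching the number of error log entries.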