Issue after update/upgrade I/O initramfs.img

Malshtur

New Member
Nov 20, 2022
Hello there,

First post here, and I wish it were under better circumstances. I updated and upgraded my PVE host tonight using the pveupdate and pveupgrade commands. An issue was raised with these packages:

Code:
 proxmox-backup-file-restore
 libpve-storage-perl
 pve-container
 pve-manager
 qemu-server
 libpve-guest-common-perl
 pve-ha-manager

So, I have tried a few light things, but nothing has worked so far. One odd thing I noticed is an I/O error while reading an initramfs.img. That's strange to me, because the SSD concerned is pretty new, a few months old at most.

I'll now provide some information I have seen requested around here in my reading.

pveversion -v
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.74-1-pve)
pve-manager: not correctly installed (running version: 7.2-14/65898fbc)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-7
libpve-guest-common-perl: not correctly installed
libpve-http-server-perl: 4.1-5
libpve-storage-perl: not correctly installed
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: not correctly installed
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.2
pve-cluster: 7.2-3
pve-container: not correctly installed
pve-docs: 7.2-3
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: not correctly installed
pve-i18n: 2.7-2
pve-qemu-kvm: 7.1.0-3
pve-xtermjs: 4.16.0-1
qemu-server: not correctly installed
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1

apt list --installed | grep linux-image
Code:
root@pve:~# apt list --installed | grep linux-image

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
nothing more here... (the command returned no packages)

grep -r '' /etc/apt/sources.list*
Code:
/etc/apt/sources.list:deb http://ftp.fr.debian.org/debian bullseye main contrib
/etc/apt/sources.list:
/etc/apt/sources.list:deb http://ftp.fr.debian.org/debian bullseye-updates main contrib
/etc/apt/sources.list:
/etc/apt/sources.list:# security updates
/etc/apt/sources.list:deb http://security.debian.org bullseye-security main contrib
/etc/apt/sources.list:
/etc/apt/sources.list:deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription
/etc/apt/sources.list:
/etc/apt/sources.list.d/pve-enterprise.list.dpkg-new:deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise
/etc/apt/sources.list.d/pve-enterprise.list:# deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise
/etc/apt/sources.list.d/pve-enterprise.list:

apt update
Code:
Hit:1 http://ftp.fr.debian.org/debian bullseye InRelease
Hit:2 http://ftp.fr.debian.org/debian bullseye-updates InRelease                                 
Hit:3 http://security.debian.org bullseye-security InRelease                                     
Hit:4 http://download.proxmox.com/debian/pve bullseye InRelease
Reading package lists... Done         
Building dependency tree... Done
Reading state information... Done
All packages are up to date.

apt full-upgrade
Code:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
7 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] Y
Setting up proxmox-backup-file-restore (2.2.7-1) ...
Updating file-restore initramfs...
cp: error reading '/usr/lib/x86_64-linux-gnu/proxmox-backup/file-restore/initramfs.img': Input/output error
dpkg: error processing package proxmox-backup-file-restore (--configure):
 installed proxmox-backup-file-restore package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of libpve-storage-perl:
 libpve-storage-perl depends on proxmox-backup-file-restore; however:
  Package proxmox-backup-file-restore is not configured yet.

dpkg: error processing package libpve-storage-perl (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of pve-container:
 pve-container depends on libpve-storage-perl (>= 7.2-10); however:
  Package libpve-storage-perl is not configured yet.

dpkg: error processing package pve-container (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of pve-manager:
 pve-manager depends on libpve-storage-perl (>= 7.2-12); however:
  Package libpve-storage-perl is not configured yet.
 pve-manager depends on pve-container (>= 4.0-9); however:
  Package pve-container is not configured yet.

dpkg: error processing package pve-manager (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of qemu-server:
 qemu-server depends on libpve-storage-perl (>= 7.2-10); however:
  Package libpve-storage-perl is not configured yet.

dpkg: error processing package qemu-server (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of libpve-guest-common-perl:
 libpve-guest-common-perl depends on libpve-storage-perl (>= 7.0-14); however:
  Package libpve-storage-perl is not configured yet.

dpkg: error processing package libpve-guest-common-perl (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of pve-ha-manager:
 pve-ha-manager depends on pve-container; however:
  Package pve-container is not configured yet.
 pve-ha-manager depends on qemu-server (>= 6.0-15); however:
  Package qemu-server is not configured yet.

dpkg: error processing package pve-ha-manager (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 proxmox-backup-file-restore
 libpve-storage-perl
 pve-container
 pve-manager
 qemu-server
 libpve-guest-common-perl
 pve-ha-manager
E: Sub-process /usr/bin/dpkg returned an error code (1)

That's the problem I talked about before.
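For reference, once the underlying read error is resolved (or the damaged initramfs.img is restored from a working copy), the half-configured packages can usually be finished with the standard dpkg/apt recovery steps. This is only a sketch of the usual sequence, not something specific to this failure:

```shell
# Retry the pending post-installation scripts of all
# packages left in the "not fully installed" state
dpkg --configure -a

# Then let apt repair any remaining dependency state
apt -f install
```

As long as the copy of initramfs.img still hits the bad sector, these will fail again at the same point, so the disk problem has to be dealt with first.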

And lastly, the syslog:
Code:
Nov 20 01:57:33 pve pvedaemon[2618]: <root@pam> successful auth for user 'root@pam'
Nov 20 01:57:33 pve login[7839]: pam_unix(login:session): session opened for user root(uid=0) by root(uid=0)
Nov 20 01:57:33 pve systemd-logind[2146]: New session 3 of user root.
Nov 20 01:57:33 pve systemd[1]: Started Session 3 of user root.
Nov 20 01:57:33 pve login[7844]: ROOT LOGIN  on '/dev/pts/0'
Nov 20 01:57:48 pve kernel: ata7.00: exception Emask 0x0 SAct 0x1000 SErr 0x0 action 0x0
Nov 20 01:57:48 pve kernel: ata7.00: irq_stat 0x40000008
Nov 20 01:57:48 pve kernel: ata7.00: failed command: READ FPDMA QUEUED
Nov 20 01:57:48 pve kernel: ata7.00: cmd 60/08:60:18:26:26/00:00:01:00:00/40 tag 12 ncq dma 4096 in
         res 51/40:08:18:26:26/00:00:01:00:00/00 Emask 0x409 (media error) <F>
Nov 20 01:57:48 pve kernel: ata7.00: status: { DRDY ERR }
Nov 20 01:57:48 pve kernel: ata7.00: error: { UNC }
Nov 20 01:57:49 pve kernel: ata7.00: configured for UDMA/133
Nov 20 01:57:49 pve kernel: sd 6:0:0:0: [sdc] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Nov 20 01:57:49 pve kernel: sd 6:0:0:0: [sdc] tag#12 Sense Key : Medium Error [current]
Nov 20 01:57:49 pve kernel: sd 6:0:0:0: [sdc] tag#12 Add. Sense: Unrecovered read error - auto reallocate failed
Nov 20 01:57:49 pve kernel: sd 6:0:0:0: [sdc] tag#12 CDB: Read(10) 28 00 01 26 26 18 00 00 08 00
Nov 20 01:57:49 pve kernel: blk_update_request: I/O error, dev sdc, sector 19277336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 20 01:57:49 pve kernel: ata7: EH complete
Nov 20 01:57:50 pve kernel: ata7.00: exception Emask 0x0 SAct 0x200000 SErr 0x0 action 0x0
Nov 20 01:57:50 pve kernel: ata7.00: irq_stat 0x40000008
Nov 20 01:57:50 pve kernel: ata7.00: failed command: READ FPDMA QUEUED
Nov 20 01:57:50 pve kernel: ata7.00: cmd 60/08:a8:18:26:26/00:00:01:00:00/40 tag 21 ncq dma 4096 in
         res 51/40:08:18:26:26/00:00:01:00:00/00 Emask 0x409 (media error) <F>
Nov 20 01:57:50 pve kernel: ata7.00: status: { DRDY ERR }
Nov 20 01:57:50 pve kernel: ata7.00: error: { UNC }
Nov 20 01:57:50 pve kernel: ata7.00: configured for UDMA/133
Nov 20 01:57:50 pve kernel: sd 6:0:0:0: [sdc] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=1s
Nov 20 01:57:50 pve kernel: sd 6:0:0:0: [sdc] tag#21 Sense Key : Medium Error [current]
Nov 20 01:57:50 pve kernel: sd 6:0:0:0: [sdc] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
Nov 20 01:57:50 pve kernel: sd 6:0:0:0: [sdc] tag#21 CDB: Read(10) 28 00 01 26 26 18 00 00 08 00
Nov 20 01:57:50 pve kernel: blk_update_request: I/O error, dev sdc, sector 19277336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 20 01:57:50 pve kernel: ata7: EH complete
Nov 20 01:57:51 pve kernel: ata7.00: exception Emask 0x0 SAct 0x4800 SErr 0x0 action 0x0
Nov 20 01:57:51 pve kernel: ata7.00: irq_stat 0x40000008
Nov 20 01:57:51 pve kernel: ata7.00: failed command: READ FPDMA QUEUED
Nov 20 01:57:51 pve kernel: ata7.00: cmd 60/08:70:18:26:26/00:00:01:00:00/40 tag 14 ncq dma 4096 in
         res 51/40:08:18:26:26/00:00:01:00:00/00 Emask 0x409 (media error) <F>
Nov 20 01:57:51 pve kernel: ata7.00: status: { DRDY ERR }
Nov 20 01:57:51 pve kernel: ata7.00: error: { UNC }
Nov 20 01:57:51 pve kernel: ata7.00: configured for UDMA/133
Nov 20 01:57:51 pve kernel: sd 6:0:0:0: [sdc] tag#14 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=1s
Nov 20 01:57:51 pve kernel: sd 6:0:0:0: [sdc] tag#14 Sense Key : Medium Error [current]
Nov 20 01:57:51 pve kernel: sd 6:0:0:0: [sdc] tag#14 Add. Sense: Unrecovered read error - auto reallocate failed
Nov 20 01:57:51 pve kernel: sd 6:0:0:0: [sdc] tag#14 CDB: Read(10) 28 00 01 26 26 18 00 00 08 00
Nov 20 01:57:51 pve kernel: blk_update_request: I/O error, dev sdc, sector 19277336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 20 01:57:51 pve kernel: ata7: EH complete

Well, it looks like /dev/sdc has an issue, but I don't understand it, as the drive is reported as fine by SMART.
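For what it's worth, the kernel log reports the same failing sector every time, so it can be converted to a byte offset to work out roughly where on the disk the damage sits (assuming 512-byte logical sectors, which is what this drive reports):

```shell
# Failing LBA reported repeatedly by the kernel for /dev/sdc
SECTOR=19277336

# Byte offset on the raw device, assuming 512-byte logical sectors
OFFSET=$((SECTOR * 512))
echo "$OFFSET"   # 9869996032, i.e. roughly 9.2 GiB into the disk
```

From there, one could subtract the start of the containing partition and (on an ext filesystem) map the block to an inode with debugfs, to see which file is unreadable.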

Thanks for your time.
 
One odd thing I noticed is an I/O error while reading an initramfs.img. That's strange to me, because the SSD concerned is pretty new, a few months old at most.
Do you know about the bathtub curve? Also, Proxmox is known to wear out non-enterprise SSDs quickly.
Code:
Nov 20 01:57:49 pve kernel: sd 6:0:0:0: [sdc] tag#12 Sense Key : Medium Error [current]
Nov 20 01:57:49 pve kernel: sd 6:0:0:0: [sdc] tag#12 Add. Sense: Unrecovered read error - auto reallocate failed
Nov 20 01:57:49 pve kernel: sd 6:0:0:0: [sdc] tag#12 CDB: Read(10) 28 00 01 26 26 18 00 00 08 00
Nov 20 01:57:49 pve kernel: blk_update_request: I/O error, dev sdc, sector 19277336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 20 01:57:49 pve kernel: ata7: EH complete

Well, it looks like /dev/sdc has an issue, but I don't understand it, as the drive is reported as fine by SMART.
SMART isn't perfect, but I would expect some counters to have increased from these errors, which appear very real and have already broken parts of your Proxmox installation. It does not look like a bad cable or a driver issue. You could run a long (non-destructive) self-test with smartctl, but if there is anything on the drive that you want to preserve, you had better copy it now, as such errors tend to increase.
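Something like this (the device name is taken from your logs; double-check with lsblk first):

```shell
# Start a long, non-destructive surface self-test on the suspect drive
smartctl -t long /dev/sdc

# The drive above reports an extended-test polling time of ~2 minutes;
# afterwards, read back the self-test log and the attribute table
smartctl -l selftest /dev/sdc
smartctl -A /dev/sdc
```

Watch in particular Reallocated_Sector_Ct and any LBA_of_first_error entry in the self-test log.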
 
Do you know about the bathtub curve? Also, Proxmox is known to wear out non-enterprise SSDs quickly.

SMART isn't perfect, but I would expect some counters to have increased from these errors, which appear very real and have already broken parts of your Proxmox installation. It does not look like a bad cable or a driver issue. You could run a long (non-destructive) self-test with smartctl, but if there is anything on the drive that you want to preserve, you had better copy it now, as such errors tend to increase.
I have enterprise SSDs in a ZFS array, and the Proxmox host is on a separate SSD with low logging. This is the only such issue I've had so far.

smartctl long test
Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.74-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Silicon Motion based SSDs
Device Model:     TS240GSSD220S
Serial Number:    G878021290
LU WWN Device Id: 5 7c3548 1d59110aa
Firmware Version: VD0R3A04
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Nov 20 14:20:18 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    1) seconds.
Offline data collection
capabilities:                    (0x59) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5877
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       13
160 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
161 Valid_Spare_Block_Cnt   0x0032   100   100   000    Old_age   Always       -       51
163 Initial_Bad_Block_Count 0x0032   100   100   000    Old_age   Always       -       9
164 Total_Erase_Count       0x0032   100   100   000    Old_age   Always       -       1270
165 Max_Erase_Count         0x0032   100   100   000    Old_age   Always       -       2
166 Min_Erase_Count         0x0032   100   100   000    Old_age   Always       -       0
167 Average_Erase_Count     0x0032   100   100   000    Old_age   Always       -       1
168 Max_Erase_Count_of_Spec 0x0032   100   100   000    Old_age   Always       -       3000
169 Remaining_Lifetime_Perc 0x0032   100   100   000    Old_age   Always       -       99
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       4
194 Temperature_Celsius     0x0022   100   100   030    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       407752
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
200 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       249
201 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x0032   100   100   000    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   000    Old_age   Always       -       50
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       9963
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       33683
245 TLC_Writes_32MiB        0x0032   100   100   000    Old_age   Always       -       12390
250 Read_Error_Retry_Rate   0x0032   100   100   000    Old_age   Always       -       10699

SMART Error Log Version: 1
Warning: ATA error count 127 inconsistent with error log pointer 5

ATA Error Count: 127 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 127 occurred at disk power-on lifetime: 5877 hours (244 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 40 10 e3 44 40  Error: UNC at LBA = 0x0044e310 = 4514576

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 40 10 e3 44 40 08  22d+20:23:18.334  READ FPDMA QUEUED
  61 08 88 70 18 ac 40 08  22d+20:23:18.283  WRITE FPDMA QUEUED
  61 08 78 48 18 ac 40 08  22d+20:23:18.283  WRITE FPDMA QUEUED
  61 10 70 70 60 e0 40 08  22d+20:23:18.283  WRITE FPDMA QUEUED
  61 08 80 e8 48 7c 40 08  22d+20:23:18.283  WRITE FPDMA QUEUED

Error 126 occurred at disk power-on lifetime: 5877 hours (244 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 e8 10 e3 44 40  Error: UNC

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 30 08 00 a0 08  22d+20:23:17.149  READ LOG EXT
  61 08 b0 90 16 ac 40 08  22d+20:23:17.098  WRITE FPDMA QUEUED
  61 10 a8 88 17 ac 40 08  22d+20:23:17.098  WRITE FPDMA QUEUED
  2f 00 01 30 08 00 a0 08  22d+20:23:16.190  READ LOG EXT
  2f 00 01 30 00 00 a0 08  22d+20:23:16.151  READ LOG EXT

Error 125 occurred at disk power-on lifetime: 5877 hours (244 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 a0 10 e3 44 40  Error: UNC at LBA = 0x0044e310 = 4514576

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 a0 10 e3 44 40 08  22d+20:23:15.968  READ FPDMA QUEUED
  61 08 98 e8 15 ac 40 08  22d+20:23:15.912  WRITE FPDMA QUEUED
  61 08 90 90 15 ac 40 08  22d+20:23:15.912  WRITE FPDMA QUEUED
  61 08 88 68 15 ac 40 08  22d+20:23:15.912  WRITE FPDMA QUEUED
  61 08 80 30 15 ac 40 08  22d+20:23:15.912  WRITE FPDMA QUEUED

Error 124 occurred at disk power-on lifetime: 5877 hours (244 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 20 70 5f 55 40  Error: UNC at LBA = 0x00555f70 = 5594992

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 20 70 5f 55 40 08  22d+20:23:14.728  READ FPDMA QUEUED
  61 08 28 d0 f2 22 40 08  22d+20:23:14.678  WRITE FPDMA QUEUED
  60 00 10 00 08 10 40 08  22d+20:23:14.678  READ FPDMA QUEUED
  2f 00 01 30 08 00 a0 08  22d+20:23:13.790  READ LOG EXT
  2f 00 01 30 00 00 a0 08  22d+20:23:13.756  READ LOG EXT

Error 123 occurred at disk power-on lifetime: 5877 hours (244 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 98 70 5f 55 40  Error: UNC

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 30 08 00 a0 08  22d+20:23:13.510  READ LOG EXT
  2f 00 01 30 08 00 a0 08  22d+20:23:12.550  READ LOG EXT
  2f 00 01 30 00 00 a0 08  22d+20:23:12.511  READ LOG EXT
  2f 00 01 00 00 00 a0 08  22d+20:23:12.476  READ LOG EXT
  ef 10 02 00 00 00 a0 08  22d+20:23:12.437  SET FEATURES [Enable SATA feature]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5877         -
# 2  Short offline       Completed without error       00%      5876         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I'm not quite sure I understand: there are errors, but the test came back fine?
 
The drive claims to have been powered on for about eight months in total (not necessarily continuously), and it has registered 127 errors. So it's not an artifact of being too slow for ZFS or anything like that. Did you also use ZFS for Proxmox? If so, what is the output of zpool status rpool before and after a zpool scrub rpool?
I don't know how to interpret the various values for a Transcend SSD220 240GB exactly. It claims to still have 99% of its life/wear to go, but the errors look like some of the flash memory is broken. Maybe the manufacturer or seller can help you (within the warranty period)? Some other people here are bound to have more experience with this.
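In other words, something along these lines (assuming the default Proxmox ZFS root pool name, rpool):

```shell
# Baseline: current pool health and per-device error counters
zpool status rpool

# Read-verify every block in the pool; runs in the background
zpool scrub rpool

# Once the scrub finishes, compare READ/WRITE/CKSUM counters
zpool status rpool
```

A scrub forces ZFS to read everything, so latent media errors like yours show up in the counters even if normal operation hasn't touched the bad blocks yet.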
 
The drive claims to have been powered on for about eight months in total (not necessarily continuously), and it has registered 127 errors. So it's not an artifact of being too slow for ZFS or anything like that. Did you also use ZFS for Proxmox? If so, what is the output of zpool status rpool before and after a zpool scrub rpool?
I don't know how to interpret the various values for a Transcend SSD220 240GB exactly. It claims to still have 99% of its life/wear to go, but the errors look like some of the flash memory is broken. Maybe the manufacturer or seller can help you (within the warranty period)? Some other people here are bound to have more experience with this.
Eight months with very, very low activity. This disk wasn't used in ZFS.

This is only the OS disk, so I formatted it and ran some tests; no problems so far. Well, I have a second one, so I'll build a mirror with it and reinstall on the RAID1 array. I'll let you know if something goes wrong.
 
