Failed/Bad Drive?

jShomp

New Member
Mar 4, 2023
My VM storage drive (SSD) failed earlier today during a rather intensive ERP software installation on Windows 2019. A full reboot of Proxmox did not bring the drive back. I replaced the SATA cable and everything seemed OK, but immediately after a parallel installation of two Windows 2019 VMs completed, it crashed again. A snippet of the errors is below; the errors during the first crash were similar. Is this likely a failed drive?

Code:
Apr 11 00:26:28 pve kernel: blk_update_request: I/O error, dev sdj, sector 670307456 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#11 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#11 CDB: Read(10) 28 00 1e a6 10 70 00 00 08 00
Apr 11 00:26:28 pve kernel: blk_update_request: I/O error, dev sdj, sector 514199664 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 11 00:26:28 pve kernel: Buffer I/O error on dev dm-8, logical block 3981582, async page read
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#7 CDB: Read(10) 28 00 2a 2a 7d d8 00 00 08 00
Apr 11 00:26:28 pve kernel: blk_update_request: I/O error, dev sdj, sector 707427800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 11 00:26:28 pve kernel: Buffer I/O error on dev dm-10, logical block 9785019, async page read
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#12 CDB: Read(10) 28 00 1d 2e 4a 90 00 00 08 00
Apr 11 00:26:28 pve kernel: blk_update_request: I/O error, dev sdj, sector 489573008 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 11 00:26:28 pve kernel: Buffer I/O error on dev dm-8, logical block 903250, async page read
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#13 CDB: Read(10) 28 00 1e a6 10 70 00 00 08 00
Apr 11 00:26:28 pve kernel: blk_update_request: I/O error, dev sdj, sector 514199664 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 11 00:26:28 pve kernel: Buffer I/O error on dev dm-8, logical block 3981582, async page read
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:26:28 pve kernel: sd 1:0:0:0: [sdj] tag#14 CDB: Read(10) 28 00 1d 2e 4a 90 00 00 08 00
Apr 11 00:26:28 pve kernel: blk_update_request: I/O error, dev sdj, sector 489573008 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 11 00:26:28 pve kernel: Buffer I/O error on dev dm-8, logical block 903250, async page read
Apr 11 00:26:28 pve kernel: Buffer I/O error on dev dm-8, logical block 1941167, async page read
Apr 11 00:26:28 pve kernel: Buffer I/O error on dev dm-8, logical block 1020532, async page read

................................

Apr 11 00:44:41 pve pve_exporter[2127]: 127.0.0.1 - - [11/Apr/2023 00:44:41] "GET /pve HTTP/1.1" 200 -
Apr 11 00:44:41 pve kernel: sd 1:0:0:0: [sdj] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:44:41 pve kernel: sd 1:0:0:0: [sdj] tag#23 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Apr 11 00:44:41 pve kernel: blk_update_request: I/O error, dev sdj, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 11 00:44:41 pve kernel: sd 1:0:0:0: [sdj] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:44:41 pve kernel: sd 1:0:0:0: [sdj] tag#24 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Apr 11 00:44:41 pve kernel: blk_update_request: I/O error, dev sdj, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 11 00:44:41 pve pvestatd[2583]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Apr 11 00:44:44 pve pvedaemon[28149]: <root@pam> end task UPID:pve:00006F16:000C4689:6434E5B8:vncproxy:100:root@pam: OK
Apr 11 00:44:44 pve kernel: sd 1:0:0:0: [sdj] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:44:44 pve kernel: sd 1:0:0:0: [sdj] tag#14 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Apr 11 00:44:44 pve kernel: blk_update_request: I/O error, dev sdj, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 11 00:44:44 pve kernel: sd 1:0:0:0: [sdj] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:44:44 pve kernel: sd 1:0:0:0: [sdj] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Apr 11 00:44:44 pve kernel: blk_update_request: I/O error, dev sdj, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 11 00:44:44 pve pvedaemon[28149]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Apr 11 00:44:51 pve kernel: sd 1:0:0:0: [sdj] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:44:51 pve kernel: sd 1:0:0:0: [sdj] tag#23 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Apr 11 00:44:51 pve kernel: blk_update_request: I/O error, dev sdj, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 11 00:44:51 pve kernel: sd 1:0:0:0: [sdj] tag#25 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Apr 11 00:44:51 pve kernel: sd 1:0:0:0: [sdj] tag#25 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Apr 11 00:44:51 pve kernel: blk_update_request: I/O error, dev sdj, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 11 00:44:52 pve pvestatd[2583]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
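
The `CDB: Read(10)` lines in the log can be cross-checked against the reported sectors. In a Read(10) command, bytes 2-5 hold the big-endian logical block address and bytes 7-8 hold the transfer length in blocks. A minimal sketch of that decoding follows; the helper name `decode_read10` is made up for illustration and is not part of any tool mentioned here.

```shell
# Decode a SCSI Read(10) CDB passed as one string of ten hex bytes.
# decode_read10 is a hypothetical helper, not an existing command.
decode_read10() {
  # Split the single string argument into positional parameters $1..$10.
  set -- $1
  if [ "$#" -ne 10 ] || [ "$1" != "28" ]; then
    echo "not a Read(10) CDB" >&2
    return 1
  fi
  # Bytes 2-5 (here $3..$6) are the LBA, bytes 7-8 ($8, $9) the block count.
  lba=$(( 0x$3$4$5$6 ))
  blocks=$(( 0x$8$9 ))
  echo "lba=$lba blocks=$blocks"
}

decode_read10 "28 00 1e a6 10 70 00 00 08 00"   # prints: lba=514199664 blocks=8
```

The decoded LBA 514199664 matches the sector in the `blk_update_request` line that follows that CDB, and 8 blocks of 512 bytes is a single 4 KiB read. The later CDBs (`28 00 00 00 00 00 00 01 00 00`) decode to LBA 0 with a length of 256 blocks.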
 
Hi,

Yes, this looks like a disk error. Could you provide the output of `smartctl -a /dev/sdj`?
 
Thank you, here you go. Probably a stupid question, but are there any limits on how many guests can run simultaneously on a single drive? It is likely a coincidence, but the problem didn't start until I was running 3+ Windows 2019 servers, each with 4 cores (host type) and 16GB RAM. This PVE server has 32 cores and 128GB RAM in total. Each guest has a 70GB install partition and a separate 100GB software partition.

Code:
root@pve:~# smartctl -a /dev/sdj
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.74-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT2000MX500SSD1
Serial Number:    2301E699A677
LU WWN Device Id: 5 00a075 1e699a677
Firmware Version: M3CR045
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Apr 11 10:48:35 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1006
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   100   100   000    Old_age   Always       -       6
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       5
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       124
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   075   057   000    Old_age   Always       -       25 (Min/Max 0/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       9754459062
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       86902819
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42821017

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Completed [00% left] (0-65535)
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
Hi,

Thank you for the output of smartctl!


SMART overall-health self-assessment test result: PASSED
The disk reports *PASSED*, yet the syslog shows I/O errors on it. The cause might be the SATA cable. Could you also check for firmware updates?

Can you do the following:
1. Start an extended (long) SMART self-test:
Bash:
smartctl -t long /dev/sdj
2. Once the test has finished, re-run the first command:
Code:
smartctl -a /dev/sdj
This may reveal any issue not visible in the initial SMART data.
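
The extended test runs in the background on the drive (the first `smartctl -a` output lists a recommended polling time of 30 minutes for it), and its result lands in the self-test log, which `smartctl -l selftest /dev/sdj` prints on its own. As a rough sketch, the log can be checked for entries that did not finish cleanly; `count_failed_selftests` is a hypothetical helper name, and the match on the text "Completed without error" assumes the log format shown in this thread.

```shell
# Count self-test log entries that did NOT complete without error.
# Reads the output of `smartctl -l selftest <device>` on stdin.
# count_failed_selftests is a hypothetical helper, not a smartctl feature.
count_failed_selftests() {
  # Log entries start with "# <num>"; grep -vc counts the non-clean ones.
  # "|| true" keeps the exit status zero when the count is 0.
  grep -E '^# *[0-9]+ ' | grep -vc 'Completed without error' || true
}

# Usage (needs root and a real device):
#   smartctl -l selftest /dev/sdj | count_failed_selftests
```

A result of 0 only says that the logged self-tests passed; it does not rule out link, cable, controller, or power problems.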


Regarding your last question, I would check the above first, since heavy I/O can affect VM performance. In the meantime, please post the VM configuration from `qm config <VMID>`.
 
Sorry for taking so long. Does this give you what you need? I also switched all the Windows VM disks from SATA to SCSI to see if that would help. I thought it had, until I had another failure today. I also confirmed the SSD is running the latest firmware.

Code:
root@pve:~# smartctl -a /dev/sdj
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.104-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT2000MX500SSD1
Serial Number:    2301E699A677
LU WWN Device Id: 5 00a075 1e699a677
Firmware Version: M3CR045
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr 16 14:51:36 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1129
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       42
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   100   100   000    Old_age   Always       -       8
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       7
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       124
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   075   057   000    Old_age   Always       -       25 (Min/Max 0/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       11356292927
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       106197618
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       55508222

SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 2

ATA Error Count: 0
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error -1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 ec 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  c8 00 00 00 00 00 00 00      00:00:00.000  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1125         -
# 2  Extended offline    Completed without error       00%      1006         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
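
The two reports can be compared on write volume. The sketch below converts `Total_LBAs_Written` into bytes, assuming the attribute counts 512-byte logical sectors (the logical sector size reported above); that unit is an assumption, since the smartctl output does not state it. `lbas_to_bytes` is a hypothetical helper name.

```shell
# Convert an LBA count into bytes.
# $1 = number of LBAs, $2 = sector size in bytes (assumed 512 if omitted).
# lbas_to_bytes is a hypothetical helper for illustration only.
lbas_to_bytes() {
  echo $(( $1 * ${2:-512} ))
}

# LBAs written between the two reports in this thread:
delta=$(( 11356292927 - 9754459062 ))
lbas_to_bytes "$delta"   # prints: 820138938880
```

Under that assumption the drive took about 820 GB of writes between the two reports, over 123 power-on hours (1129 - 1006). With an average block-erase count of 8, that is a modest load for a 2 TB SSD.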

Code:
root@pve:~# qm config 400
agent: 1
boot: order=scsi0;scsi1;net0;ide2
cores: 4
cpu: host
ide2: local:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: pc-i440fx-7.1
memory: 16384
meta: creation-qemu=7.1.0,ctime=1681184472
name: WINDB01
net0: virtio=22:BF:26:72:FE:C2,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: vmdata:vm-400-disk-0,cache=writeback,discard=on,iothread=1,size=70G,ssd=1
scsi1: vmdata:vm-400-disk-1,cache=writeback,discard=on,iothread=1,size=100G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=4f7534a9-25ff-4d40-98b7-6314c47cd9bd
sockets: 1
vmgenid: 062ffce0-d0d1-44f0-8f99-97b27c7fe713

root@pve:~# qm config 401
boot: order=scsi0;scsi1;net0;ide2
cores: 4
cpu: host
ide2: local:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: pc-i440fx-7.1
memory: 16384
meta: creation-qemu=7.1.0,ctime=1679091082
name: WINDEP01
net0: virtio=32:A2:0B:FE:AF:85,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: vmdata:vm-401-disk-0,cache=writeback,discard=on,iothread=1,size=70G,ssd=1
scsi1: vmdata:vm-401-disk-1,cache=writeback,discard=on,iothread=1,size=100G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=3396675d-6754-457b-9621-28787635bb43
sockets: 1
vmgenid: 66bfd8cb-aeda-4ea8-8abb-a103765b8c5c

root@pve:~# qm config 402
agent: 1
boot: order=scsi0;scsi1;net0;ide2
cores: 4
cpu: host
ide2: local:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: pc-i440fx-7.1
memory: 16384
meta: creation-qemu=7.1.0,ctime=1681184472
name: WINENT01
net0: virtio=B6:42:BC:37:1B:1A,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: vmdata:vm-402-disk-0,cache=writeback,discard=on,iothread=1,size=70G,ssd=1
scsi1: vmdata:vm-402-disk-1,cache=writeback,discard=on,iothread=1,size=100G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=298842f2-6cba-4d7f-9083-e91ebfe247e3
sockets: 1
vmgenid: c23f35be-a71d-4cb5-8f6d-453e8c726586

root@pve:~# qm config 403
agent: 1
boot: order=scsi0;scsi1;net0;ide3
cores: 4
cpu: host
ide3: local:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: pc-i440fx-7.1
memory: 16384
meta: creation-qemu=7.1.0,ctime=1681184472
name: WINJAS01
net0: virtio=82:22:70:40:A5:A7,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: vmdata:vm-403-disk-0,cache=writeback,discard=on,iothread=1,size=70G,ssd=1
scsi1: vmdata:vm-403-disk-2,cache=writeback,discard=on,iothread=1,size=50G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=2ff45b06-f450-4c41-b3e5-7bc6b2236441
sockets: 1
vmgenid: eac4ad0c-2e31-482e-ad56-65b318dcc57e

root@pve:~# qm config 410
agent: 1
boot: order=scsi0;net0;ide3
cores: 4
cpu: host
ide3: local:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: pc-i440fx-7.1
memory: 16384
meta: creation-qemu=7.1.0,ctime=1681184472
name: WINFAT01
net0: virtio=6A:F4:6C:6A:4A:29,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: vmdata:vm-410-disk-0,cache=writeback,discard=on,iothread=1,size=70G,ssd=1
scsi1: vmdata:vm-410-disk-1,cache=writeback,discard=on,iothread=1,size=50G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=af291454-ec84-4d75-a8d8-fca76e7458c4
sockets: 1
vmgenid: 84276e42-bb68-42f5-a8af-9fdd71fd405f
 
Thank you for the output!

The good news is that the smartctl output is clean and both extended self-tests completed without error, which suggests the issue is probably not the disk itself.

Can you check the following as well?

- Check or replace the SATA cable for `/dev/sdj`.
- Check whether the following command shows any warnings or errors for that disk:
Bash:
dmesg | grep '/dev/sdj'
- Can you run a backup to another storage and see whether it completes without any I/O issues?
 
- Check/change the SATA cable for the `/dev/sdj`?
That was actually the first thing I tried

dmesg | grep '/dev/sdj'
No output with that command
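
An empty result is expected with that pattern: the kernel messages earlier in this thread name the device as `[sdj]` or `dev sdj`, never `/dev/sdj`. A small sketch that searches for the short name instead is shown below; `kernel_lines_for_disk` is a hypothetical helper name.

```shell
# Print kernel log lines (read from stdin) that mention a block device.
# Accepts either "sdj" or "/dev/sdj" and matches the short name as a word.
# kernel_lines_for_disk is a hypothetical helper, not an existing command.
kernel_lines_for_disk() {
  name="${1#/dev/}"        # strip a leading /dev/ if present
  grep -w -- "$name"
}

# Usage (dmesg -T prints human-readable timestamps):
#   dmesg -T | kernel_lines_for_disk /dev/sdj
```

Lines about the SATA link itself are usually logged under the port name (`ataN`) rather than the disk name, so a search limited to `sdj` may still miss link resets.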

Can you do a backup on another storage and see if works without any IO issue?
I actually bought a new SSD yesterday and started that process.

I'm also wondering if it could be a RAM issue. I'm using memory that is on my motherboard's QVL and running it at its rated speed (3600) with XMP 2, but I dropped it down to 3200 to see if that helps at all.
 
