PBS 2.2-3 PBS EIO I/O Error

Jarvar

Jun 20 07:31:21 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:31:21 pbs002 prox^Z

This error shows up repeatedly in /var/log/syslog.
Also, in the shell, with a USB external drive attached, I eventually get blk_update_request I/O errors a few times and then the drive no longer shows as connected.
I have three other PBS servers set up the same way without errors.
Some posts have indicated this could be a cable, hardware, or drive failure issue.
Has anybody had any experience with this?
Thank you.
 
Yes, an I/O error is usually a hardware problem (disk, cable, controller, etc.). You can check the output of 'dmesg', which could contain further info.
 
@dcsapak this is part of the dmesg output; let me know how far back I should post to find useful information.

se [QEMU QEMU USB Tablet] on usb-0000:00:01.2-1/input0
[ 3.843622] bochs-drm 0000:00:02.0: vgaarb: deactivate vga console
[ 3.846770] Console: switching to colour dummy device 80x25
[ 3.846993] [drm] Found bochs VGA, ID 0xb0c0.
[ 3.846996] [drm] Framebuffer size 16384 kB @ 0xfc000000, mmio @ 0xfea50000.
[ 3.848551] [drm] Found EDID data blob.
[ 3.849512] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 0
[ 3.851191] fbcon: bochs-drmdrmfb (fb0) is primary device
[ 3.888535] Console: switching to colour frame buffer device 128x48
[ 3.890344] bochs-drm 0000:00:02.0: [drm] fb0: bochs-drmdrmfb frame buffer device
[ 3.939593] ZFS: Loaded module v2.1.4-pve1, ZFS pool version 5000, ZFS filesystem version 5
[ 4.158142] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 4.382499] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[ 8.880985] random: crng init done
[41080.364835] usb 3-1: reset SuperSpeed USB device number 2 using xhci_hcd
[41080.386257] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[41080.386269] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 5b 1c f0 38 00 08 00 00
[41080.386271] blk_update_request: I/O error, dev sdb, sector 1528623160 op 0x0: (READ) flags 0x80700 phys_seg 153 prio class 0
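
For reference, the CDB bytes in the Read(10) line can be decoded by hand to confirm which block the failed read was aimed at. The sketch below assumes the standard SCSI READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8); the function name decode_read10 is made up for illustration.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting logical block
        "blocks": int.from_bytes(cdb[7:9], "big"),  # number of blocks requested
    }

# CDB copied from the dmesg line above.
info = decode_read10("28 00 5b 1c f0 38 00 08 00 00")
print(info)  # {'lba': 1528623160, 'blocks': 2048}
```

The decoded LBA matches the sector number in the blk_update_request line, so both lines describe the same failed read of 2048 blocks.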


Thanks, the latest
[41080.386271] blk_update_request: I/O error, dev sdb, sector 1528623160 op 0x0: (READ) flags 0x80700 phys_seg 153 prio class 0
is usually what shows up in the shell repeatedly before the external drive goes offline. Would using a powered USB hub work? The drive had been working well before, when it was passed through as a hard drive to a Windows Server VM for backups.
The errors started when it was used as a datastore for PBS. It is formatted as ext4; I was also getting EXT4-fs errors before and have repartitioned and reformatted it a few times.
Thank you so much; any insight and help is much appreciated.
 
cat /var/log/syslog shows

Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backu^Z
[2]+ Stopped cat /var/log/syslog

and also

Jun 19 13:39:18 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:18 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/8722/8722421e7c23ccd3a978d01c02340155ef1fa4c798f61c7df7d4c76d3ae73103"
Jun 19 13:39:19 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:19 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/2a36/2a36d09e8097b261f5b26a0ffcd3f8ca321baea7170e830eb030783978864ff6"
Jun 19 13:39:20 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:20 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/b21e/b21edda328b01a0bc230a9c211c4a07dfbb1f25c12054d465099eabf9dcc7183"
Jun 19 13:39:21 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:21 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/fafe/fafe0470e6ce9b9eb4be607f1f8efdaca15812cae45942508a2c4f5c17faa0da"
Jun 19 13:39:22 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:22 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/9cf9/9cf973cfebb77d63144eb004729ce0397145c5937db544099f66bcc032e0f664"
Jun 19 13:39:23 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:23 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/0266/0266ffbafb0b62c6d6323f7a03bbd55d3b855bb778777082724d770a3dfa5d93"
Jun 19 13:39:24 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:24 pbs002 ^Z
 
A powered hub could help, but maybe the disk or controller is dying? What does smartctl say?
 
smartctl -a /dev/sdb looks good.

Code:
root@pbs002:~# smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.35-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Mobile HDD
Device Model:     ST2000LM007-1R8174
Serial Number:    WDZEZ5YG
LU WWN Device Id: 5 000c50 0ba8329a1
Firmware Version: SBK2
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jun 22 08:29:11 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 338) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   006    Pre-fail  Always       -       195292564
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       637
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   045    Pre-fail  Always       -       170245646
  9 Power_On_Hours          0x0032   064   064   000    Old_age   Always       -       31780 (163 47 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       51
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   088   000    Old_age   Always       -       8590069325
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   051   040    Old_age   Always       -       31 (Min/Max 25/45)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   092   092   000    Old_age   Always       -       17375
194 Temperature_Celsius     0x0022   031   049   000    Old_age   Always       -       31 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       801 (82 82 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       34487189112
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       97340325666
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     31633         -
# 2  Short offline       Completed without error       00%     31626         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
I mean, there is:

191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 1
but that is probably not the cause of the error.

Maybe it's the USB enclosure?
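
One way to double-check a long smartctl -a dump is to pull out only the attributes that usually point at failing media or a bad SATA link. This is a rough sketch, assuming the attribute table layout shown above; the watch list and the name flag_attributes are illustrative choices, not anything smartctl defines.

```python
# Attribute IDs often checked first for media or link trouble (assumed watch list).
WATCH = {5, 187, 197, 198, 199}

def flag_attributes(smart_text: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    flagged = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCH and raw.isdigit() and int(raw) != 0:
            flagged[name] = int(raw)
    return flagged

# A few rows copied from the smartctl output above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""
print(flag_attributes(sample))  # {}
```

An empty result is consistent with the disk itself looking healthy. It says nothing about the USB bridge, cable, or power supply, which these counters do not cover.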
 
This is the smartctl -a /dev/sdb output of another, almost identical drive on another PBS server with USB passthrough, set up the same way.
It shows
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 3

Thank you @dcsapak, I really appreciate your support. I have been at this for a few weeks and have tried reformatting the drive. One difference I noticed was that it had been partitioned as MBR, so I redid it with a GPT partition table instead. I thought that fixed it, but I got an error again before 24 hours were up. For now, I reboot the PBS once before backup time and then sync it to another offsite PBS, so I have the latest snapshots in case the backup drive dies.
I do have another USB drive attached to the server that is passed through to Windows Server. I will try to reformat the troubled drive as NTFS and copy the contents from the Windows Server drive, then format as ext4 and try that for backups. I forget whether they're attached through the same hub, but that could rule out the HDD enclosure.

Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.35-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Mobile HDD
Device Model:     ST2000LM007-1R8174
Serial Number:    ZDZ3QXWF
LU WWN Device Id: 5 000c50 0b241e340
Firmware Version: SBK2
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Jun 22 09:42:56 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 322) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   064   006    Pre-fail  Always       -       137108641
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       161
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   079   060   045    Pre-fail  Always       -       85218042
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23153 (175 94 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       60
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   088   000    Old_age   Always       -       6210768284903
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   051   040    Old_age   Always       -       25 (Min/Max 19/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   032   032   000    Old_age   Always       -       137436
194 Temperature_Celsius     0x0022   025   049   000    Old_age   Always       -       25 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1646 (64 43 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       15114079561
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       98435371619
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     23024         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
After approximately two days of this, cat /var/log/syslog now shows:

Code:
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-ba^Z


and the console of the PBS VM shows:

EXT4-fs error (device sdb1): __ext4_find_entry:1612 inode #20709378: comm tokio-runtime-w: reading directory lblock 0

This message repeats.

After rebooting, it shows:
Aborting journal on device sdb1-8.
Buffer I/O error on dev sdb1, logical block 243826688, lost sync page write
JBD2: Error -5 detected when updating journal superblock for sdb1-8.
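
The -5 in the JBD2 line is a negated errno value. A quick check, assuming Linux errno numbering, suggests it is the same EIO that proxmox-backup-proxy reports in syslog:

```python
import errno
import os

code = 5  # absolute value of the "Error -5" reported by JBD2

print(errno.errorcode[code])  # EIO
print(os.strerror(code))      # human-readable text; wording depends on the platform's libc
```

So the journal abort and the datastore errors most likely both come from the kernel reporting failed I/O on the same device, rather than from two separate problems.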
 
So was that with a different USB drive, or with the original?
 
All these errors are with the original drive, the one with issues. The only different one is the output posted with

191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 3
That is from another server we set up, which is working well with no issues.
 
Okay, so now I have added a powered USB hub, but I'm getting a different error.
Previously, I used two USB ports on the server:
1st USB port: a larger 4TB 2.5" external drive.
2nd USB port: an unpowered hub with a 2TB 2.5" external drive and an APC Back-UPS 1500.

The change from before is that I swapped in a powered USB hub, put both drives on it, and plugged the APC Back-UPS 1500 directly into the server.
When trying to use my Proxmox, I keep getting an error:
tag#0 timing out command, waited 180s
It showed that three times; now it shows:
[sdb] tag#0 timing out command, waited 60s
fdisk -l does not show the USB drive that used to be /dev/sdb; however, lsusb does show the device on that system.