PBS 2.2-3 PBS EIO I/O Error

Jarvar

Jun 20 07:31:21 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:31:21 pbs002 prox^Z

This error repeats in /var/log/syslog.
The datastore is on a USB external drive. In the shell I eventually get blk_update_request I/O errors a few times, and then the drive no longer shows as connected.
I have 3 other PBS servers set up the same way without errors.
Some posts indicate this could be a cable, hardware, or drive failure issue.
Has anybody had any experience with this?
Thank you.
 
Yes, an I/O error is usually a hardware problem (disk, cable, controller, etc.). You can check the output of 'dmesg', which may contain further information.
 
@dcsapak this is part of the dmesg output. Let me know how far back I should post to find useful information.

se [QEMU QEMU USB Tablet] on usb-0000:00:01.2-1/input0
[ 3.843622] bochs-drm 0000:00:02.0: vgaarb: deactivate vga console
[ 3.846770] Console: switching to colour dummy device 80x25
[ 3.846993] [drm] Found bochs VGA, ID 0xb0c0.
[ 3.846996] [drm] Framebuffer size 16384 kB @ 0xfc000000, mmio @ 0xfea50000.
[ 3.848551] [drm] Found EDID data blob.
[ 3.849512] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 0
[ 3.851191] fbcon: bochs-drmdrmfb (fb0) is primary device
[ 3.888535] Console: switching to colour frame buffer device 128x48
[ 3.890344] bochs-drm 0000:00:02.0: [drm] fb0: bochs-drmdrmfb frame buffer device
[ 3.939593] ZFS: Loaded module v2.1.4-pve1, ZFS pool version 5000, ZFS filesystem version 5
[ 4.158142] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 4.382499] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[ 8.880985] random: crng init done
[41080.364835] usb 3-1: reset SuperSpeed USB device number 2 using xhci_hcd
[41080.386257] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[41080.386269] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 5b 1c f0 38 00 08 00 00
[41080.386271] blk_update_request: I/O error, dev sdb, sector 1528623160 op 0x0: (READ) flags 0x80700 phys_seg 153 prio class 0
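
The CDB bytes in the Read(10) line can be decoded by hand to check that they describe the same request as the blk_update_request line. A minimal sketch in Python, assuming the standard SCSI READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8) and 512-byte logical sectors; the function name decode_read10 is made up for illustration:

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) from a SCSI READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command descriptor block")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDB copied from the dmesg line above
lba, blocks = decode_read10("28 00 5b 1c f0 38 00 08 00 00")
print(lba, blocks)      # 1528623160 2048
print(lba * 512 / 1e9)  # byte offset of the request in GB (assumes 512-byte sectors)
```

The decoded LBA matches the sector in the blk_update_request line, so the failed command was a 2048-block (1 MiB) read roughly 783 GB into the disk. That shows where the read failed, not why; the "reset SuperSpeed USB device" line just before it suggests the USB link is worth checking as well as the disk.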


Thanks. The last line,
[41080.386271] blk_update_request: I/O error, dev sdb, sector 1528623160 op 0x0: (READ) flags 0x80700 phys_seg 153 prio class 0
is usually what shows up in the shell repeatedly before the external drive goes offline. Would using a powered USB hub help? The drive worked well before, when it was passed through as a hard drive to a Windows Server VM for backups.
The errors only started once it was used as a datastore for PBS. It is formatted as ext4; I was also getting EXT4-fs errors before and have repartitioned and reformatted it a few times.
Thank you so much, any insight and help is much appreciated.
 
cat /var/log/syslog shows

Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:30:31 pbs002 proxmox-backu^Z
[2]+ Stopped cat /var/log/syslog

and also

Jun 19 13:39:18 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:18 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/8722/8722421e7c23ccd3a978d01c02340155ef1fa4c798f61c7df7d4c76d3ae73103"
Jun 19 13:39:19 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:19 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/2a36/2a36d09e8097b261f5b26a0ffcd3f8ca321baea7170e830eb030783978864ff6"
Jun 19 13:39:20 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:20 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/b21e/b21edda328b01a0bc230a9c211c4a07dfbb1f25c12054d465099eabf9dcc7183"
Jun 19 13:39:21 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:21 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/fafe/fafe0470e6ce9b9eb4be607f1f8efdaca15812cae45942508a2c4f5c17faa0da"
Jun 19 13:39:22 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:22 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/9cf9/9cf973cfebb77d63144eb004729ce0397145c5937db544099f66bcc032e0f664"
Jun 19 13:39:23 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:23 pbs002 proxmox-backup-proxy[569]: download chunk "/mnt/usb-drive01/disk02/store02/.chunks/0266/0266ffbafb0b62c6d6323f7a03bbd55d3b855bb778777082724d770a3dfa5d93"
Jun 19 13:39:24 pbs002 proxmox-backup-proxy[569]: GET /chunk
Jun 19 13:39:24 pbs002 ^Z
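
Because cat on the whole syslog floods the terminal with the same message, it can help to count identical messages instead of scrolling through them. A minimal Python sketch, assuming the classic "Mon DD HH:MM:SS host program[pid]: message" syslog layout seen above; the function name summarise and the sample list are made up for illustration:

```python
import re
from collections import Counter

# Classic syslog layout: "Mon DD HH:MM:SS host program[pid]: message"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+(?P<prog>[^\s\[:]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)


def summarise(lines):
    """Count identical (program, message) pairs, ignoring timestamps and PIDs."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            counts[(m["prog"], m["msg"])] += 1
    return counts


# Sample lines copied from the output above
sample = [
    "Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error",
    "Jun 20 07:30:31 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error",
    "Jun 19 13:39:18 pbs002 proxmox-backup-proxy[569]: GET /chunk",
]
for (prog, msg), n in summarise(sample).most_common():
    print(f"{n:6d}  {prog}: {msg}")
```

To run it against the real log, pass an open file instead of the sample list, for example summarise(open("/var/log/syslog", errors="replace")).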
 
A powered hub could help, but maybe the disk or controller is dying? What does smartctl say?
 
smartctl -a /dev/sdb looks good.

Code:
root@pbs002:~# smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.35-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Mobile HDD
Device Model:     ST2000LM007-1R8174
Serial Number:    WDZEZ5YG
LU WWN Device Id: 5 000c50 0ba8329a1
Firmware Version: SBK2
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jun 22 08:29:11 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 338) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   006    Pre-fail  Always       -       195292564
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       637
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   045    Pre-fail  Always       -       170245646
  9 Power_On_Hours          0x0032   064   064   000    Old_age   Always       -       31780 (163 47 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       51
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   088   000    Old_age   Always       -       8590069325
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   051   040    Old_age   Always       -       31 (Min/Max 25/45)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   092   092   000    Old_age   Always       -       17375
194 Temperature_Celsius     0x0022   031   049   000    Old_age   Always       -       31 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       801 (82 82 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       34487189112
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       97340325666
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     31633         -
# 2  Short offline       Completed without error       00%     31626         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
I mean, there is:

191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 1
but that is probably not the cause of the error.

Maybe it's the USB enclosure?
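
For comparing several drives, the attribute rows of the smartctl output can be checked with a short script rather than by eye. A minimal Python sketch, assuming the ten-column attribute layout shown above; the WATCHED set is an illustrative choice of counters that are commonly looked at for media or link trouble, not an official list, and parse_attributes is a made-up helper name:

```python
# Attribute IDs to watch (illustrative choice, not an official list):
# 5 Reallocated_Sector_Ct, 187 Reported_Uncorrect, 197 Current_Pending_Sector,
# 198 Offline_Uncorrectable, 199 UDMA_CRC_Error_Count
WATCHED = {5, 187, 197, 198, 199}


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) from smartctl attribute rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns: a numeric ID, a name, a hex flag, ...
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


# Rows copied from the smartctl output above
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""
attrs = parse_attributes(sample)
nonzero = {i: attrs[i] for i in WATCHED if i in attrs and attrs[i][1] != 0}
print(nonzero)  # {}
```

With the values posted above, none of the watched counters is nonzero, which is consistent with the PASSED result. A clean SMART report only covers what the disk itself recorded, so trouble in the USB bridge, cable, or power supply would not necessarily show up there.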
 
This is the smartctl -a /dev/sdb output of another, almost identical drive on another PBS server with USB passthrough and the same setup.
It shows
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 3

Thank you @dcsapak, I really appreciate your support. I kept going like this for a few weeks and tried reformatting the drive. One difference I noticed was that it had been partitioned as MBR, so I redid it with a GPT partition table instead. I thought that had fixed it, but within 24 hours I would get an error again. For now, at least, I'll reboot the PBS once before backup time and then sync it with another offsite PBS, so I have the latest snapshots in case the backup drive dies.
I do have another USB drive attached to the server that is passed through to Windows Server. I will try reformatting the troubled drive as NTFS and copying the contents of the Windows Server drive onto it, then format the other drive as ext4 and use that for backups. I forget whether they're attached through the same hub, but that could rule out the HDD enclosure.

Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.35-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Mobile HDD
Device Model:     ST2000LM007-1R8174
Serial Number:    ZDZ3QXWF
LU WWN Device Id: 5 000c50 0b241e340
Firmware Version: SBK2
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Jun 22 09:42:56 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 322) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   064   006    Pre-fail  Always       -       137108641
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       161
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   079   060   045    Pre-fail  Always       -       85218042
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23153 (175 94 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       60
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   088   000    Old_age   Always       -       6210768284903
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   051   040    Old_age   Always       -       25 (Min/Max 19/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   032   032   000    Old_age   Always       -       137436
194 Temperature_Celsius     0x0022   025   049   000    Old_age   Always       -       25 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1646 (64 43 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       15114079561
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       98435371619
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     23024         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
After approximately two days of this, it now shows the following under

cat /var/log/syslog

Code:
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-backup-proxy[569]: list groups error on datastore store02 - EIO: I/O error
Jun 20 07:47:12 pbs002 proxmox-ba^Z


and under the PBS VM the console shows:

EXT4-fs error (device sdb1): __ext4_find_entry:1612 inode #20709378: comm tokio-runtime-w: reading directory lblock 0

and this message repeats.

after rebooting, it shows
Aborting journal on device sdb1-8.
Buffer I/O error on dev sdb1, logical block 243826688, lost sync page write
JBD2: Error -5 detected when updating journal superblock for sdb1-8.
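
The "Error -5" in the JBD2 line is the same condition PBS reports as EIO: the kernel returns negated errno values, and errno 5 is EIO on Linux. A small Python sketch for looking up such codes (the helper name describe_kernel_error is made up for illustration):

```python
import errno
import os


def describe_kernel_error(code):
    """Translate a negated kernel-style error code such as -5 into its errno name."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


print(describe_kernel_error(-5))  # the name part is EIO; the description text is platform-dependent
```

So the journal abort, the "Buffer I/O error", and the "list groups error ... EIO" lines in syslog most likely all stem from the same failing reads and writes to sdb rather than being separate problems.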
 
So that was with a different USB drive, or with the original?
 
All these errors are with the original, problematic drive. The only different one is the output posted with

191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 3
That is from another server we set up, which is working well with no issues.
 
Okay, so now I have added a powered USB hub, but I'm getting a different error.
Previously, I had two USB ports in use on the server:
1st USB port: a larger 4TB 2.5" external drive.
2nd USB port: an unpowered hub with a 2TB 2.5" external drive and an APC Backups-1500.

The change from before is the powered hub, with both disks now on it.

I replaced the hub with a powered USB hub and put both drives on it. The APC Backups-1500 is now plugged directly into the server.
When trying to use my Proxmox I keep getting an error:
tag#0 timing out command, waited 180s
This appeared three times, and now it shows:
[sdb] tag#0 timing out command, waited 60s
fdisk -l does not show the USB drive that used to be /dev/sdb;
however, lsusb does show the device on that system.
 
