Certain VM's Failing to Backup: I/O Error 5

LBX_Blackjack

Member
Jul 24, 2020
35
1
8
31
Hello all,
I have four nodes backing up to PBS on a regular basis. One of the nodes has five VM's, only one of which is backing up. The other four fail. All other VM's on all other nodes back up without issue. I really have no idea what could be causing it, but it's important that I fix this since these are some of our most important VM's. I'm adding an excerpt from my PBS syslog that looks to highlight the error, if someone could help me make sense of it I'd be very appreciative.

Code:
Nov  3 00:01:54 pbs proxmox-backup-api[734]: successful auth for user 'root@pam'
Nov  3 00:02:00 pbs proxmox-backup-proxy[816]: register worker thread 
Nov  3 00:02:00 pbs proxmox-backup-proxy[816]: register worker   
Nov  3 00:02:00 pbs proxmox-backup-proxy[816]: FILE: "/var/log/proxmox-backup/tasks/E4/UPID:pbs:00000330:0000BCE4:00000013:5FA0E448:logrotate:task-archive:backup@pam:"         
Nov  3 00:02:02 pbs proxmox-backup-api[734]: successful auth for user 'root@pam'     
Nov  3 00:02:03 pbs proxmox-backup-api[734]: successful auth for user 'root@pam'   
Nov  3 00:02:03 pbs kernel: [54457.959016] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0   
Nov  3 00:02:03 pbs kernel: [54457.960038] ata7.00: irq_stat 0x40000001     
Nov  3 00:02:03 pbs kernel: [54457.961002] ata7.00: failed command: READ DMA EXT     
Nov  3 00:02:03 pbs kernel: [54457.961964] ata7.00: cmd 25/##:##:##:##:##/00:00:60:00:00/e0 tag 26 dma 4096 in     
Nov  3 00:02:03 pbs kernel: [54457.961964]          res 51/##:##:##:##:##/00:00:60:00:00/00 Emask 0x9 (media error)     
Nov  3 00:02:03 pbs kernel: [54457.963907] ata7.00: status: { DRDY ERR } 
Nov  3 00:02:03 pbs kernel: [54457.964903] ata7.00: error: { UNC }   
Nov  3 00:02:03 pbs proxmox-backup-proxy[816]: unable to start task log rotation: Input/output error (os error 5)     
Nov  3 00:02:03 pbs kernel: [54457.976437] ata7.00: configured for UDMA/133     
Nov  3 00:02:03 pbs kernel: [54457.976454] sd 6:0:0:0: [sdb] tag#26 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
Nov  3 00:02:03 pbs kernel: [54457.976456] sd 6:0:0:0: [sdb] tag#26 Sense Key : Medium Error [current]                             
Nov  3 00:02:03 pbs kernel: [54457.976458] sd 6:0:0:0: [sdb] tag#26 Add. Sense: Unrecovered read error - auto reallocate failed     
Nov  3 00:02:03 pbs kernel: [54457.976462] sd 6:0:0:0: [sdb] tag#26 CDB: Read(10) 28 00 60 14 47 b0 00 00 08 00     
Nov  3 00:02:03 pbs kernel: [54457.976464] blk_update_request: I/O error, dev sdb, sector 1611941808 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0       
Nov  3 00:02:03 pbs kernel: [54457.977971] ata7: EH complete
 
hi,

i'd guess it's a disk error. you can try checking your disk health using smartctl command
 
SMART reports the disk as healthy. Right now the same drive is used for all of the VM backups and only those four that I mentioned are failing.
 
could you try with the verbose option and paste the output here?

these are clearly hardware errors from your disk:
Code:
Nov  3 00:02:03 pbs kernel: [54457.959016] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0   
Nov  3 00:02:03 pbs kernel: [54457.960038] ata7.00: irq_stat 0x40000001     
Nov  3 00:02:03 pbs kernel: [54457.961002] ata7.00: failed command: READ DMA EXT     
Nov  3 00:02:03 pbs kernel: [54457.961964] ata7.00: cmd 25/##:##:##:##:##/00:00:60:00:00/e0 tag 26 dma 4096 in     
Nov  3 00:02:03 pbs kernel: [54457.961964]          res 51/##:##:##:##:##/00:00:60:00:00/00 Emask 0x9 (media error)     
Nov  3 00:02:03 pbs kernel: [54457.963907] ata7.00: status: { DRDY ERR } 
Nov  3 00:02:03 pbs kernel: [54457.964903] ata7.00: error: { UNC }   
Nov  3 00:02:03 pbs proxmox-backup-proxy[816]: unable to start task log rotation: Input/output error (os error 5)     
Nov  3 00:02:03 pbs kernel: [54457.976437] ata7.00: configured for UDMA/133     
Nov  3 00:02:03 pbs kernel: [54457.976454] sd 6:0:0:0: [sdb] tag#26 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
Nov  3 00:02:03 pbs kernel: [54457.976456] sd 6:0:0:0: [sdb] tag#26 Sense Key : Medium Error [current]                             
Nov  3 00:02:03 pbs kernel: [54457.976458] sd 6:0:0:0: [sdb] tag#26 Add. Sense: Unrecovered read error - auto reallocate failed     
Nov  3 00:02:03 pbs kernel: [54457.976462] sd 6:0:0:0: [sdb] tag#26 CDB: Read(10) 28 00 60 14 47 b0 00 00 08 00

unrecovered read error usually means a sector can't be read properly.
 
You were right. The drive wasn't even showing up so I ran the SMART check on the wrong one. Ran it correctly and got a mess of errors. Thank you.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!