[SOLVED] ERROR: job failed with err -5 - Input/output error

LooneyTunes

Active Member
Jun 1, 2019
Hi,

Happy to have finally got PBS and PVE working together! But I immediately found that I get an error when trying to back up...
Code:
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO:   1% (524.0 MiB of 32.0 GiB) in 3s, read: 174.7 MiB/s, write: 136.0 MiB/s
INFO:   2% (924.0 MiB of 32.0 GiB) in 6s, read: 133.3 MiB/s, write: 101.3 MiB/s
INFO:   4% (1.4 GiB of 32.0 GiB) in 9s, read: 186.7 MiB/s, write: 80.0 MiB/s
INFO:   6% (2.0 GiB of 32.0 GiB) in 12s, read: 172.0 MiB/s, write: 142.7 MiB/s
INFO:   7% (2.3 GiB of 32.0 GiB) in 15s, read: 118.7 MiB/s, write: 118.7 MiB/s
INFO:   7% (2.5 GiB of 32.0 GiB) in 17s, read: 86.0 MiB/s, write: 86.0 MiB/s
ERROR: job failed with err -5 - Input/output error
INFO: aborting backup job
INFO: resuming VM again

The syslog contained this... It does not look very good...
Code:
Apr 20 17:18:05 pve pvedaemon[9730]: INFO: starting new backup job: vzdump 102 --remove 0 --node pve --storage PBS --mode snapshot --notes-template '{{guestname}}'
Apr 20 17:18:05 pve pvedaemon[9730]: INFO: Starting Backup of VM 102 (qemu)
Apr 20 17:18:23 pve kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x40000 action 0x0
Apr 20 17:18:23 pve kernel: ata1.00: irq_stat 0x40000008
Apr 20 17:18:23 pve kernel: ata1: SError: { CommWake }
Apr 20 17:18:23 pve kernel: ata1.00: failed command: READ FPDMA QUEUED
Apr 20 17:18:23 pve kernel: ata1.00: cmd 60/80:58:00:84:0a/01:00:1b:00:00/40 tag 11 ncq dma 196608 in         res 41/40:80:30:84:0a/00:01:1b:00:00/00 Emask 0x409 (media error) <F>
Apr 20 17:18:23 pve kernel: ata1.00: status: { DRDY ERR }
Apr 20 17:18:23 pve kernel: ata1.00: error: { UNC }
Apr 20 17:18:23 pve kernel: ata1.00: supports DRM functions and may not be fully accessible
Apr 20 17:18:23 pve kernel: ata1.00: supports DRM functions and may not be fully accessible
Apr 20 17:18:23 pve kernel: ata1.00: configured for UDMA/133
Apr 20 17:18:23 pve kernel: sd 0:0:0:0: [sda] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Apr 20 17:18:23 pve kernel: sd 0:0:0:0: [sda] tag#11 Sense Key : Medium Error [current]
Apr 20 17:18:23 pve kernel: sd 0:0:0:0: [sda] tag#11 Add. Sense: Unrecovered read error - auto reallocate failed
Apr 20 17:18:23 pve kernel: sd 0:0:0:0: [sda] tag#11 CDB: Read(10) 28 00 1b 0a 84 00 00 01 80 00
Apr 20 17:18:23 pve kernel: blk_update_request: I/O error, dev sda, sector 453674032 op 0x0:(READ) flags 0x0 phys_seg 21 prio class 0
Apr 20 17:18:23 pve kernel: ata1: EH complete
Apr 20 17:18:23 pve kernel: ata1.00: Enabling discard_zeroes_data
Apr 20 17:18:24 pve pvedaemon[9730]: ERROR: Backup of VM 102 failed - job failed with err -5 - Input/output error
Apr 20 17:18:24 pve pvedaemon[9730]: INFO: Backup job finished with errors
Apr 20 17:18:24 pve pvedaemon[9730]: job errors
Apr 20 17:18:24 pve pvedaemon[971]: <root@pam> end task UPID:pve:00002602:0004C30C:644157AD:vzdump:102:root@pam: job errors

But when I run S.M.A.R.T. tests on the NAS itself, it does not seem bad at all; in fact, it looks very healthy. What...?
[Attachment: smart_nas.png, S.M.A.R.T. results from the NAS]

What else can I check? I've heard about misaligned drives, but I would be surprised, since this has been installed and working like this before...

Edit: Running an extended S.M.A.R.T. test now too. Wondering if a simple format of the drives would do any good...? I just read about internal disk cables going bad, so I will try that next.
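Worth noting: the kernel messages above reference ata1.00 and /dev/sda on the PVE host itself, not the NAS. A minimal sketch for checking the host's disk, assuming smartmontools is installed and /dev/sda is the suspect drive as named in the log:

```shell
# Byte offset of the bad sector from the kernel log
# (blk_update_request: ... dev sda, sector 453674032; 512-byte sectors):
BAD_SECTOR=453674032
echo $(( BAD_SECTOR * 512 ))    # byte offset on disk: 232281104384

# Check the host's disk, not the NAS (needs root):
smartctl -H /dev/sda            # overall health verdict
smartctl -A /dev/sda            # watch Reallocated_Sector_Ct,
                                # Current_Pending_Sector, Offline_Uncorrectable
smartctl -t long /dev/sda       # kick off an extended self-test
smartctl -l selftest /dev/sda   # read the result once it completes
```

An extended self-test on the host disk would likely stall or fail at the logged sector if the medium error is real.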

Thanks
 
Could you tell us more about the host hardware on the origin system? [This is for people investigating later.]
Also, other than PBS, which you're trying to backup the VM to, do you have a local backup or backup copy somewhere else of the VM concerned?
[Or, at least, a backup of the data in the VM. I'd recommend making sure you have copied anything you need now if you haven't already.]
 
Hi,
Well, I was so far assuming this error was from the destination... :oops: Then I'll agree it is time for a VM backup; I have a manual setup for that already. Thanks for the heads-up on that one.

Host hardware of my PVE? Scary... Perhaps I should run a S.M.A.R.T. test on that as well... It runs an SSD which has been running 24/7 for a while, but it is not all that old...
 
Now I'm starting to lose it... Most of my VMs cannot be backed up, even to a locally attached USB drive... They exit with the same I/O error... :(

Now it seems clear (I haven't come to the S.M.A.R.T. test yet) that it is indeed at least partly a source problem... Can I repair a VM? None of them are irreplaceable, but it would be no fun to redo them... The weird thing is that they work fine and show no issues starting up...

The only one that I could back up has been shut down for quite some time... I wonder how they can have become corrupted... An upgrade, perhaps? I have a UPS that should safeguard against sudden power loss, but who knows... As they do work, it feels strange...

Edit: Googling I thought of trying fsck... but
Code:
root@pve:/# fsck -fn /dev/sda
fsck from util-linux 2.36.1
e2fsck 1.46.5 (30-Dec-2021)
Warning!  /dev/sda is in use.
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/sda

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Found a gpt partition table in /dev/sda
root@pve:/#
Would this be where I boot from a live ISO/USB and run fsck from that? Or what would be the better choice?
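The "Bad magic number in super-block" above is actually expected: /dev/sda is the whole disk (a GPT partition table plus an LVM physical volume), not a filesystem, so fsck has to be pointed at a logical volume instead. A sketch from a live ISO, assuming the default PVE volume group name `pve`:

```shell
# fsck targets a logical volume, not the raw GPT/LVM disk.
VG=pve; LV=root
DEV="/dev/$VG/$LV"
echo "$DEV"            # /dev/pve/root

# From a live ISO, with nothing from the disk mounted:
vgscan                 # discover volume groups on attached disks
vgchange -ay "$VG"     # activate the VG so its LVs appear
lsblk -f               # confirm which LV carries the root filesystem
fsck -fn "$DEV"        # dry run first (-n: report only, change nothing)
# fsck -f "$DEV"       # repair only after a dry run, and after imaging the disk
```

On a disk that is throwing medium errors, the safer order is to image the drive first and run repairs on the copy.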
 
OK, it is officially toast; it won't boot after a
Code:
lvchange -ay /dev/pve/root && fsck /dev/pve/root
operation from a boot ISO.

Can I mount this drive on another install and try some more recovery? Copying out the VMs is the one thing I need.

What would be the correct way of mounting it using USB please?
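One way to do this, sketched under the assumptions that the old disk shows up as /dev/sdb in the USB enclosure and that the LVM PV sits on partition 3 (the default PVE layout; verify with `lsblk` first). `vgimportclone` renames the volume group so it cannot collide with the new host's own `pve` VG:

```shell
# Rename and activate the old VG so it doesn't clash with the host's 'pve':
vgimportclone --basevgname pve-old /dev/sdb3
vgchange -ay pve-old

# Mount the old root read-only to pull configs out:
mkdir -p /mnt/oldroot
mount -o ro /dev/pve-old/root /mnt/oldroot

# VM disks are raw LVs following PVE's vm-<vmid>-disk-<n> naming:
VMID=102; DISK=0
LV="vm-${VMID}-disk-${DISK}"
echo "$LV"    # vm-102-disk-0
qemu-img convert -O qcow2 "/dev/pve-old/$LV" "/backup/$LV.qcow2"
```

The VMID and disk number here are illustrative (taken from the failing VM 102 in the log); list what actually exists with `lvs pve-old` before copying.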
 
I suppose this was one of the harder issues, as no one responded. I am a bit stressed and have probably lost most of it... I have older backups, so luckily I'm not back to square one, but close.

Don't forget your backups, guys! I did... :(
 
Sorry about the late reply, I work quite long hours [was only up at that time due to phenomenally bad insomnia]. Okay, not quite the worst has happened; there are options.
Let's try something extreme. You could clone the drive, ignoring errors and sector by sector, to a backup file on external storage.
Then you could restore the backup to a new drive of the same size; this would give you something to work with, and you can also attempt to mount a copy of that backup. The thing is, you can destroy and experiment with copies as much as you like, and as long as you retain the original backup file you always have something to go back to. This could potentially be a long project, but it is worth doing, as you'll learn a lot along the way.
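The clone-first approach above is what GNU ddrescue is built for (package `gddrescue` on Debian/Proxmox): it copies the readable areas first, skips unreadable sectors, and retries them in later passes, with a map file that makes the run resumable. The device and paths below are assumptions for illustration:

```shell
# /dev/sda = failing disk, /mnt/ext = large external storage (assumptions).
SRC=/dev/sda
IMG=/mnt/ext/pve-disk.img
MAP=/mnt/ext/pve-disk.map
echo "cloning $SRC -> $IMG"

apt install gddrescue
ddrescue -d -r3 "$SRC" "$IMG" "$MAP"
#        -d:  direct disk access, bypassing the page cache
#        -r3: retry bad areas up to 3 times after the first pass

# Afterwards, experiment only on copies of the image:
cp "$IMG" /mnt/ext/work.img
losetup -fP --show /mnt/ext/work.img    # -P exposes the partitions inside
```

With the map file retained, an interrupted or stopped run can be resumed later with the exact same command.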

Alternatively, you can remove the drive, put a new drive in and set it all up, then find some way of mounting the old drive read-only and simply attempt to copy the data across manually; again, getting the system concerned to ignore the errors. A little dangerous, though.

The main thing is this, you want to do all the work on a clone, and you do have the option of recreating everything while having that on the back-burner.

It's very unfortunate that it ran into trouble as you were setting up the backup system. As a recommendation, it's sometimes better to look at data replication first in the case of a faulty drive, rather than working on the drive itself.

Note: I had severe problems on my end with a ZFS mirror that would die spectacularly. I concentrated on exporting the data via a rescue read-only mode. This gave me something to go back to while I worked out why the array was failing. I eventually tracked it down to RAM instability [RAM which kept passing memtest], and I also added a read cache and a ZIL device to the mix. More stable, higher-capacity RAM and the extra devices have made the setup very safe.

Really, we need someone with experience of working on drives with bad superblocks.
 
Marking this as solved, as I found out the error in my case was due to corrupt VMs. Why was never concluded, though.
 
