[SOLVED] ERROR: job failed with err -121 - Remote I/O error

Banaan

New Member
Mar 12, 2024
We have a Proxmox VM which refuses to be backed up :(

Code:
INFO: creating Proxmox Backup Server archive 'vm/111/2024-03-18T11:58:57Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'bfc8e56d-b81d-4224-8ebe-0e27e8b8f3f6'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: created new
INFO:   0% (296.0 MiB of 30.0 GiB) in 3s, read: 98.7 MiB/s, write: 58.7 MiB/s
INFO:   1% (532.0 MiB of 30.0 GiB) in 6s, read: 78.7 MiB/s, write: 74.7 MiB/s
INFO:   2% (792.0 MiB of 30.0 GiB) in 9s, read: 86.7 MiB/s, write: 58.7 MiB/s
INFO:   3% (1.1 GiB of 30.0 GiB) in 12s, read: 100.0 MiB/s, write: 68.0 MiB/s
INFO:   4% (1.4 GiB of 30.0 GiB) in 15s, read: 105.3 MiB/s, write: 68.0 MiB/s
INFO:   5% (1.7 GiB of 30.0 GiB) in 18s, read: 100.0 MiB/s, write: 62.7 MiB/s
INFO:   6% (1.9 GiB of 30.0 GiB) in 21s, read: 89.3 MiB/s, write: 78.7 MiB/s
INFO:   7% (2.1 GiB of 30.0 GiB) in 24s, read: 61.3 MiB/s, write: 50.7 MiB/s
INFO:   8% (2.4 GiB of 30.0 GiB) in 27s, read: 102.7 MiB/s, write: 60.0 MiB/s
INFO:   9% (2.7 GiB of 30.0 GiB) in 31s, read: 78.0 MiB/s, write: 37.0 MiB/s
INFO:  10% (3.1 GiB of 30.0 GiB) in 35s, read: 89.0 MiB/s, write: 68.0 MiB/s
INFO:  11% (3.3 GiB of 30.0 GiB) in 38s, read: 88.0 MiB/s, write: 69.3 MiB/s
INFO:  12% (3.7 GiB of 30.0 GiB) in 42s, read: 87.0 MiB/s, write: 56.0 MiB/s
INFO:  13% (3.9 GiB of 30.0 GiB) in 45s, read: 97.3 MiB/s, write: 89.3 MiB/s
INFO:  14% (4.3 GiB of 30.0 GiB) in 48s, read: 109.3 MiB/s, write: 76.0 MiB/s
INFO:  15% (4.5 GiB of 30.0 GiB) in 51s, read: 81.3 MiB/s, write: 74.7 MiB/s
INFO:  16% (4.8 GiB of 30.0 GiB) in 54s, read: 106.7 MiB/s, write: 73.3 MiB/s
INFO:  17% (5.2 GiB of 30.0 GiB) in 57s, read: 117.3 MiB/s, write: 50.7 MiB/s
INFO:  18% (5.5 GiB of 30.0 GiB) in 1m, read: 118.7 MiB/s, write: 52.0 MiB/s
INFO:  19% (5.9 GiB of 30.0 GiB) in 1m 3s, read: 129.3 MiB/s, write: 22.7 MiB/s
INFO:  20% (6.2 GiB of 30.0 GiB) in 1m 6s, read: 114.7 MiB/s, write: 65.3 MiB/s
INFO:  21% (6.5 GiB of 30.0 GiB) in 1m 9s, read: 106.7 MiB/s, write: 65.3 MiB/s
INFO:  22% (6.9 GiB of 30.0 GiB) in 1m 12s, read: 108.0 MiB/s, write: 57.3 MiB/s
INFO:  23% (7.2 GiB of 30.0 GiB) in 1m 15s, read: 104.0 MiB/s, write: 68.0 MiB/s
INFO:  24% (7.5 GiB of 30.0 GiB) in 1m 18s, read: 106.7 MiB/s, write: 81.3 MiB/s
INFO:  25% (7.7 GiB of 30.0 GiB) in 1m 21s, read: 73.3 MiB/s, write: 61.3 MiB/s
INFO:  26% (7.8 GiB of 30.0 GiB) in 1m 23s, read: 62.0 MiB/s, write: 52.0 MiB/s
ERROR: job failed with err -121 - Remote I/O error
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 111 failed - job failed with err -121 - Remote I/O error
INFO: Failed at 2024-03-18 13:00:22
INFO: Backup job finished with errors
TASK ERROR: job errors

This happens both when backing up to local storage and when backing up to the Proxmox Backup Server, which leads me to assume the problem is with the VM itself rather than with the backup targets.
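
For reference, these are roughly the two jobs we're running (storage names are placeholders for the ones configured on our host):

Code:
# backup to local storage
vzdump 111 --node pve01 --storage local --mode snapshot --remove 0
# backup to the Proxmox Backup Server datastore
vzdump 111 --node pve01 --storage pbs --mode snapshot --remove 0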

I've checked the ext4 filesystem by booting a Gentoo ISO and running fsck -fy /dev/sda1, which corrected a few inodes, but after a reboot the backup still fails at exactly the same spot.
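
Roughly what I did from the live environment, in case it matters (the guest's root filesystem shows up as /dev/sda1 there):

Code:
# from the Gentoo live ISO, with the guest filesystem not mounted
fsck -fy /dev/sda1   # force a full check and auto-answer "yes" to repairs
fsck -f /dev/sda1    # second pass to confirm the filesystem now comes back clean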

Any clues on what to try next?
 
Hi,

To troubleshoot the issue further, I would:
- Try the backup on a different storage (not local), e.g. shared storage or something else.
- Check the syslog during the backup job (see the example commands after this list).
- Does only VM 111 have this issue?
- What does smartctl say about the disk backing the VM?
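
For example, something along these lines on the PVE host (the device name is a placeholder; point it at whichever physical disk backs the VM's volume):

Code:
# follow the log while the backup job runs
journalctl -f
# or the classic syslog file:
tail -f /var/log/syslog

# SMART health and attributes of the physical disk
smartctl -a /dev/sdX
smartctl -t short /dev/sdX   # optional short self-test; check the result with -a afterwards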
 
Hi,

To troubleshoot the issue further, I would:
- Try the backup on a different storage (not local), e.g. shared storage or something else.
- Check the syslog during the backup job.
- Does only VM 111 have this issue?
- What does smartctl say about the disk backing the VM?

1. All storage types yield the same error.
2. See below.
3. Yes, this is the only one.
4. All SMART properties are fine, see screenshot below.

This is the content of the syslog, which probably shows why the backup fails:

Code:
Mar 19 15:34:38 pve01 pvedaemon[4045]: <root@pam> starting task UPID:pve01:00037EED:00262D66:65F9A27E:vzdump:111:root@pam:
Mar 19 15:34:38 pve01 pvedaemon[229101]: INFO: starting new backup job: vzdump 111 --node pve01 --storage pbs --mode snapshot --remove 0
Mar 19 15:34:38 pve01 pvedaemon[229101]: INFO: Starting Backup of VM 111 (qemu)
Mar 19 15:34:39 pve01 systemd[1]: Started 111.scope.
Mar 19 15:34:39 pve01 systemd-udevd[229111]: Using default interface naming scheme 'v240'.
Mar 19 15:34:39 pve01 systemd-udevd[229111]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Mar 19 15:34:39 pve01 systemd-udevd[229111]: Could not generate persistent MAC address for tap111i0: No such file or directory
Mar 19 15:34:39 pve01 kernel: [25021.052922] device tap111i0 entered promiscuous mode
Mar 19 15:34:39 pve01 kernel: [25021.061415] vmbr1: port 3(tap111i0) entered blocking state
Mar 19 15:34:39 pve01 kernel: [25021.061416] vmbr1: port 3(tap111i0) entered disabled state
Mar 19 15:34:39 pve01 kernel: [25021.061771] vmbr1: port 3(tap111i0) entered blocking state
Mar 19 15:34:39 pve01 kernel: [25021.061773] vmbr1: port 3(tap111i0) entered forwarding state
Mar 19 15:35:00 pve01 systemd[1]: Starting Proxmox VE replication runner...
Mar 19 15:35:00 pve01 systemd[1]: pvesr.service: Succeeded.
Mar 19 15:35:00 pve01 systemd[1]: Started Proxmox VE replication runner.
Mar 19 15:35:01 pve01 CRON[229442]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 19 15:36:00 pve01 systemd[1]: Starting Proxmox VE replication runner...
Mar 19 15:36:00 pve01 systemd[1]: pvesr.service: Succeeded.
Mar 19 15:36:00 pve01 systemd[1]: Started Proxmox VE replication runner.
Mar 19 15:36:04 pve01 kernel: [25105.701177] sd 11:0:0:0: [sdh] Unaligned partial completion (resid=45052, sector_sz=512)
Mar 19 15:36:04 pve01 kernel: [25105.701187] sd 11:0:0:0: [sdh] tag#28 CDB: Read(10) 28 00 09 95 14 00 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.701191] sd 11:0:0:0: [sdh] tag#28 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.701193] sd 11:0:0:0: [sdh] tag#28 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.701196] sd 11:0:0:0: [sdh] tag#28 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.701197] sd 11:0:0:0: [sdh] tag#28 CDB: Read(10) 28 00 09 95 14 00 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.701200] blk_update_request: critical target error, dev sdh, sector 160764928 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:04 pve01 kernel: [25105.702141] sd 11:0:0:0: [sdh] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.702143] sd 11:0:0:0: [sdh] tag#3 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.702144] sd 11:0:0:0: [sdh] tag#3 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.702145] sd 11:0:0:0: [sdh] tag#3 CDB: Read(10) 28 00 09 95 14 80 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.702146] blk_update_request: critical target error, dev sdh, sector 160765056 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:04 pve01 kernel: [25105.702180] sd 11:0:0:0: [sdh] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.702181] sd 11:0:0:0: [sdh] tag#0 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.702182] sd 11:0:0:0: [sdh] tag#0 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.702183] sd 11:0:0:0: [sdh] tag#0 CDB: Read(10) 28 00 09 95 15 00 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.702183] blk_update_request: critical target error, dev sdh, sector 160765184 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:04 pve01 kernel: [25105.702229] sd 11:0:0:0: [sdh] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.702230] sd 11:0:0:0: [sdh] tag#4 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.702231] sd 11:0:0:0: [sdh] tag#4 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.702231] sd 11:0:0:0: [sdh] tag#4 CDB: Read(10) 28 00 09 95 15 80 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.702232] blk_update_request: critical target error, dev sdh, sector 160765312 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:04 pve01 kernel: [25105.702262] sd 11:0:0:0: [sdh] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.702263] sd 11:0:0:0: [sdh] tag#5 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.702264] sd 11:0:0:0: [sdh] tag#5 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.702266] sd 11:0:0:0: [sdh] tag#5 CDB: Read(10) 28 00 09 95 16 00 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.702267] blk_update_request: critical target error, dev sdh, sector 160765440 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:04 pve01 kernel: [25105.702297] sd 11:0:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.702298] sd 11:0:0:0: [sdh] tag#6 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.702299] sd 11:0:0:0: [sdh] tag#6 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.702300] sd 11:0:0:0: [sdh] tag#6 CDB: Read(10) 28 00 09 95 16 80 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.702300] blk_update_request: critical target error, dev sdh, sector 160765568 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:04 pve01 kernel: [25105.702330] sd 11:0:0:0: [sdh] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.702331] sd 11:0:0:0: [sdh] tag#7 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.702332] sd 11:0:0:0: [sdh] tag#7 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.702332] sd 11:0:0:0: [sdh] tag#7 CDB: Read(10) 28 00 09 95 17 00 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.702333] blk_update_request: critical target error, dev sdh, sector 160765696 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:04 pve01 kernel: [25105.702363] sd 11:0:0:0: [sdh] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.702364] sd 11:0:0:0: [sdh] tag#8 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.702365] sd 11:0:0:0: [sdh] tag#8 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.702366] sd 11:0:0:0: [sdh] tag#8 CDB: Read(10) 28 00 09 95 17 80 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.702367] blk_update_request: critical target error, dev sdh, sector 160765824 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:04 pve01 kernel: [25105.702396] sd 11:0:0:0: [sdh] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.702397] sd 11:0:0:0: [sdh] tag#9 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.702398] sd 11:0:0:0: [sdh] tag#9 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.702399] sd 11:0:0:0: [sdh] tag#9 CDB: Read(10) 28 00 09 95 18 00 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.702400] blk_update_request: critical target error, dev sdh, sector 160765952 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:04 pve01 kernel: [25105.702430] sd 11:0:0:0: [sdh] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:36:04 pve01 kernel: [25105.702431] sd 11:0:0:0: [sdh] tag#10 Sense Key : Hardware Error [current]
Mar 19 15:36:04 pve01 kernel: [25105.702432] sd 11:0:0:0: [sdh] tag#10 Add. Sense: Internal target failure
Mar 19 15:36:04 pve01 kernel: [25105.702433] sd 11:0:0:0: [sdh] tag#10 CDB: Read(10) 28 00 09 95 18 80 00 00 80 00
Mar 19 15:36:04 pve01 kernel: [25105.702434] blk_update_request: critical target error, dev sdh, sector 160766080 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 2
Mar 19 15:36:05 pve01 kernel: [25106.662795] vmbr1: port 3(tap111i0) entered disabled state
Mar 19 15:36:05 pve01 qmeventd[3268]: read: Connection reset by peer
Mar 19 15:36:05 pve01 pvedaemon[4046]: VM 111 qmp command failed - VM 111 not running
Mar 19 15:36:05 pve01 systemd[1]: 111.scope: Succeeded.
Mar 19 15:36:05 pve01 qmeventd[3268]: Starting cleanup for 111
Mar 19 15:36:05 pve01 qmeventd[3268]: trying to acquire lock...
Mar 19 15:36:06 pve01 qmeventd[3268]:  OK
Mar 19 15:36:06 pve01 qmeventd[3268]: Finished cleanup for 111
Mar 19 15:36:06 pve01 pvedaemon[229101]: ERROR: Backup of VM 111 failed - job failed with err -121 - Remote I/O error
Mar 19 15:36:06 pve01 pvedaemon[229101]: INFO: Backup job finished with errors
Mar 19 15:36:06 pve01 pvedaemon[229101]: job errors
Mar 19 15:36:06 pve01 pvedaemon[4045]: <root@pam> end task UPID:pve01:00037EED:00262D66:65F9A27E:vzdump:111:root@pam: job errors

The SMART properties:
[Screenshot: pve01-Proxmox-Virtual-Environment.png]


I'm now looking into the Unaligned partial completion error.
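
One quick way to confirm it really is the physical disk: try to re-read one of the flagged ranges directly on the host. The sector number and the 128-sector length (0x80 in the Read(10) CDB) are taken from the kernel messages above, and /dev/sdh is the device the kernel reported:

Code:
# attempt a direct read of the first failed range from the syslog
dd if=/dev/sdh of=/dev/null bs=512 skip=160764928 count=128 iflag=direct

If that returns an I/O error, the problem sits below the VM and fsck inside the guest can't fix it.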
 
Hi all,

I duplicated the drive onto a replacement and all errors went away ;) I guess the drive was on its way out ...
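
In case it helps anyone else: cloning a failing disk onto its replacement can be done with something like GNU ddrescue (device names below are placeholders; /dev/sdh is the old disk from the log, /dev/sdX the new one):

Code:
# clone the failing disk to the replacement, keeping a map file so the copy can be resumed
ddrescue -f /dev/sdh /dev/sdX /root/sdh-rescue.map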
 
