[Backup] ERROR: job failed with err -5 - Input/output error

mfmconsulting

Nov 14, 2021
Good evening everyone,
first of all, thanks in advance to anyone who can help me!

Below is the error:
Code:
INFO: starting new backup job: vzdump 100 --mode snapshot --remove 0 --compress 0 --node pve --storage bck
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2021-11-14 19:20:59
INFO: status = running
INFO: VM Name: MKCDC01
INFO: include disk 'sata0' 'local-lvm:vm-100-disk-0' 180G
INFO: include disk 'sata1' 'vm:100/vm-100-disk-0.qcow2' 600G
INFO: include disk 'sata2' 'vm:100/vm-100-disk-1.qcow2' 600G
INFO: include disk 'sata3' 'vm:100/vm-100-disk-2.qcow2' 500G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating archive '/mnt/md1/dump/vzdump-qemu-100-2021_11_14-19_20_59.vma'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '9a02f7d3-65bb-46c0-a150-cde3f96bb300'
INFO: resuming VM again
INFO: status: 0% (1924136960/2018634629120), sparse 0% (183267328), duration 3, read/write 641/580 MB/s
INFO: status: 1% (20277362688/2018634629120), sparse 0% (1904996352), duration 66, read/write 291/263 MB/s
INFO: status: 1% (31101550592/2018634629120), sparse 0% (2038902784), duration 116, read/write 216/213 MB/s
ERROR: job failed with err -5 - Input/output error
INFO: aborting backup job
ERROR: Backup of VM 100 failed - job failed with err -5 - Input/output error
INFO: Failed at 2021-11-14 19:23:03
INFO: Backup job finished with errors
TASK ERROR: job errors
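
The progress lines above can be used to estimate which disk was being read when the job aborted. The following is only a rough sketch: the status lines and disk sizes are copied from the log, but it assumes the disks are read one after another in the listed order, which this thread does not confirm.

```python
import re

# Status lines copied from the vzdump log above.
STATUS_LINES = """\
INFO: status: 0% (1924136960/2018634629120), sparse 0% (183267328), duration 3, read/write 641/580 MB/s
INFO: status: 1% (20277362688/2018634629120), sparse 0% (1904996352), duration 66, read/write 291/263 MB/s
INFO: status: 1% (31101550592/2018634629120), sparse 0% (2038902784), duration 116, read/write 216/213 MB/s
"""

# Disk sizes in GiB, copied from the "include disk" lines above.
DISK_SIZES_GIB = {"sata0": 180, "sata1": 600, "sata2": 600, "sata3": 500}
GIB = 2**30

# Extract (bytes_read, bytes_total) from every status line.
progress = [
    (int(done), int(total))
    for done, total in re.findall(r"status: \d+% \((\d+)/(\d+)\)", STATUS_LINES)
]
last_read, total = progress[-1]


def disk_at_offset(offset_bytes, sizes_gib):
    """Return the disk that contains offset_bytes.

    ASSUMPTION: disks are read sequentially in the listed order.
    """
    start = 0
    for name, size_gib in sizes_gib.items():
        end = start + size_gib * GIB
        if offset_bytes < end:
            return name
        start = end
    return None


print(f"last reported read offset: {last_read / GIB:.2f} GiB of {total / GIB:.0f} GiB")
print("disk being read (under the sequential assumption):",
      disk_at_offset(last_read, DISK_SIZES_GIB))
```

If the sequential assumption holds, the job stopped while still reading the first disk, which is the one stored on local-lvm.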

The VM is a Windows Server 2019 DC. Inside the guest I ran a disk analysis and chkdsk /F on the volumes (NOT SSD), and all the results came back fine ...

How can I fix this so that I can back up this VM again?

Greetings
 
Hi,

Is there any error in the syslog/journalctl around the time of the backup?
syslog:

Code:
Nov 14 19:20:59 pve pvedaemon[11587]: INFO: starting new backup job: vzdump 100 --mode snapshot --remove 0 --compress 0 --node pve --storage bck
Nov 14 19:20:59 pve pvedaemon[11587]: INFO: Starting Backup of VM 100 (qemu)
Nov 14 19:21:00 pve systemd[1]: Starting Proxmox VE replication runner...
Nov 14 19:21:00 pve systemd[1]: pvesr.service: Succeeded.
Nov 14 19:21:00 pve systemd[1]: Started Proxmox VE replication runner.
Nov 14 19:21:46 pve pvestatd[1379]: auth key pair too old, rotating..
Nov 14 19:22:00 pve systemd[1]: Starting Proxmox VE replication runner...
Nov 14 19:22:00 pve systemd[1]: pvesr.service: Succeeded.
Nov 14 19:22:00 pve systemd[1]: Started Proxmox VE replication runner.
Nov 14 19:22:59 pve kernel: [103971.821809] blk_update_request: critical medium error, dev nvme0n1, sector 315193016 op 0x0:(READ) flags 0x80700 phys_seg 9 prio class 0
Nov 14 19:22:59 pve kernel: [103971.900123] blk_update_request: critical medium error, dev nvme0n1, sector 315193072 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 14 19:22:59 pve kernel: [103971.901681] Buffer I/O error on dev dm-6, logical block 7593038, async page read
Nov 14 19:22:59 pve kernel: [103971.977964] blk_update_request: critical medium error, dev nvme0n1, sector 315193072 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 14 19:22:59 pve kernel: [103971.979497] Buffer I/O error on dev dm-6, logical block 7593038, async page read
Nov 14 19:23:00 pve systemd[1]: Starting Proxmox VE replication runner...
Nov 14 19:23:00 pve systemd[1]: pvesr.service: Succeeded.
Nov 14 19:23:00 pve systemd[1]: Started Proxmox VE replication runner.
Nov 14 19:23:03 pve pvedaemon[11587]: ERROR: Backup of VM 100 failed - job failed with err -5 - Input/output error
Nov 14 19:23:03 pve pvedaemon[11587]: INFO: Backup job finished with errors
Nov 14 19:23:03 pve pvedaemon[11587]: job errors

Code:
root@pve:~# smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO 1TB
Serial Number:                      S5H9NS0N847156H
Firmware Version:                   2B2QEXE7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            110,246,121,472 [110 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 580143d3b9
Local Time is:                      Mon Nov 15 10:24:01 2021 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.20W       -        -    0  0  0  0        0       0
 1 +     4.30W       -        -    1  1  1  1        0       0
 2 +     2.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        53 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    10,308,376 [5.27 TB]
Data Units Written:                 3,063,752 [1.56 TB]
Host Read Commands:                 81,839,893
Host Write Commands:                124,062,265
Controller Busy Time:               1,013
Power Cycles:                       105
Power On Hours:                     3,028
Unsafe Shutdowns:                   97
Media and Data Integrity Errors:    79
Error Information Log Entries:      79
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               53 Celsius
Temperature Sensor 2:               60 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0         79     4  0x0223  0xc502  0x000    315193074     1     -
  1         78     3  0x02a7  0xc502  0x000    315193074     1     -
  2         77     8  0x014c  0x4502  0x000    315193074     1     -
  3         76    18  0x00c2  0xc502  0x000    315193074     1     -
  4         75    16  0x02f0  0xc502  0x000    315193074     1     -
  5         74    15  0x0012  0x4502  0x000    315193074     1     -
  6         73    17  0x0218  0xc502  0x000    315193076     1     -
  7         72     1  0x0090  0xc502  0x000    315193074     1     -
  8         71    17  0x0212  0x4502  0x000    315193074     1     -
  9         70    10  0x022f  0xc502  0x000    350164564     1     -
 10         69    12  0x007f  0xc502  0x000    350164564     1     -
 11         68    10  0x0222  0x4502  0x000    350164564     1     -
 12         67    14  0x0016  0x4502  0x000    267315500     1     -
 13         66    12  0x0064  0xc502  0x000    315193074     1     -
 14         65    11  0x0309  0xc502  0x000    315193074     1     -
 15         64    17  0x0207  0x4502  0x000    315193074     1     -
... (3 entries not shown)

root@pve:~#
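
The error log above can be summarised in a few lines. This is a minimal sketch, not a replacement for smartctl or nvme-cli: the (status, LBA) pairs are copied from the 16 visible rows, and the bit layout used to decode the Status column (bit 0 phase tag, bits 1-8 status code, bits 9-11 status code type, bit 14 "more", bit 15 "do not retry") is my reading of the NVMe Error Information log entry and should be treated as an assumption.

```python
from collections import Counter

# (Status, LBA) pairs copied from the 16 visible rows of the error log above.
ENTRIES = [
    (0xC502, 315193074), (0xC502, 315193074), (0x4502, 315193074),
    (0xC502, 315193074), (0xC502, 315193074), (0x4502, 315193074),
    (0xC502, 315193076), (0xC502, 315193074), (0x4502, 315193074),
    (0xC502, 350164564), (0xC502, 350164564), (0x4502, 350164564),
    (0x4502, 267315500), (0xC502, 315193074), (0xC502, 315193074),
    (0x4502, 315193074),
]


def decode_status(status):
    """Split the 16-bit status value into its fields (assumed layout, see above)."""
    return {
        "sc": (status >> 1) & 0xFF,      # status code
        "sct": (status >> 9) & 0x7,      # status code type
        "more": bool(status & 0x4000),   # more information available
        "dnr": bool(status & 0x8000),    # do not retry
    }


# How often each LBA shows up among the visible entries.
lba_counts = Counter(lba for _, lba in ENTRIES)

for status in sorted({s for s, _ in ENTRIES}):
    print(hex(status), decode_status(status))
for lba, count in lba_counts.most_common():
    print(f"LBA {lba}: {count} error(s)")
```

Under that layout both status values decode to status code type 2 (media and data integrity errors) with status code 0x81, which the NVMe specification lists as an unrecovered read error, and most of the visible entries point at the same LBA.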
 
Thank you for the syslog and the smartctl output!

Code:
Nov 14 19:22:59 pve kernel: [103971.821809] blk_update_request: critical medium error, dev nvme0n1, sector 315193016 op 0x0:(READ) flags 0x80700 phys_seg 9 prio class 0
Nov 14 19:22:59 pve kernel: [103971.900123] blk_update_request: critical medium error, dev nvme0n1, sector 315193072 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 14 19:22:59 pve kernel: [103971.901681] Buffer I/O error on dev dm-6, logical block 7593038, async page read
Nov 14 19:22:59 pve kernel: [103971.977964] blk_update_request: critical medium error, dev nvme0n1, sector 315193072 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 14 19:22:59 pve kernel: [103971.979497] Buffer I/O error on dev dm-6, logical block 7593038, async page read

It looks like the NVMe storage has a problem. Does the backup job work on the "local" storage?
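
For reference, the failing device and sectors can be pulled out of those kernel lines with a short script. This is just a sketch: the log text is copied from the excerpt above, and the only thing added is the conversion to byte offsets, using the 512-byte units in which the kernel block layer reports sectors.

```python
import re

# Kernel lines copied from the syslog excerpt above.
LOG = """\
Nov 14 19:22:59 pve kernel: [103971.821809] blk_update_request: critical medium error, dev nvme0n1, sector 315193016 op 0x0:(READ) flags 0x80700 phys_seg 9 prio class 0
Nov 14 19:22:59 pve kernel: [103971.900123] blk_update_request: critical medium error, dev nvme0n1, sector 315193072 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 14 19:22:59 pve kernel: [103971.901681] Buffer I/O error on dev dm-6, logical block 7593038, async page read
Nov 14 19:22:59 pve kernel: [103971.977964] blk_update_request: critical medium error, dev nvme0n1, sector 315193072 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 14 19:22:59 pve kernel: [103971.979497] Buffer I/O error on dev dm-6, logical block 7593038, async page read
"""

MEDIUM_ERROR = re.compile(r"critical medium error, dev (\w+), sector (\d+)")
SECTOR_BYTES = 512  # block-layer sectors are reported in 512-byte units


def failing_sectors(log):
    """Return {device: sorted list of distinct sectors with medium errors}."""
    found = {}
    for dev, sector in MEDIUM_ERROR.findall(log):
        found.setdefault(dev, set()).add(int(sector))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


for dev, sectors in failing_sectors(LOG).items():
    for sector in sectors:
        print(f"{dev}: sector {sector} -> byte offset {sector * SECTOR_BYTES}")
```

All medium errors in this excerpt are reads on nvme0n1; the dm-6 lines are the same failure seen one layer up, through the device-mapper volume.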
 
local-lvm is on the NVMe drive, which is where vm-100-disk-0 is stored.

bck is on an 8 TB HDD (a separate storage).

Do you have any idea how to fix the "Buffer I/O error on dev dm-6" error?
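
As a side note, it helps to know which logical volume dm-6 actually is. One way, sketched below, is to read the device-mapper name from sysfs. The helper name dm_name and the example volume name in the comments are hypothetical; the sketch only assumes the usual /sys/block/dm-N/dm/name attribute is present on the host.

```python
import os


def dm_name(dev, sysfs_root="/sys/block"):
    """Return the device-mapper name for a dm-N device, or None if unknown.

    dm_name is a hypothetical helper written for this thread; it reads
    <sysfs_root>/<dev>/dm/name and does nothing else.
    """
    path = os.path.join(sysfs_root, dev, "dm", "name")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


name = dm_name("dm-6")
if name is None:
    print("dm-6 not found on this host")
else:
    # On an LVM setup the name is built from the VG and LV names,
    # e.g. something like "pve-vm--100--disk--0" (hypothetical example).
    print(f"dm-6 is {name}")
```

Knowing the volume only tells you where the damage sits; the medium errors themselves come from the drive underneath it.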
 
Could you please share the VM config and the output of the lvs command? You can get the VM config with the qm config {VMID} CLI command.
 
Here they are:
Code:
root@pve:~# qm config 100
agent: 1
balloon: 4096
bootdisk: sata0
cores: 6
ide2: iso:iso/virtio-win-0.1.185.iso,media=cdrom,size=402812K
memory: 22528
name: MKCDC01
net0: e1000=C6:06:A1:5E:F3:80,bridge=vmbr0
numa: 0
onboot: 1
ostype: win10
parent: TuttoFunzionante
sata0: local-lvm:vm-100-disk-0,cache=writeback,size=180G
sata1: vm:100/vm-100-disk-0.qcow2,cache=writeback,size=600G
sata2: vm:100/vm-100-disk-1.qcow2,cache=writeback,size=600G
sata3: vm:100/vm-100-disk-2.qcow2,cache=writeback,size=500G
scsihw: virtio-scsi-pci
smbios1: uuid=c38afe04-77c1-4947-97ae-4f83a118a76d
sockets: 1
vmgenid: 35f7b7dc-72a6-4094-bbeb-4b4f3ae5ec20
root@pve:~#
Code:
root@pve:~# lvs
  LV                                  VG  Attr       LSize    Pool Origin        Data%  Meta%  Move Log Cpy%Sync Convert
  data                                pve twi-aotz-- <794.79g                    5.24   0.54                           
  root                                pve -wi-ao----   96.00g                                                           
  snap_vm-100-disk-0_TuttoFunzionante pve Vri---tz-k  180.00g data vm-100-disk-0                                       
  swap                                pve -wi-ao----    8.00g                                                           
  vm-100-disk-0                       pve Vwi-aotz--  180.00g data               22.43                                 
root@pve:~#

N.B. The guest OS is actually Windows Server 2019 (WS 2K19).

Schermata da 2021-11-15 11-20-27.png
 