Hello,
One particular VM on one of our nodes suddenly began failing to back up as of yesterday. The VM itself runs and functions perfectly fine, even after a full shutdown and restart. The error in question is the all-too-famous "job failed with err -5 - Input/output error".
There is no shortage of space on the destination storage, and backing up to another destination also fails. The destination "/backup-shared/" is an SSHFS mount to backup storage on another server; a remount has already been attempted.
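To rule the destination side out completely, a plain write test against the mount is the quickest check I can think of (sketch only; "write-test.bin" is just a throwaway scratch file):
Code:
# Write 1 GiB to the SSHFS mount and force a flush to the remote side;
# any destination-side I/O error should surface here.
dd if=/dev/zero of=/backup-shared/dump/write-test.bin bs=1M count=1024 conv=fsync
rm /backup-shared/dump/write-test.bin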
- Volume "scsi0" is the VM's primary disk stored within a ZFS pool.
- Volume "scsi1" is the VM's swap disk stored on a local NVMe drive.
- When removing "scsi1" and "tpmstate0" from the VM the backup process still fails, so the issue is going to be with "scsi0".
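If "scsi0" really is at fault, reading its zvol directly from the host should reproduce the same EIO outside of vzdump. A minimal sketch, assuming the standard /dev/zvol/<pool>/<dataset> device path and the pool layout shown further down:
Code:
# Sequential read of the whole zvol; read-only, but I would still run it
# with the VM powered off. Any EIO should abort dd with an error.
dd if=/dev/zvol/dev-server1/vm-101-disk-0 of=/dev/null bs=1M status=progress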
Thanks,
Adam
---- Command outputs ----
Backup job output while the VM is fully off:
Code:
INFO: starting new backup job: vzdump 101 --remove 0 --mode snapshot --storage backup-shared --notification-mode auto --notes-template '{{guestname}}' --node dev-server1 --compress zstd
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2024-09-05 11:58:51
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: virtdev-example
INFO: include disk 'scsi0' 'dev-server1:vm-101-disk-0' 128G
INFO: include disk 'scsi1' 'local-lvm:vm-101-disk-0' 8G
INFO: include disk 'tpmstate0' 'dev-server1:vm-101-disk-1' 4M
INFO: creating vzdump archive '/backup-shared/dump/vzdump-qemu-101-2024_09_05-11_58_51.vma.zst'
INFO: starting kvm to execute backup task
swtpm_setup: Not overwriting existing state file.
INFO: attaching TPM drive to QEMU for backup
INFO: started backup task '1f8849ef-73f1-49f2-8f5f-154fe5dadfcc'
INFO: 6% (8.3 GiB of 136.0 GiB) in 3s, read: 2.8 GiB/s, write: 47.5 MiB/s
INFO: 6% (8.7 GiB of 136.0 GiB) in 6s, read: 145.0 MiB/s, write: 114.4 MiB/s
ERROR: job failed with err -5 - Input/output error
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 101 failed - job failed with err -5 - Input/output error
INFO: Failed at 2024-09-05 11:58:59
INFO: Backup job finished with errors
INFO: notified via target `mail-to-root`
TASK ERROR: job errors
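Since the task log only reports the generic err -5, something like the following should surface any kernel-level I/O errors logged in the same window (sketch; timestamps taken from the job above, grep patterns only a rough filter):
Code:
journalctl -k --since "2024-09-05 11:58:00" --until "2024-09-05 12:00:00" \
    | grep -iE "i/o error|blk_update_request|zio|ata[0-9]"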
`qm config 101` output:
Code:
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;sata1;net0
cores: 4
memory: 4096
meta: creation-qemu=6.2.0,ctime=1660567839
name: virtdev-example
net0: virtio=08:00:27:51:79:ef,bridge=vmbr0,firewall=1,queues=4,tag=10
numa: 0
onboot: 1
ostype: l26
sata1: none,media=cdrom
scsi0: dev-server1:vm-101-disk-0,discard=on,iothread=1,size=128G,ssd=1
scsi1: local-lvm:vm-101-disk-0,discard=on,iothread=1,size=8G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=50bd1888-004d-4503-8799-971596e12eea
sockets: 1
tpmstate0: dev-server1:vm-101-disk-1,size=4M,version=v2.0
vmgenid: 4527ccc2-2da4-42b5-bbbc-50b1e4e1ad8a
`pveversion -v` output:
Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.4-3-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-11
proxmox-kernel-6.8: 6.8.12-1
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
proxmox-kernel-6.8.8-4-pve-signed: 6.8.8-4
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
pve-kernel-5.15.143-1-pve: 5.15.143-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.2
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.3
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.13-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.2-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
`zpool list -v && zpool status && zfs list` output:
Code:
NAME                                  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
dev-server1                          1.81T   698G  1.13T        -         -    14%    37%  1.00x  ONLINE  -
  raidz2-0                           1.81T   698G  1.13T        -         -    14%  37.6%      -  ONLINE
    ata-CT500MX500SSD1_2210E615D037   466G      -      -        -         -      -      -      -  ONLINE
    ata-CT500MX500SSD1_2210E6166B1B   466G      -      -        -         -      -      -      -  ONLINE
    ata-CT500MX500SSD1_2210E6166B0D   466G      -      -        -         -      -      -      -  ONLINE
    ata-CT500MX500SSD1_2210E616684E   466G      -      -        -         -      -      -      -  ONLINE
  pool: dev-server1
 state: ONLINE
  scan: scrub repaired 0B in 00:10:16 with 0 errors on Wed Sep  4 20:10:17 2024
config:

        NAME                                 STATE     READ WRITE CKSUM
        dev-server1                          ONLINE       0     0     0
          raidz2-0                           ONLINE       0     0     0
            ata-CT500MX500SSD1_2210E615D037  ONLINE       0     0     0
            ata-CT500MX500SSD1_2210E6166B1B  ONLINE       0     0     0
            ata-CT500MX500SSD1_2210E6166B0D  ONLINE       0     0     0
            ata-CT500MX500SSD1_2210E616684E  ONLINE       0     0     0

errors: No known data errors
NAME                        USED  AVAIL  REFER  MOUNTPOINT
dev-server1                 567G   304G   256K  /dev-server1
dev-server1/vm-100-disk-0   142G   430G  16.4G  -
dev-server1/vm-100-disk-1  6.36M   304G   134K  -
dev-server1/vm-101-disk-0   142G   321G   125G  -
dev-server1/vm-101-disk-1  6.36M   304G   128K  -
dev-server1/vm-102-disk-0   142G   366G  80.0G  -
dev-server1/vm-102-disk-1  6.36M   304G   134K  -
dev-server1/vm-104-disk-0   142G   330G   116G  -
dev-server1/vm-104-disk-1  6.36M   304G   157K  -
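Finally, since the scrub above came back clean, per-drive SMART data might still show something the pool-level counters do not. A quick sweep over the four pool members (device IDs taken from the zpool output above; the grep patterns are only a rough filter for the usual suspect attributes):
Code:
for d in /dev/disk/by-id/ata-CT500MX500SSD1_*; do
    case "$d" in *-part*) continue ;; esac   # skip partition symlinks
    echo "== $d =="
    smartctl -a "$d" | grep -iE "reallocat|pending|uncorrect|crc"
done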