Backup of certain VM failed

Hynek Sura

New Member
Oct 25, 2021
7
0
1
26
While performing a backup of virtual machines I ran into issues with input/output error. Last week I performed an upgrade from proxmox-ve 6 to 7. The storage i am uploading snapshots to are s3 type. Tried to change compression method, but no luck. Also it doesn't fail on certain virtual machines which have smaller disk sizes. But the backup destination has plenty of room. Any ideas?

Code:
pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.4: 6.4-6
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.101-1-pve: 5.4.101-1
ceph-fuse: 15.2.14-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksmtuned: 4.20150326
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

backup.log:
Code:
()
INFO: starting new backup job: vzdump 101 --node fl1-pmx2 --mode snapshot --storage S3 --compress lzo --remove 0
INFO: HOOK: job-start
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2021-10-25 05:37:47
INFO: status = running
INFO: VM Name: fl1-conveyor-controller.example.cloud
INFO: include disk 'scsi0' 'ssd_zfs:vm-101-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: HOOK: backup-start snapshot 101
INFO: HOOK: pre-stop snapshot 101
INFO: HOOK: pre-restart snapshot 101
INFO: HOOK: post-restart snapshot 101
INFO: creating vzdump archive '/mnt/s3_pmx_backup/dump/vzdump-qemu-101-2021_10_25-05_37_46.vma.lzo'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '4b8098b0-6801-4d2b-a440-7d1d7c4266c6'
INFO: resuming VM again
INFO:   3% (1018.6 MiB of 32.0 GiB) in 3s, read: 339.5 MiB/s, write: 294.6 MiB/s
INFO:   6% (1.9 GiB of 32.0 GiB) in 6s, read: 320.2 MiB/s, write: 246.7 MiB/s
INFO:   8% (2.8 GiB of 32.0 GiB) in 9s, read: 306.9 MiB/s, write: 248.8 MiB/s
INFO:  11% (3.6 GiB of 32.0 GiB) in 12s, read: 263.4 MiB/s, write: 227.8 MiB/s
INFO:  14% (4.5 GiB of 32.0 GiB) in 15s, read: 315.4 MiB/s, write: 270.8 MiB/s
INFO:  18% (5.8 GiB of 32.0 GiB) in 18s, read: 437.4 MiB/s, write: 291.6 MiB/s
INFO:  19% (6.4 GiB of 32.0 GiB) in 21s, read: 194.5 MiB/s, write: 145.0 MiB/s
INFO:  24% (7.8 GiB of 32.0 GiB) in 24s, read: 485.8 MiB/s, write: 254.3 MiB/s
INFO:  28% (9.2 GiB of 32.0 GiB) in 27s, read: 467.6 MiB/s, write: 289.8 MiB/s
INFO:  32% (10.3 GiB of 32.0 GiB) in 30s, read: 390.3 MiB/s, write: 317.0 MiB/s
INFO:  35% (11.3 GiB of 32.0 GiB) in 33s, read: 328.7 MiB/s, write: 269.4 MiB/s
INFO:  38% (12.5 GiB of 32.0 GiB) in 36s, read: 403.6 MiB/s, write: 297.4 MiB/s
INFO:  41% (13.4 GiB of 32.0 GiB) in 39s, read: 315.4 MiB/s, write: 226.5 MiB/s
INFO:  42% (13.5 GiB of 32.0 GiB) in 2m 16s, read: 1.5 MiB/s, write: 1.0 MiB/s
INFO:  46% (14.8 GiB of 32.0 GiB) in 2m 19s, read: 429.0 MiB/s, write: 300.6 MiB/s
INFO:  50% (16.3 GiB of 32.0 GiB) in 2m 22s, read: 505.1 MiB/s, write: 279.2 MiB/s
INFO:  56% (18.2 GiB of 32.0 GiB) in 2m 25s, read: 665.8 MiB/s, write: 204.9 MiB/s
INFO:  65% (21.1 GiB of 32.0 GiB) in 2m 28s, read: 988.4 MiB/s, write: 195.1 MiB/s
INFO:  71% (22.8 GiB of 32.0 GiB) in 2m 31s, read: 578.8 MiB/s, write: 252.4 MiB/s
INFO:  73% (23.6 GiB of 32.0 GiB) in 2m 34s, read: 282.8 MiB/s, write: 259.5 MiB/s
INFO:  77% (24.7 GiB of 32.0 GiB) in 2m 37s, read: 364.9 MiB/s, write: 286.0 MiB/s
INFO:  81% (26.1 GiB of 32.0 GiB) in 2m 40s, read: 474.4 MiB/s, write: 336.5 MiB/s
INFO:  84% (27.0 GiB of 32.0 GiB) in 2m 43s, read: 300.1 MiB/s, write: 278.9 MiB/s
INFO:  88% (28.2 GiB of 32.0 GiB) in 2m 46s, read: 416.9 MiB/s, write: 356.6 MiB/s
INFO:  91% (29.2 GiB of 32.0 GiB) in 2m 49s, read: 357.7 MiB/s, write: 320.9 MiB/s
INFO:  93% (29.9 GiB of 32.0 GiB) in 2m 52s, read: 232.0 MiB/s, write: 164.7 MiB/s
lzop: Input/output error: <stdout>
INFO:  93% (29.9 GiB of 32.0 GiB) in 8m 50s, read: 0 B/s, write: 0 B/s
Warning: unable to close filehandle GEN7427 properly: Input/output error at /usr/share/perl5/PVE/VZDump/QemuServer.pm line 764.
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 101 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2021-10-25 05:48:37
INFO: HOOK: backup-abort snapshot 101
INFO: zabbix sender call: zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k proxmox.vm.snapshot.status[101] -o 1Response from "fl1-vpn.example.cloud:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000024"
INFO: sent: 1; skipped: 0; total: 1
INFO: HOOK: log-end snapshot 101
INFO: HOOK: job-end
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Is there anything in the journal on that host during that time?
Please provide the VM config (qm config 101) and your storage config (cat /etc/pve/storage.cfg)
 
Hello. Thank you for the reply.

Here is qm config 101:
Code:
agent: 1
bootdisk: scsi0
cores: 2
ide2: none,media=cdrom
memory: 4096
name: fl1-conveyor-controller.example.cloud
net0: virtio=42:43:E9:3E:2F:10,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: ssd_zfs:vm-101-disk-0,discard=on,format=raw,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=6eddb8d6-4d31-46b4-8ed5-10f59ec04da3
sockets: 1
vmgenid: e326c9be-5bd2-4dce-9af7-7b3e0b7f6e4e

/etc/pve/storage.cfg:
Code:
dir: S3
        path /mnt/s3_pmx_backup
        content backup
        nodes fl1-pmx1,fl1-pmx2
        prune-backups keep-all=1
        shared 1

joiurnalctl from this morning:
Code:
Oct 29 07:00:07 fl1-pmx2.example.cloud rrdcached[2549]: handle_request_update: Could not read RRD file.
Oct 29 07:00:07 fl1-pmx2.example.cloud pmxcfs[2552]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/fl1-pmx1/S3: -1
Oct 29 07:00:07 fl1-pmx2.example.cloud pmxcfs[2552]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/fl1-pmx1/S3: mmaping file '/var/lib/rrdcached/db/pve2-storage/fl1-pmx1/S3': Invalid argument

Thats all I can see from journal that is relevant.

Hope this helps. Thank you
 
Could you attach the complete journal from before the backup start until shortly afterwards?

Could it be that the connection to the S3 storage is lost for some reason?
 
This doesn't contain the start of the backup, as well as the finish (with error).
Please provide the journal from ~5 minutes before and 5 minutes after the complete backup.
 
Hello. Sorry for posting an invalid journal. The backup started at 6:00 and ended at roughly 6:15 am. Here is journalctl from 5:55 to 6:20.

Hope it helps. Thank you
 

Attachments

That journal is from yesterday, but doesn't contain any backup still.
Was there a backup yesterday that failed, and is this journal from the right host?

Otherwise, if it is still available, you could try the following command:
journalctl --since "2021-10-25 05:00:00" --until "2021-10-25 07:00:00"
 
Hi. So I tailed journalctl using
Code:
journalctl -xef
and saved it into the file. Then I manually ran backup of VM 103.

Proxmox log:
Code:
()
INFO: starting new backup job: vzdump 103 --storage S3 --compress zstd --remove 0 --node fl1-pmx2 --mode snapshot
INFO: HOOK: job-start
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2021-11-04 08:29:43
INFO: status = running
INFO: VM Name: fl1-ingnition.example.com
INFO: include disk 'scsi0' 'ssd_zfs:vm-103-disk-0' 40G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: HOOK: backup-start snapshot 103
INFO: HOOK: pre-stop snapshot 103
INFO: HOOK: pre-restart snapshot 103
INFO: HOOK: post-restart snapshot 103
INFO: creating vzdump archive '/mnt/s3_pmx_backup/dump/vzdump-qemu-103-2021_11_04-08_29_42.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '6d7e4a25-ba81-44fb-b345-07596d74e3f4'
INFO: resuming VM again
INFO:   2% (967.9 MiB of 40.0 GiB) in 3s, read: 322.6 MiB/s, write: 215.5 MiB/s
INFO:   4% (1.6 GiB of 40.0 GiB) in 6s, read: 230.8 MiB/s, write: 183.2 MiB/s
INFO:   6% (2.5 GiB of 40.0 GiB) in 9s, read: 286.4 MiB/s, write: 213.1 MiB/s
INFO:   7% (3.1 GiB of 40.0 GiB) in 12s, read: 232.0 MiB/s, write: 197.2 MiB/s
INFO:   9% (3.8 GiB of 40.0 GiB) in 15s, read: 228.4 MiB/s, write: 188.4 MiB/s
INFO:  10% (4.4 GiB of 40.0 GiB) in 18s, read: 197.0 MiB/s, write: 153.7 MiB/s
INFO:  12% (4.9 GiB of 40.0 GiB) in 21s, read: 175.2 MiB/s, write: 153.2 MiB/s
INFO:  13% (5.5 GiB of 40.0 GiB) in 24s, read: 194.5 MiB/s, write: 176.0 MiB/s
INFO:  15% (6.1 GiB of 40.0 GiB) in 27s, read: 224.8 MiB/s, write: 179.1 MiB/s
INFO:  18% (7.3 GiB of 40.0 GiB) in 30s, read: 404.8 MiB/s, write: 149.8 MiB/s
INFO:  21% (8.7 GiB of 40.0 GiB) in 33s, read: 484.5 MiB/s, write: 160.1 MiB/s
INFO:  28% (11.2 GiB of 40.0 GiB) in 36s, read: 842.1 MiB/s, write: 124.0 MiB/s
INFO:  31% (12.7 GiB of 40.0 GiB) in 39s, read: 497.9 MiB/s, write: 196.2 MiB/s
INFO:  33% (13.4 GiB of 40.0 GiB) in 42s, read: 236.8 MiB/s, write: 232.8 MiB/s
INFO:  34% (14.0 GiB of 40.0 GiB) in 45s, read: 217.5 MiB/s, write: 213.7 MiB/s
INFO:  50% (20.1 GiB of 40.0 GiB) in 48s, read: 2.0 GiB/s, write: 41.3 MiB/s
INFO:  54% (21.9 GiB of 40.0 GiB) in 51s, read: 595.2 MiB/s, write: 267.3 MiB/s
INFO:  56% (22.8 GiB of 40.0 GiB) in 54s, read: 320.2 MiB/s, write: 145.9 MiB/s
INFO:  57% (22.8 GiB of 40.0 GiB) in 2m 36s, read: 291.1 KiB/s, write: 205.5 KiB/s
INFO:  68% (27.6 GiB of 40.0 GiB) in 2m 39s, read: 1.6 GiB/s, write: 102.2 MiB/s
INFO:  70% (28.2 GiB of 40.0 GiB) in 2m 42s, read: 229.6 MiB/s, write: 226.9 MiB/s
INFO:  76% (30.6 GiB of 40.0 GiB) in 2m 45s, read: 821.7 MiB/s, write: 133.5 MiB/s
INFO:  78% (31.3 GiB of 40.0 GiB) in 2m 48s, read: 228.4 MiB/s, write: 227.0 MiB/s
INFO:  80% (32.4 GiB of 40.0 GiB) in 2m 51s, read: 364.1 MiB/s, write: 221.9 MiB/s
INFO:  96% (38.7 GiB of 40.0 GiB) in 2m 54s, read: 2.1 GiB/s, write: 42.8 MiB/s
INFO: 100% (40.0 GiB of 40.0 GiB) in 2m 55s, read: 1.3 GiB/s, write: 508.0 KiB/s
INFO: backup is sparse: 27.84 GiB (69%) total zero data
INFO: transferred 40.00 GiB in 175 seconds (234.1 MiB/s)
zstd: /*stdout*\: Input/output error
Warning: unable to close filehandle GEN8183 properly: Input/output error at /usr/share/perl5/PVE/VZDump/QemuServer.pm line 764.
ERROR: Backup of VM 103 failed - zstd --rsyncable --threads=1 failed - wrong exit status 1
INFO: Failed at 2021-11-04 08:35:12
INFO: HOOK: backup-abort snapshot 103
INFO: zabbix sender call: zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k proxmox.vm.snapshot.status[103] -o 1Response from "fl1-vpn.example.com:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000022"
INFO: sent: 1; skipped: 0; total: 1
INFO: HOOK: log-end snapshot 103
INFO: HOOK: job-end
INFO: Backup job finished with errors
TASK ERROR: job errors

Thanks for your time. I really hope I got it right this time.
 

Attachments

I/O error doesn't sound good.

Can you backup the VM to a different storage, or does it fail there too?
 
Hello. I've narrowed the problem by creating a backup to NFS4. It completed successfully. The problem really seems to be in the S3 storage.
Should I post the s3 parameters from fstab?
 
Yes, post them please as well as the output of mount
 
mount output:
Code:
s3fs on /mnt/s3_pmx_backup type fuse.s3fs (rw,relatime,user_id=0,group_id=0,allow_other)
fstab:
Code:
proxmox-dumps /mnt/s3_pmx_backup fuse.s3fs allow_other,use_path_request_style,passwd_file=/etc/s3fs/passwd-s3_pmx_backup,url=https://s3.example.com,storage_class=standard,endpoint=endpoint,retries=10,nonempty 0 0

Tried to increase timeouts and retries, no luck however.
Should I try to contact the s3 provider?
 
Last edited:

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!