[SOLVED] Back-ups failing after updating PVE

Kenneth_A

New Member
Jun 15, 2020
Hi all,

First, to explain our infrastructure:

We have 7 nodes in total, 6 of which are in a cluster: 3 nodes in location A, 3 nodes in location B, and 1 node in location C. Locations A and B are clustered together. Our PBS is located in location C. We have been backing up all VMs in the cluster in A and B to the PBS in location C for months.

I first updated the single node in location C and saw no issues after a day, so I went ahead and updated one node each in locations A and B and waited to see whether any other issues would appear.
We then noticed all scheduled backups of VMs on those updated nodes failing, while those of VMs on the non-updated nodes succeed. I also can't migrate any VMs from an updated node to a non-updated node.

However, the backups of VMs on the node in location C work correctly, which led me to believe it might be network related, since the PBS and the VMs in location C are on the same network. Yet I can ping and reach all servers across locations without any problem. I was also able to add a test datastore on the PBS and attach it to PVE in locations A and B, but I hit the same issue when running a backup on the updated nodes.
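Since plain pings succeed, it may also be worth checking that the PBS API port is reachable and that full-size packets survive the route, since a path/MTU problem can let small pings through while bulk backup traffic stalls. A rough sketch (the hostname is a placeholder):

# check that the PBS API port (8007 by default) is reachable from a PVE node
nc -zv pbs.example.com 8007

# send full-size, non-fragmenting packets to catch MTU/path problems
# (1472 = 1500 bytes minus 20 IP + 8 ICMP header bytes)
ping -c 3 -M do -s 1472 pbs.example.com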

Next I considered that location C is a single node that has been fully updated, whereas locations A and B have 3 nodes each, of which only one per location is updated. Since I also can't migrate VMs from a non-updated node to an updated node within the same location, I started to suspect that the nodes running different versions might be causing the issues. I would prefer not to test this theory by updating the remaining nodes, though, since that could leave me unable to back up the remaining VMs either.
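To get more detail than the GUI shows, the failing migration can also be reproduced from the shell; the VMID below comes from the task log further down, while the target node name is just a placeholder:

# re-run the failing migration from the CLI to capture the full error output
# (dcuxvm203 is a placeholder for a non-updated target node)
qm migrate 2020 dcuxvm203 --online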

Some general info:

PBS 1.1.5
PVE 6.2.10 on the non-updated nodes
PVE 6.4.6 on the updated nodes
Proxmox Backup Client 1.1.7-1
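(The versions above are from the web UI; on the CLI they should be retrievable with something like the following, assuming the standard tools:)

pveversion -v                    # full package versions on a PVE node
proxmox-backup-manager versions  # package versions on the PBS host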

The task output in PVE:
INFO: starting new backup job: vzdump 2020 --node dcuxvm202 --mode snapshot --storage PBS-A --remove 0
INFO: Starting Backup of VM 2020 (qemu)
INFO: Backup started at 2021-05-20 10:31:38
INFO: status = running
INFO: VM Name: BLUE-LEU-ANSIBLE-01
INFO: include disk 'scsi0' 'Silver:vm-2020-disk-0' 80G
INFO: include disk 'scsi1' 'Silver:vm-2020-disk-1' 50G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/2020/2021-05-20T08:31:38Z'
INFO: started backup task '0f0a9145-e720-4559-be73-13795f85628f'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi1: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 2% (2.8 GiB of 130.0 GiB) in 3s, read: 953.3 MiB/s, write: 1.3 MiB/s
INFO: 2% (3.9 GiB of 130.0 GiB) in 13m 27s, read: 1.4 MiB/s, write: 0 B/s
ERROR: backup write data failed: command error: write_data upload error: broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 2020 failed - backup write data failed: command error: write_data upload error: broken pipe
INFO: Failed at 2021-05-20 10:45:06
INFO: Backup job finished with errors
TASK ERROR: job errors

The task output in PBS:
2021-05-20T10:31:39+02:00: starting new backup on datastore 'Datastore-A': "vm/2020/2021-05-20T08:31:38Z"
2021-05-20T10:31:39+02:00: GET /previous: 400 Bad Request: no valid previous backup
2021-05-20T10:31:39+02:00: created new fixed index 1 ("vm/2020/2021-05-20T08:31:38Z/drive-scsi0.img.fidx")
2021-05-20T10:31:39+02:00: created new fixed index 2 ("vm/2020/2021-05-20T08:31:38Z/drive-scsi1.img.fidx")
2021-05-20T10:31:39+02:00: add blob "/Backups/A/vm/2020/2021-05-20T08:31:38Z/qemu-server.conf.blob" (425 bytes, comp: 425)
2021-05-20T10:45:05+02:00: backup failed: connection error: Connection timed out (os error 110)
2021-05-20T10:45:05+02:00: removing failed backup
2021-05-20T10:45:05+02:00: PUT /fixed_index: 400 Bad Request: Problems reading request body: error reading a body from connection: broken pipe
2021-05-20T10:45:05+02:00: TASK ERROR: connection error: Connection timed out (os error 110)

Output in Syslog on PBS:
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: starting new backup on datastore 'Datastore-A': "vm/2020/2021-05-20T08:31:38Z"
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: GET /previous: 400 Bad Request: no valid previous backup
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: created new fixed index 1 ("vm/2020/2021-05-20T08:31:38Z/drive-scsi0.img.fidx")
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: created new fixed index 2 ("vm/2020/2021-05-20T08:31:38Z/drive-scsi1.img.fidx")
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: add blob "/Backups/A/vm/2020/2021-05-20T08:31:38Z/qemu-server.conf.blob" (425 bytes, comp: 425)
May 20 10:39:35 pbs sshd[19039]: Accepted password for root from 10.10.10.82 port 51625 ssh2
May 20 10:39:35 pbs sshd[19039]: pam_unix(sshd:session): session opened for user root by (uid=0)
May 20 10:39:35 pbs systemd-logind[967]: New session 30 of user root.
May 20 10:39:35 pbs systemd[1]: Created slice User Slice of UID 0.
May 20 10:39:35 pbs systemd[1]: Starting User Runtime Directory /run/user/0...
May 20 10:39:35 pbs systemd[1]: Started User Runtime Directory /run/user/0.
May 20 10:39:35 pbs systemd[1]: Starting User Manager for UID 0...
May 20 10:39:35 pbs systemd[19050]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG cryptographic agent and passphrase cache.
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
May 20 10:39:35 pbs systemd[19050]: Reached target Timers.
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG network certificate management daemon.
May 20 10:39:35 pbs systemd[19050]: Reached target Paths.
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
May 20 10:39:35 pbs systemd[19050]: Reached target Sockets.
May 20 10:39:35 pbs systemd[19050]: Reached target Basic System.
May 20 10:39:35 pbs systemd[19050]: Reached target Default.
May 20 10:39:35 pbs systemd[19050]: Startup finished in 60ms.
May 20 10:39:35 pbs systemd[1]: Started User Manager for UID 0.
May 20 10:39:35 pbs systemd[1]: Started Session 30 of user root.
May 20 10:39:43 pbs sshd[19039]: pam_unix(sshd:session): session closed for user root
May 20 10:39:43 pbs systemd-logind[967]: Session 30 logged out. Waiting for processes to exit.
May 20 10:39:43 pbs systemd[1]: session-30.scope: Succeeded.
May 20 10:39:43 pbs systemd-logind[967]: Removed session 30.
May 20 10:39:53 pbs systemd[1]: Stopping User Manager for UID 0...
May 20 10:39:53 pbs systemd[19050]: Stopped target Default.
May 20 10:39:53 pbs systemd[19050]: Stopped target Basic System.
May 20 10:39:53 pbs systemd[19050]: Stopped target Paths.
May 20 10:39:53 pbs systemd[19050]: Stopped target Timers.
May 20 10:39:53 pbs systemd[19050]: Stopped target Sockets.
May 20 10:39:53 pbs systemd[19050]: gpg-agent.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG cryptographic agent and passphrase cache.
May 20 10:39:53 pbs systemd[19050]: gpg-agent-browser.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
May 20 10:39:53 pbs systemd[19050]: gpg-agent-extra.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
May 20 10:39:53 pbs systemd[19050]: dirmngr.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG network certificate management daemon.
May 20 10:39:53 pbs systemd[19050]: gpg-agent-ssh.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG cryptographic agent (ssh-agent emulation).
May 20 10:39:53 pbs systemd[19050]: Reached target Shutdown.
May 20 10:39:53 pbs systemd[19050]: systemd-exit.service: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Started Exit the Session.
May 20 10:39:53 pbs systemd[19050]: Reached target Exit the Session.
May 20 10:39:53 pbs systemd[19051]: pam_unix(systemd-user:session): session closed for user root
May 20 10:39:53 pbs systemd[1]: user@0.service: Succeeded.
May 20 10:39:53 pbs systemd[1]: Stopped User Manager for UID 0.
May 20 10:39:53 pbs systemd[1]: Stopping User Runtime Directory /run/user/0...
May 20 10:39:53 pbs systemd[1]: run-user-0.mount: Succeeded.
May 20 10:39:53 pbs systemd[1]: user-runtime-dir@0.service: Succeeded.
May 20 10:39:53 pbs systemd[1]: Stopped User Runtime Directory /run/user/0.
May 20 10:39:53 pbs systemd[1]: Removed slice User Slice of UID 0.
May 20 10:42:04 pbs proxmox-backup-proxy[8609]: error during snapshot file listing: 'unable to load blob '"/Backups/A/vm/2020/2021-05-20T08:31:38Z/index.json.blob"' - No such file or directory (os error 2)'
May 20 10:44:48 pbs proxmox-backup-proxy[8609]: error during snapshot file listing: 'unable to load blob '"/Backups/A/vm/2020/2021-05-20T08:31:38Z/index.json.blob"' - No such file or directory (os error 2)'
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: backup failed: connection error: Connection timed out (os error 110)
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: removing failed backup
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: PUT /fixed_index: 400 Bad Request: Problems reading request body: error reading a body from connection: broken pipe
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: removing backup snapshot "/Backups/A/vm/2020/2021-05-20T08:31:38Z"
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: TASK ERROR: connection error: Connection timed out (os error 110)
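Most of the syslog above is just an unrelated SSH session; to isolate the backup proxy messages, the journal on the PBS can be filtered along these lines:

# show only proxmox-backup-proxy messages around the failed backup window
journalctl -u proxmox-backup-proxy --since "2021-05-20 10:30" --until "2021-05-20 10:46"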

If any more information is required, I'd be happy to provide it, since I'm pretty much out of ideas.

Thanks in advance!
 
Update:

I updated another node in one location, so all 3 nodes in that location are now fully updated and PVE shows the latest version in the GUI. However, I did not reboot the machine, so the kernel update hasn't been applied yet. I can now migrate VMs to and from all nodes in this location again. A backup on this node succeeds, but when I move the VM to another node it fails with the same error.

This got me thinking it might be related to the kernel update: when I try to back up a VM on one of the nodes that was fully updated and rebooted, it fails, but on the node that was updated without a reboot it succeeds.
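A quick way to confirm whether a node is actually running the kernel it has installed (a sketch, assuming the standard Debian/PVE boot layout):

# kernel currently running on this node
uname -r

# newest kernel installed on disk (only active after a reboot)
ls /boot/vmlinuz-* | sort -V | tail -n 1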
 
Last edit:

We have a few different static routes set up between the locations, and it turns out the one we were using before the update no longer works afterwards. We switched to another static route that we have, and that one does work. We can't fully explain why the static route we'd been using for months stopped working after the updates, but everything is up and running again, so all ends well :D
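For anyone hitting something similar: which route traffic to the PBS actually takes can be checked per destination, and static routes on PVE/PBS live in /etc/network/interfaces. A sketch with placeholder addresses:

# show the route and egress interface actually used to reach the PBS host
ip route get 192.0.2.80

# example static route entry in /etc/network/interfaces (placeholder addresses)
# up ip route add 192.0.2.0/24 via 203.0.113.1 dev vmbr0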
 
I am also seeing the same error, and it started happening after I updated my system this morning.

INFO: starting new backup job: vzdump 102 --node pve-nuc --compress zstd --storage SynoNAS --mode snapshot --remove 0
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2021-05-23 08:50:41
INFO: status = running
INFO: CT Name: unifi
INFO: including mount point rootfs ('/') in backup
INFO: mode failure - some volumes do not support snapshots
INFO: trying 'suspend' mode instead
INFO: backup mode: suspend
INFO: ionice priority: 7
INFO: CT Name: unifi
INFO: including mount point rootfs ('/') in backup
INFO: temporary directory is on NFS, disabling xattr and acl support, consider configuring a local tmpdir via /etc/vzdump.conf
INFO: starting first sync /proc/2401/root/ to /mnt/pve/SynoNAS/dump/vzdump-lxc-102-2021_05_23-08_50_41.tmp
INFO: first sync finished - transferred 2.50G bytes in 648s
INFO: suspending guest
INFO: starting final sync /proc/2401/root/ to /mnt/pve/SynoNAS/dump/vzdump-lxc-102-2021_05_23-08_50_41.tmp
INFO: final sync finished - transferred 149.95M bytes in 5s
INFO: resuming guest
INFO: guest is online again after 5 seconds
INFO: creating vzdump archive '/mnt/pve/SynoNAS/dump/vzdump-lxc-102-2021_05_23-08_50_41.tar.zst'
INFO: tar: /mnt/pve/SynoNAS/dump/vzdump-lxc-102-2021_05_23-08_50_41.tmp: Cannot open: Permission denied
INFO: tar: Error is not recoverable: exiting now
ERROR: Backup of VM 102 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/mnt/pve/SynoNAS/dump/vzdump-lxc-102-2021_05_23-08_50_41.tmp' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/pve/SynoNAS/dump/vzdump-lxc-102-2021_05_23-08_50_41.tmp' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' . | zstd --rsyncable '--threads=1' >/mnt/pve/SynoNAS/dump/vzdump-lxc-102-2021_05_23-08_50_41.tar.dat' failed: exit code 2
INFO: Failed at 2021-05-23 09:02:32
INFO: Backup job finished with errors
TASK ERROR: job errors
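Note that the INFO line in the log already hints at a likely cause here: the temporary directory is on NFS and tar cannot open it. Per that hint, pointing vzdump at a local tmpdir would look roughly like this (the path is an example and needs enough free space):

# /etc/vzdump.conf -- use local storage for vzdump's temporary files
tmpdir: /var/tmp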

Output of pveversion -v:
proxmox-ve: 6.4-1 (running kernel: 5.4.114-1-pve)
pve-manager: 6.4-6 (running version: 6.4-6/be2fa32c)
pve-kernel-5.4: 6.4-2
pve-kernel-helper: 6.4-2
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-2
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.6-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-5
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-3
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
 