Hi all,
First, to explain our infrastructure:
We have 7 nodes in total, 6 of which form a cluster: 3 nodes in location A and 3 in location B (locations A and B are clustered), plus 1 standalone node in location C. Our PBS is also located in location C. We have been backing up all VMs in the A/B cluster to the PBS in location C for months.
I first updated the single node in location C and saw no issues after a day, so I went ahead and updated one node each in locations A and B and waited to see whether any other issues would appear.
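For context, the updates were done in place; a typical PVE 6.x update looks roughly like this (a sketch of the usual apt workflow, not necessarily the exact commands we ran):

apt update
apt dist-upgrade    # moves PVE 6.2.x to 6.4.x within the same major release
pveversion          # confirm the new version afterwards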
We then noticed that all scheduled backups of VMs on those updated nodes fail, while backups of VMs on the non-updated nodes still succeed. I also can't migrate any VMs from an updated node to a non-updated node.
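To illustrate, a failing migration attempt from the CLI looks like this (VMID and target node are placeholders):

qm migrate 2020 <non-updated-node> --online    # fails when going from an updated to a non-updated node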
However, the backups of VMs on the node in location C work correctly, which led me to believe it might be network-related, since the PBS and the VMs in location C are on the same network. But I can ping and reach all servers across locations without any issue. I was also able to add a test datastore on the PBS and attach it to PVE in locations A and B, yet I ran into the same failure when running a backup from the updated nodes.
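The reachability checks were plain pings; for a heavier test that actually exercises the PBS API and TLS path from an updated node, the client's benchmark subcommand could be used (repository string is a placeholder):

ping <pbs-ip>                                                                 # basic reachability, works fine
proxmox-backup-client benchmark --repository root@pam@<pbs-ip>:Datastore-A    # sustained upload/TLS test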
Next I looked at versions: location C is a single node that has been updated, so it is entirely on the new version, whereas locations A and B each have 3 nodes, of which only 1 per location is updated. Since I also can't migrate VMs from a non-updated node to an updated node within the same location, I started to suspect that the version mismatch between nodes might be causing the issue. I'd prefer not to test this theory by updating the remaining nodes, though, since that could leave me unable to back up the remaining VMs either.
Some general info:
PBS 1.1.5
PVE 6.2.10 on the non-updated nodes
PVE 6.4.6 on the updated nodes
Proxmox Backup Client 1.1.7-1
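For completeness, these versions can be read per node like so (if "proxmox-backup-manager versions" isn't available on this release, "dpkg -l proxmox-backup-server" shows the same):

pveversion -v                      # on each PVE node
proxmox-backup-manager versions    # on the PBS host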
The task output in PVE:
INFO: starting new backup job: vzdump 2020 --node dcuxvm202 --mode snapshot --storage PBS-A --remove 0
INFO: Starting Backup of VM 2020 (qemu)
INFO: Backup started at 2021-05-20 10:31:38
INFO: status = running
INFO: VM Name: BLUE-LEU-ANSIBLE-01
INFO: include disk 'scsi0' 'Silver:vm-2020-disk-0' 80G
INFO: include disk 'scsi1' 'Silver:vm-2020-disk-1' 50G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/2020/2021-05-20T08:31:38Z'
INFO: started backup task '0f0a9145-e720-4559-be73-13795f85628f'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi1: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 2% (2.8 GiB of 130.0 GiB) in 3s, read: 953.3 MiB/s, write: 1.3 MiB/s
INFO: 2% (3.9 GiB of 130.0 GiB) in 13m 27s, read: 1.4 MiB/s, write: 0 B/s
ERROR: backup write data failed: command error: write_data upload error: broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 2020 failed - backup write data failed: command error: write_data upload error: broken pipe
INFO: Failed at 2021-05-20 10:45:06
INFO: Backup job finished with errors
TASK ERROR: job errors
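For what it's worth, the failing job is reproducible on demand from the updated node's shell with the same parameters as the scheduled job above:

vzdump 2020 --storage PBS-A --mode snapshot --remove 0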
The task output in PBS:
2021-05-20T10:31:39+02:00: starting new backup on datastore 'Datastore-A': "vm/2020/2021-05-20T08:31:38Z"
2021-05-20T10:31:39+02:00: GET /previous: 400 Bad Request: no valid previous backup
2021-05-20T10:31:39+02:00: created new fixed index 1 ("vm/2020/2021-05-20T08:31:38Z/drive-scsi0.img.fidx")
2021-05-20T10:31:39+02:00: created new fixed index 2 ("vm/2020/2021-05-20T08:31:38Z/drive-scsi1.img.fidx")
2021-05-20T10:31:39+02:00: add blob "/Backups/A/vm/2020/2021-05-20T08:31:38Z/qemu-server.conf.blob" (425 bytes, comp: 425)
2021-05-20T10:45:05+02:00: backup failed: connection error: Connection timed out (os error 110)
2021-05-20T10:45:05+02:00: removing failed backup
2021-05-20T10:45:05+02:00: PUT /fixed_index: 400 Bad Request: Problems reading request body: error reading a body from connection: broken pipe
2021-05-20T10:45:05+02:00: TASK ERROR: connection error: Connection timed out (os error 110)
Output in Syslog on PBS:
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: starting new backup on datastore 'Datastore-A': "vm/2020/2021-05-20T08:31:38Z"
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: GET /previous: 400 Bad Request: no valid previous backup
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: created new fixed index 1 ("vm/2020/2021-05-20T08:31:38Z/drive-scsi0.img.fidx")
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: created new fixed index 2 ("vm/2020/2021-05-20T08:31:38Z/drive-scsi1.img.fidx")
May 20 10:31:39 pbs proxmox-backup-proxy[8609]: add blob "/Backups/A/vm/2020/2021-05-20T08:31:38Z/qemu-server.conf.blob" (425 bytes, comp: 425)
May 20 10:39:35 pbs sshd[19039]: Accepted password for root from 10.10.10.82 port 51625 ssh2
May 20 10:39:35 pbs sshd[19039]: pam_unix(sshd:session): session opened for user root by (uid=0)
May 20 10:39:35 pbs systemd-logind[967]: New session 30 of user root.
May 20 10:39:35 pbs systemd[1]: Created slice User Slice of UID 0.
May 20 10:39:35 pbs systemd[1]: Starting User Runtime Directory /run/user/0...
May 20 10:39:35 pbs systemd[1]: Started User Runtime Directory /run/user/0.
May 20 10:39:35 pbs systemd[1]: Starting User Manager for UID 0...
May 20 10:39:35 pbs systemd[19050]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG cryptographic agent and passphrase cache.
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
May 20 10:39:35 pbs systemd[19050]: Reached target Timers.
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG network certificate management daemon.
May 20 10:39:35 pbs systemd[19050]: Reached target Paths.
May 20 10:39:35 pbs systemd[19050]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
May 20 10:39:35 pbs systemd[19050]: Reached target Sockets.
May 20 10:39:35 pbs systemd[19050]: Reached target Basic System.
May 20 10:39:35 pbs systemd[19050]: Reached target Default.
May 20 10:39:35 pbs systemd[19050]: Startup finished in 60ms.
May 20 10:39:35 pbs systemd[1]: Started User Manager for UID 0.
May 20 10:39:35 pbs systemd[1]: Started Session 30 of user root.
May 20 10:39:43 pbs sshd[19039]: pam_unix(sshd:session): session closed for user root
May 20 10:39:43 pbs systemd-logind[967]: Session 30 logged out. Waiting for processes to exit.
May 20 10:39:43 pbs systemd[1]: session-30.scope: Succeeded.
May 20 10:39:43 pbs systemd-logind[967]: Removed session 30.
May 20 10:39:53 pbs systemd[1]: Stopping User Manager for UID 0...
May 20 10:39:53 pbs systemd[19050]: Stopped target Default.
May 20 10:39:53 pbs systemd[19050]: Stopped target Basic System.
May 20 10:39:53 pbs systemd[19050]: Stopped target Paths.
May 20 10:39:53 pbs systemd[19050]: Stopped target Timers.
May 20 10:39:53 pbs systemd[19050]: Stopped target Sockets.
May 20 10:39:53 pbs systemd[19050]: gpg-agent.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG cryptographic agent and passphrase cache.
May 20 10:39:53 pbs systemd[19050]: gpg-agent-browser.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
May 20 10:39:53 pbs systemd[19050]: gpg-agent-extra.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
May 20 10:39:53 pbs systemd[19050]: dirmngr.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG network certificate management daemon.
May 20 10:39:53 pbs systemd[19050]: gpg-agent-ssh.socket: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Closed GnuPG cryptographic agent (ssh-agent emulation).
May 20 10:39:53 pbs systemd[19050]: Reached target Shutdown.
May 20 10:39:53 pbs systemd[19050]: systemd-exit.service: Succeeded.
May 20 10:39:53 pbs systemd[19050]: Started Exit the Session.
May 20 10:39:53 pbs systemd[19050]: Reached target Exit the Session.
May 20 10:39:53 pbs systemd[19051]: pam_unix(systemd-user:session): session closed for user root
May 20 10:39:53 pbs systemd[1]: user@0.service: Succeeded.
May 20 10:39:53 pbs systemd[1]: Stopped User Manager for UID 0.
May 20 10:39:53 pbs systemd[1]: Stopping User Runtime Directory /run/user/0...
May 20 10:39:53 pbs systemd[1]: run-user-0.mount: Succeeded.
May 20 10:39:53 pbs systemd[1]: user-runtime-dir@0.service: Succeeded.
May 20 10:39:53 pbs systemd[1]: Stopped User Runtime Directory /run/user/0.
May 20 10:39:53 pbs systemd[1]: Removed slice User Slice of UID 0.
May 20 10:42:04 pbs proxmox-backup-proxy[8609]: error during snapshot file listing: 'unable to load blob '"/Backups/A/vm/2020/2021-05-20T08:31:38Z/index.json.blob"' - No such file or directory (os error 2)'
May 20 10:44:48 pbs proxmox-backup-proxy[8609]: error during snapshot file listing: 'unable to load blob '"/Backups/A/vm/2020/2021-05-20T08:31:38Z/index.json.blob"' - No such file or directory (os error 2)'
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: backup failed: connection error: Connection timed out (os error 110)
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: removing failed backup
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: PUT /fixed_index: 400 Bad Request: Problems reading request body: error reading a body from connection: broken pipe
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: removing backup snapshot "/Backups/A/vm/2020/2021-05-20T08:31:38Z"
May 20 10:45:05 pbs proxmox-backup-proxy[8609]: TASK ERROR: connection error: Connection timed out (os error 110)
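One detail that stands out to me in these logs: the tiny qemu-server.conf blob (425 bytes) uploads fine, while the bulk chunk uploads stall until the connection times out. I understand that pattern can also be consistent with an MTU/fragmentation problem on the WAN path between locations, which could be ruled out with something like this (IP is a placeholder; 1472 = 1500 minus 28 bytes of ICMP/IP headers):

ping -M do -s 1472 <pbs-ip>    # run from an updated PVE node; errors or drops suggest a path-MTU issue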
If any more information is required, I'd be happy to provide it; I'm pretty much out of ideas.
Thanks in advance!