ERROR: backup write data failed: command error: protocol canceled

apatchin · May 4, 2021

Hello, I recently migrated all of my VMs from one cluster to another through backups. I have also set up a new PBS to use for new backups (leaving the old PBS with an "old copy" just in case).

So far the backup failures occur on only one node in the cluster, which is backing up to the new/empty PBS.

I am getting these kinds of backup failures on the VE side.

Code:

INFO: starting new backup job: vzdump 105 --storage FakePBSName --remove 0 --mode snapshot --node FakeNodeName
INFO: Starting Backup of VM 105 (qemu)
INFO: Backup started at 2021-05-04 14:56:21
INFO: status = running
INFO: VM Name: FakeVMName
INFO: include disk 'sata0' 'local-zfs:vm-105-disk-1' 32G
INFO: include disk 'sata1' 'local-zfs:vm-105-disk-2' 64G
INFO: include disk 'efidisk0' 'local-zfs:vm-105-disk-0' 1M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/105/2021-05-04T18:56:21Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '50825db7-b0c8-4554-ba38-d10856fff4c6'
INFO: resuming VM again
INFO: efidisk0: dirty-bitmap status: created new
INFO: sata0: dirty-bitmap status: created new
INFO: sata1: dirty-bitmap status: created new
INFO:   0% (468.0 MiB of 96.0 GiB) in 3s, read: 156.0 MiB/s, write: 104.0 MiB/s
INFO:   1% (1012.0 MiB of 96.0 GiB) in 11s, read: 68.0 MiB/s, write: 64.0 MiB/s
INFO:   2% (1.9 GiB of 96.0 GiB) in 27s, read: 61.5 MiB/s, write: 61.5 MiB/s
INFO:   3% (2.9 GiB of 96.0 GiB) in 43s, read: 62.2 MiB/s, write: 60.2 MiB/s
INFO:   4% (3.8 GiB of 96.0 GiB) in 55s, read: 79.0 MiB/s, write: 61.0 MiB/s
INFO:   5% (4.8 GiB of 96.0 GiB) in 1m 12s, read: 59.5 MiB/s, write: 47.1 MiB/s
INFO:   6% (6.1 GiB of 96.0 GiB) in 1m 15s, read: 438.7 MiB/s, write: 21.3 MiB/s
INFO:   7% (6.8 GiB of 96.0 GiB) in 1m 25s, read: 68.8 MiB/s, write: 59.2 MiB/s
INFO:   8% (7.7 GiB of 96.0 GiB) in 1m 41s, read: 60.2 MiB/s, write: 60.2 MiB/s
INFO:   9% (8.7 GiB of 96.0 GiB) in 1m 50s, read: 105.8 MiB/s, write: 60.0 MiB/s
INFO:  10% (9.6 GiB of 96.0 GiB) in 2m 5s, read: 64.0 MiB/s, write: 60.5 MiB/s
INFO:  11% (10.6 GiB of 96.0 GiB) in 2m 21s, read: 61.5 MiB/s, write: 61.5 MiB/s
INFO:  11% (10.9 GiB of 96.0 GiB) in 2m 27s, read: 50.0 MiB/s, write: 50.0 MiB/s
ERROR: backup write data failed: command error: protocol canceled
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 105 failed - backup write data failed: command error: protocol canceled
INFO: Failed at 2021-05-04 14:58:53
INFO: Backup job finished with errors
TASK ERROR: job errors

Then on the PBS side...

Code:

Proxmox Backup Server 1.1-5
2021-05-04T14:56:24-04:00: starting new backup on datastore 'pvebackups': "vm/105/2021-05-04T18:56:21Z"
2021-05-04T14:56:24-04:00: GET /previous: 400 Bad Request: no valid previous backup
2021-05-04T14:56:24-04:00: created new fixed index 1 ("vm/105/2021-05-04T18:56:21Z/drive-efidisk0.img.fidx")
2021-05-04T14:56:24-04:00: created new fixed index 2 ("vm/105/2021-05-04T18:56:21Z/drive-sata0.img.fidx")
2021-05-04T14:56:24-04:00: created new fixed index 3 ("vm/105/2021-05-04T18:56:21Z/drive-sata1.img.fidx")
2021-05-04T14:56:24-04:00: add blob "/mnt/datastore/pvebackups/vm/105/2021-05-04T18:56:21Z/qemu-server.conf.blob" (436 bytes, comp: 436)
2021-05-04T14:58:53-04:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: broken pipe
2021-05-04T14:58:53-04:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: broken pipe
2021-05-04T14:58:53-04:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: broken pipe
2021-05-04T14:58:53-04:00: backup failed: connection error: error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:../ssl/record/ssl3_record.c:677:
2021-05-04T14:58:53-04:00: removing failed backup
2021-05-04T14:58:53-04:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: broken pipe
2021-05-04T14:58:53-04:00: TASK ERROR: connection error: error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:../ssl/record/ssl3_record.c:677:
2021-05-04T14:58:53-04:00: POST /fixed_chunk: 400 Bad Request: backup already marked as finished.

I have had ping running and it doesn't seem to be dropping any packets, so I'm not sure what the communication issue could be. I have 6 other nodes backing up just fine without errors; it's just this one being funky.

fabian · May 5, 2021

since it's just one node/client making trouble, I'd look at that system first.

start by checking your memory on the PVE side.. if that comes up good, check your NIC settings and disable various offloading features there to rule out buggy behaviour on that side.
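One way to try the offloading suggestion is `ethtool -K`. The sketch below only builds the command line and does not run it. The interface name `eno1` and the chosen feature set (`tso`, `gso`, `gro`) are assumptions, not something stated in this thread; `ethtool -k <iface>` lists what a given NIC actually supports. Changes made this way do not persist across reboots.

```python
import shlex

# Offload features to switch off. This particular set is an assumption;
# check "ethtool -k <iface>" for the features your NIC exposes.
FEATURES = ("tso", "gso", "gro")


def offload_off_command(iface, features=FEATURES):
    """Build the argv for 'ethtool -K <iface> <feature> off ...'."""
    argv = ["ethtool", "-K", iface]
    for feature in features:
        argv += [feature, "off"]
    return argv


if __name__ == "__main__":
    # "eno1" is a placeholder interface name.
    print(shlex.join(offload_off_command("eno1")))
```

Printing the command rather than executing it makes it easy to review before running it as root on the affected node.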

is the error reproducible for every backup? always at a similar progress count, or random? backing up to the "old" PBS works? how is the network link in-between PVE->old PBS and PVE->new PBS? could you include your pveversion -v output from working and non-working nodes?
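To answer the "similar progress count, or random?" question across several failed runs, the task logs can be compared mechanically. This is a minimal sketch that assumes the vzdump progress-line format shown in the log earlier in this thread; the function name and the sample text are illustrative only.

```python
import re

# Matches vzdump progress lines such as
# "INFO:  11% (10.9 GiB of 96.0 GiB) in 2m 27s, read: ..."
PROGRESS = re.compile(
    r"^INFO:\s+(\d+)% \(([\d.]+ \w+) of ([\d.]+ \w+)\) in ([^,]+),"
)


def last_progress_before_error(log_text):
    """Return (percent, transferred, elapsed) from the last progress line
    seen before the first ERROR line (or the end of the log), else None."""
    last = None
    for line in log_text.splitlines():
        if line.startswith("ERROR:"):
            break
        m = PROGRESS.match(line)
        if m:
            last = (int(m.group(1)), m.group(2), m.group(4))
    return last


sample = """\
INFO:  10% (9.6 GiB of 96.0 GiB) in 2m 5s, read: 64.0 MiB/s, write: 60.5 MiB/s
INFO:  11% (10.9 GiB of 96.0 GiB) in 2m 27s, read: 50.0 MiB/s, write: 50.0 MiB/s
ERROR: backup write data failed: command error: protocol canceled
INFO: aborting backup job
"""
print(last_progress_before_error(sample))
```

Running this over each failed task log shows whether the failures cluster around the same amount of transferred data or are spread out.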

apatchin · May 7, 2021

Thanks, I'm doing some more testing to identify if it's the node or the VM + node combination (would be weird, but you never know). I was able to back up a 4 GB VM without issue, but the bigger ones (32 GB, for example) either fail right away or get to about 5% and fail. I just tried to set up a larger blank VM on the node and the installation failed. I'm going to assume it's an issue with storage or memory at this point and retire this endpoint. Sorry for wasting time on this one.

bartoszpijet · Jul 22, 2022

I had a similar issue, in case anyone is searching for it (different cause for sure). Backups would send about 100 MB, show 0% progress, and then just stay like that.

In my case, it was a firewall issue. PBS and PVE were in different subnets, each behind a different router. This caused packets to not be received properly, since they appeared to be sent from a different IP.

Adding an interface on the same subnet to PBS solved the issue.
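A quick way to check whether two hosts sit on the same subnet is Python's standard `ipaddress` module. This is only a sketch: the addresses and the /24 prefix length below are placeholders, not values from this thread.

```python
import ipaddress


def same_subnet(addr_a, addr_b, prefix_len):
    """True if both addresses fall in the same network of the given prefix length."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


# Placeholder addresses for the PVE node and the PBS host.
pve = "192.168.10.5"
pbs = "192.168.20.7"
print(same_subnet(pve, pbs, 24))             # hosts in different /24 networks
print(same_subnet(pve, "192.168.10.9", 24))  # hosts in the same /24 network
```

If the check comes back false, traffic between the two hosts is routed, and any firewall or router on either path becomes a candidate for the stall described above.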
