VM restore from PBS fails

jserra71

New Member
Sep 22, 2025
Hi,

I'm moving my home lab to a fully Proxmox-based configuration.

There are four servers:

minho: the current server running PVE. Critical services like homeassistant run here. The goal is to move the VMs on this server to pve-wrk, after which this server will be decommissioned.

pve-wrk: main server running all the required services (homeassistant, immich, etc.).
pve-stb: standby server for backups. It will be powered up once a week for backups and will run PBS in a VM in order to back up pve-wrk. In case of a catastrophic failure of pve-wrk, this will be the replacement server.
pve-rem: remote PBS server for backups. PBS runs on bare metal. This is an off-site server.

Currently the servers are co-located and connected to the same switch.

I installed PVE on pve-wrk and PBS on pve-rem, all running the latest updates. Storage ready, datastores ready, everything set to start the migration.
I added the PBS storage to server 'minho' and backed up the VMs without any problems.
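
For reference, the PBS storage can also be added from the shell with pvesm. This is just a sketch: the storage ID 'pbsha', the user 'pve-wrk@pbs' and the server IP are taken from the restore log further down, and the fingerprint placeholder stands for the PBS server certificate fingerprint.

pvesm add pbs pbsha --server 192.168.50.106 --datastore pbsha \
    --username pve-wrk@pbs --fingerprint <server-cert-fingerprint>
# password and client-side encryption key are configured separately,
# e.g. via Datacenter -> Storage -> pbsha in the GUI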

I then added the PBS storage to server 'pve-wrk' in order to restore the VMs from PBS. The smaller VMs restored fine, but the homeassistant VM (~200 GB) is a problem: the restore starts OK, then after a seemingly random amount of time it stalls (no more chunk reads appear in the PBS task log), and several minutes later the process ends with an error.
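
For what it's worth, the same restore can also be started from the shell with qmrestore; this is just a sketch using the VMID, snapshot timestamp and target storage that appear in the log below:

qmrestore pbsha:backup/vm/130/2025-09-21T09:02:11Z 130 --storage local-zfs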

There is nothing in journalctl on either PVE or PBS that points to the problem. Not a single error, nothing.
atop on PBS only shows the Ethernet port at almost 100% utilization during the restore, which is expected.
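
For reference, I was following the logs on both sides with something like this while the restore ran (default service names):

# on the PBS host
journalctl -f -u proxmox-backup-proxy -u proxmox-backup
# on the PVE host
journalctl -f -u pvedaemon -u pveproxy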

Example of a restore log with debug activated:

PVE task log:
Using encryption key from file descriptor..
Fingerprint: 34:ec:7c:6b:21:43:f8:97
Using encryption key from file descriptor..
Fingerprint: 34:ec:7c:6b:21:43:f8:97
Using encryption key from file descriptor..
Fingerprint: 34:ec:7c:6b:21:43:f8:97
new volume ID is 'local-zfs:vm-130-disk-0'
new volume ID is 'local-zfs:vm-130-disk-3'
restore proxmox backup image: /usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-efidisk0.img.fidx /dev/zvol/rpool/data/vm-130-disk-0 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero
connecting to repository 'pve-wrk@pbs@192.168.50.106:pbsha'
using up to 4 threads
open block backend for target '/dev/zvol/rpool/data/vm-130-disk-0'
starting to restore snapshot 'vm/130/2025-09-21T09:02:11Z'
download and verify backup index
fetching up to 16 chunks in parallel
progress 100% (read 540672 bytes, zeroes = 0% (0 bytes), duration 0 sec)
restore image complete (bytes=540672, duration=0.00s, speed=218.84MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero
connecting to repository 'pve-wrk@pbs@192.168.50.106:pbsha'
using up to 4 threads
open block backend for target '/dev/zvol/rpool/data/vm-130-disk-3'
starting to restore snapshot 'vm/130/2025-09-21T09:02:11Z'
download and verify backup index
fetching up to 16 chunks in parallel
progress 1% (read 2751463424 bytes, zeroes = 5% (159383552 bytes), duration 15 sec)
progress 2% (read 5498732544 bytes, zeroes = 3% (192937984 bytes), duration 31 sec)
progress 3% (read 8250195968 bytes, zeroes = 2% (209715200 bytes), duration 47 sec)
progress 4% (read 10997465088 bytes, zeroes = 2% (226492416 bytes), duration 65 sec)
restore failed: connection reset
temporary volume 'local-zfs:vm-130-disk-3' successfully removed
temporary volume 'local-zfs:vm-130-disk-0' successfully removed
error before or during data restore, some or all disks were not completely restored. VM 130 state is NOT cleaned up.
TASK ERROR: command '/usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero' failed: exit code 255

PVE journalctl:
Sep 22 22:34:59 minho kernel: Alternate GPT is invalid, using primary GPT.
Sep 22 22:34:59 minho kernel: zd96: p1 p2 p3 p4 p5 p6 p7 p8
Sep 22 22:35:01 minho CRON[60344]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Sep 22 22:35:01 minho CRON[60346]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Sep 22 22:35:01 minho CRON[60344]: pam_unix(cron:session): session closed for user root
Sep 22 22:35:01 minho pvedaemon[51473]: error before or during data restore, some or all disks were not completely restored. VM 130 state is NOT cleaned up.
Sep 22 22:35:01 minho pvedaemon[51473]: command '/usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero' failed: exit code 255
Sep 22 22:35:01 minho pvedaemon[21725]: <root@pam> end task UPID:minho:0000C911:00088C4F:68D1BCDA:qmrestore:130:root@pam: command '/usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero' failed: exit code 255


PBS task log with debug activated:

2025-09-22T22:18:34+01:00: DEBUG: received
2025-09-22T22:18:34+01:00: DEBUG: send
2025-09-22T22:18:34+01:00: GET /chunk
2025-09-22T22:18:34+01:00: GET /chunk
2025-09-22T22:18:34+01:00: GET /chunk
2025-09-22T22:18:34+01:00: download chunk "/mnt/datastore/pbsha/.chunks/2ea6/2ea607d5a1ad3c4c50dfb9ded6366c26b35bb266cc4fdb48743916191529b4ef"
2025-09-22T22:18:34+01:00: download chunk "/mnt/datastore/pbsha/.chunks/5e10/5e10aa9f5988f129d66bee779ff8e8e77412627e02de9f4d5900d9c5af8484db"
2025-09-22T22:18:34+01:00: download chunk "/mnt/datastore/pbsha/.chunks/92f9/92f96c979c2c8772dd2a43aa4e9264d76faeca6d398810e98c608cac3f153f7a"
2025-09-22T22:18:34+01:00: DEBUG: received
2025-09-22T22:18:34+01:00: GET /chunk

It stalls at 22:18:34, and only several minutes later does PVE fail with "restore failed: connection reset".

Right now, I don't know where to look. Any ideas?

TLDR:
Restoring a backup from PBS to PVE fails with a "restore failed: connection reset" message. This error happens randomly during the restore process. I don’t know where to look to debug this.

 
Maybe it's not the right lead (as you stated, the error is random), but... I'd start with verification of the backups.

https://pbs.proxmox.com/docs/maintenance.html#verification

... Aside from using verify jobs, you can also run verification manually on entire datastores, backup groups or snapshots. To do this, navigate to the Content tab of the datastore and either click Verify All or select the V. icon from the Actions column in the table.
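
If I remember correctly, the same can also be triggered from the shell on the PBS host with proxmox-backup-manager (datastore name 'pbsha' taken from your task log):

# verify all snapshots in the datastore
proxmox-backup-manager verify pbsha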
 