Hi,
I'm moving my home lab to a fully Proxmox-based configuration.
There are four servers:
minho: current server running PVE. Critical services like homeassistant are running here. The goal is to move the VMs on this server to pve-wrk. After that, this server will be decommissioned.
pve-wrk: main server running all the required services (homeassistant, immich, etc).
pve-stb: standby server for backups. It will be powered up once a week for backups. It will run PBS in a VM in order to back up pve-wrk. In case of a catastrophic failure of pve-wrk, this will be the replacement server.
pve-rem: remote PBS server for backups. PBS is running bare metal. This is an off-site server.
Currently the servers are co-located and connected to the same switch.
I installed PVE on pve-wrk and PBS on pve-rem, both running the latest updates. Storage ready, datastores ready, all set to start the migration.
Added the PBS storage to server 'minho' and backed up the VMs without any problem.
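For reference, the storage definition looks roughly like this, reconstructed from the restore log below (the fingerprint is a placeholder; password and client encryption key were set the usual way):

# add the PBS datastore as a storage backend on the PVE node
pvesm add pbs pbsha --server 192.168.50.106 --datastore pbsha --username pve-wrk@pbs --fingerprint <pbs-certificate-fingerprint>
# the client-side encryption key ends up in /etc/pve/priv/storage/pbsha.enc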
Added the PBS storage to server 'pve-wrk' in order to restore the VMs from PBS. The smaller ones restored fine, but the homeassistant VM restore (~200 GB) starts OK and then, after some seemingly random amount of time, stalls (no more chunk reads appear in the PBS task); several minutes later, the process ends with an error.
There is nothing in journalctl on either PVE or PBS that points to the problem. Not a single error, nothing.
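In case it matters, this is roughly how I went through the journals around the failure (default unit names; time window adjusted to the task):

# on the PVE node doing the restore
journalctl --since "2025-09-22 22:00" -u pvedaemon -u pveproxy
# on the PBS server
journalctl --since "2025-09-22 22:00" -u proxmox-backup-proxy -u proxmox-backup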
atop on PBS only shows that, during the restore, the ethernet interface is at almost 100% utilization, which is expected.
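The next time it stalls I can also check the NIC counters on both ends for drops or resets (eno1 is a placeholder for the actual interface name):

# errors/drops on the interface carrying the restore traffic
ip -s link show eno1
ethtool -S eno1 | grep -Ei 'err|drop|discard'
# link flaps or NIC resets in the kernel log
dmesg -T | grep -i eno1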
Example of a restore log with debug activated:
PVE task log:
Using encryption key from file descriptor..
Fingerprint: 34:ec:7c:6b:21:43:f8:97
Using encryption key from file descriptor..
Fingerprint: 34:ec:7c:6b:21:43:f8:97
Using encryption key from file descriptor..
Fingerprint: 34:ec:7c:6b:21:43:f8:97
new volume ID is 'local-zfs:vm-130-disk-0'
new volume ID is 'local-zfs:vm-130-disk-3'
restore proxmox backup image: /usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-efidisk0.img.fidx /dev/zvol/rpool/data/vm-130-disk-0 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero
connecting to repository 'pve-wrk@pbs@192.168.50.106:pbsha'
using up to 4 threads
open block backend for target '/dev/zvol/rpool/data/vm-130-disk-0'
starting to restore snapshot 'vm/130/2025-09-21T09:02:11Z'
download and verify backup index
fetching up to 16 chunks in parallel
progress 100% (read 540672 bytes, zeroes = 0% (0 bytes), duration 0 sec)
restore image complete (bytes=540672, duration=0.00s, speed=218.84MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero
connecting to repository 'pve-wrk@pbs@192.168.50.106:pbsha'
using up to 4 threads
open block backend for target '/dev/zvol/rpool/data/vm-130-disk-3'
starting to restore snapshot 'vm/130/2025-09-21T09:02:11Z'
download and verify backup index
fetching up to 16 chunks in parallel
progress 1% (read 2751463424 bytes, zeroes = 5% (159383552 bytes), duration 15 sec)
progress 2% (read 5498732544 bytes, zeroes = 3% (192937984 bytes), duration 31 sec)
progress 3% (read 8250195968 bytes, zeroes = 2% (209715200 bytes), duration 47 sec)
progress 4% (read 10997465088 bytes, zeroes = 2% (226492416 bytes), duration 65 sec)
restore failed: connection reset
temporary volume 'local-zfs:vm-130-disk-3' successfully removed
temporary volume 'local-zfs:vm-130-disk-0' successfully removed
error before or during data restore, some or all disks were not completely restored. VM 130 state is NOT cleaned up.
TASK ERROR: command '/usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero' failed: exit code 255
PVE journalctl:
Sep 22 22:34:59 minho kernel: Alternate GPT is invalid, using primary GPT.
Sep 22 22:34:59 minho kernel: zd96: p1 p2 p3 p4 p5 p6 p7 p8
Sep 22 22:35:01 minho CRON[60344]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Sep 22 22:35:01 minho CRON[60346]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Sep 22 22:35:01 minho CRON[60344]: pam_unix(cron:session): session closed for user root
Sep 22 22:35:01 minho pvedaemon[51473]: error before or during data restore, some or all disks were not completely restored. VM 130 state is NOT cleaned up.
Sep 22 22:35:01 minho pvedaemon[51473]: command '/usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero' failed: exit code 255
Sep 22 22:35:01 minho pvedaemon[21725]: <root@pam> end task UPID:minho:0000C911:00088C4F:68D1BCDA:qmrestore:130:root@pam: command '/usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero' failed: exit code 255
PBS task log with debug activated:
2025-09-22T22:18:34+01:00: DEBUG: received
2025-09-22T22:18:34+01:00: DEBUG: send
2025-09-22T22:18:34+01:00: GET /chunk
2025-09-22T22:18:34+01:00: GET /chunk
2025-09-22T22:18:34+01:00: GET /chunk
2025-09-22T22:18:34+01:00: download chunk "/mnt/datastore/pbsha/.chunks/2ea6/2ea607d5a1ad3c4c50dfb9ded6366c26b35bb266cc4fdb48743916191529b4ef"
2025-09-22T22:18:34+01:00: download chunk "/mnt/datastore/pbsha/.chunks/5e10/5e10aa9f5988f129d66bee779ff8e8e77412627e02de9f4d5900d9c5af8484db"
2025-09-22T22:18:34+01:00: download chunk "/mnt/datastore/pbsha/.chunks/92f9/92f96c979c2c8772dd2a43aa4e9264d76faeca6d398810e98c608cac3f153f7a"
2025-09-22T22:18:34+01:00: DEBUG: received
2025-09-22T22:18:34+01:00: GET /chunk
It stalls at 22:18:34, and only after several minutes does PVE fail with "restore failed: connection reset".
Right now, I don't know where to look. Any ideas?
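Unless someone has a better idea, the next things I plan to try are a path-MTU check and a packet capture of the moment the connection drops (interface name is a placeholder; 8007 is the PBS API port):

# verify the path MTU towards the PBS host (assumes a standard 1500 byte MTU)
ping -M do -s 1472 -c 5 192.168.50.106
# capture the reset on the PBS API port during a restore
tcpdump -i eno1 -w restore-reset.pcap host 192.168.50.106 and tcp port 8007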
TLDR:
Restoring a backup from PBS to PVE fails with a "restore failed: connection reset" message. This error happens randomly during the restore process. I don’t know where to look to debug this.