Hi,
I'm moving my home lab to a fully Proxmox-based configuration.
There are four servers:
minho: current server running PVE. Critical services like homeassistant are running here. The goal is to move the VMs on this server to pve-wrk. After that, this server will be decommissioned.
pve-wrk: main server running all the required services (homeassistant, immich, etc).
pve-stb: standby server for backups. It will be powered up once a week for backups. It will run PBS in a VM in order to back up pve-wrk. In case of a catastrophic failure of pve-wrk, this will be the replacement server.
pve-rem: remote PBS server for backups. PBS is running bare metal. This is an off-site server.
Currently the servers are co-located and connected to the same switch.
I installed PVE on pve-wrk and PBS on pve-rem, both running the latest updates. Storage ready, datastores ready, all set to start the migration.
Added the PBS storage to server 'minho' and backed up the VMs without any problem.
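For reference, the storage definition looks roughly like this, reconstructed from the restore log below (the fingerprint is a placeholder; password and client encryption key were set the usual way):

# add the PBS datastore as a storage backend on the PVE node
pvesm add pbs pbsha --server 192.168.50.106 --datastore pbsha --username pve-wrk@pbs --fingerprint <pbs-certificate-fingerprint>
# the client-side encryption key ends up in /etc/pve/priv/storage/pbsha.enc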
Added the PBS storage to server 'pve-wrk' in order to restore the VMs from PBS. The smaller ones restored fine, but the homeassistant VM restore (~200 GB) starts OK and then, after some seemingly random amount of time, stalls (no more chunk reads appear in the PBS task); several minutes later, the process ends with an error.
There is nothing in journalctl on either PVE or PBS that points to the problem. Not a single error, nothing.
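In case it matters, this is roughly how I went through the journals around the failure (default unit names; time window adjusted to the task):

# on the PVE node doing the restore
journalctl --since "2025-09-22 22:00" -u pvedaemon -u pveproxy
# on the PBS server
journalctl --since "2025-09-22 22:00" -u proxmox-backup-proxy -u proxmox-backup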
atop on PBS only shows that, during the restore, the ethernet interface is at almost 100% utilization, which is expected.
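The next time it stalls I can also check the NIC counters on both ends for drops or resets (eno1 is a placeholder for the actual interface name):

# errors/drops on the interface carrying the restore traffic
ip -s link show eno1
ethtool -S eno1 | grep -Ei 'err|drop|discard'
# link flaps or NIC resets in the kernel log
dmesg -T | grep -i eno1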
Example of a restore log with debug activated:
PVE task log:
Using encryption key from file descriptor..
Fingerprint: 34:ec:7c:6b:21:43:f8:97
Using encryption key from file descriptor..
Fingerprint: 34:ec:7c:6b:21:43:f8:97
Using encryption key from file descriptor..
Fingerprint: 34:ec:7c:6b:21:43:f8:97
new volume ID is 'local-zfs:vm-130-disk-0'
new volume ID is 'local-zfs:vm-130-disk-3'
restore proxmox backup image: /usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-efidisk0.img.fidx /dev/zvol/rpool/data/vm-130-disk-0 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero
connecting to repository 'pve-wrk@pbs@192.168.50.106:pbsha'
using up to 4 threads
open block backend for target '/dev/zvol/rpool/data/vm-130-disk-0'
starting to restore snapshot 'vm/130/2025-09-21T09:02:11Z'
download and verify backup index
fetching up to 16 chunks in parallel
progress 100% (read 540672 bytes, zeroes = 0% (0 bytes), duration 0 sec)
restore image complete (bytes=540672, duration=0.00s, speed=218.84MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero
connecting to repository 'pve-wrk@pbs@192.168.50.106:pbsha'
using up to 4 threads
open block backend for target '/dev/zvol/rpool/data/vm-130-disk-3'
starting to restore snapshot 'vm/130/2025-09-21T09:02:11Z'
download and verify backup index
fetching up to 16 chunks in parallel
progress 1% (read 2751463424 bytes, zeroes = 5% (159383552 bytes), duration 15 sec)
progress 2% (read 5498732544 bytes, zeroes = 3% (192937984 bytes), duration 31 sec)
progress 3% (read 8250195968 bytes, zeroes = 2% (209715200 bytes), duration 47 sec)
progress 4% (read 10997465088 bytes, zeroes = 2% (226492416 bytes), duration 65 sec)
restore failed: connection reset
temporary volume 'local-zfs:vm-130-disk-3' successfully removed
temporary volume 'local-zfs:vm-130-disk-0' successfully removed
error before or during data restore, some or all disks were not completely restored. VM 130 state is NOT cleaned up.
TASK ERROR: command '/usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero' failed: exit code 255
PVE journalctl:
Sep 22 22:34:59 minho kernel: Alternate GPT is invalid, using primary GPT.
Sep 22 22:34:59 minho kernel: zd96: p1 p2 p3 p4 p5 p6 p7 p8
Sep 22 22:35:01 minho CRON[60344]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Sep 22 22:35:01 minho CRON[60346]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Sep 22 22:35:01 minho CRON[60344]: pam_unix(cron:session): session closed for user root
Sep 22 22:35:01 minho pvedaemon[51473]: error before or during data restore, some or all disks were not completely restored. VM 130 state is NOT cleaned up.
Sep 22 22:35:01 minho pvedaemon[51473]: command '/usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero' failed: exit code 255
Sep 22 22:35:01 minho pvedaemon[21725]: <root@pam> end task UPID:minho:0000C911:00088C4F:68D1BCDA:qmrestore:130:root@pam: command '/usr/bin/pbs-restore --repository pve-wrk@pbs@192.168.50.106:pbsha vm/130/2025-09-21T09:02:11Z drive-scsi0.img.fidx /dev/zvol/rpool/data/vm-130-disk-3 --verbose --format raw --keyfile /etc/pve/priv/storage/pbsha.enc --skip-zero' failed: exit code 255
PBS task log with debug activated:
2025-09-22T22:18:34+01:00: DEBUG: received
2025-09-22T22:18:34+01:00: DEBUG: send
2025-09-22T22:18:34+01:00: GET /chunk
2025-09-22T22:18:34+01:00: GET /chunk
2025-09-22T22:18:34+01:00: GET /chunk
2025-09-22T22:18:34+01:00: download chunk "/mnt/datastore/pbsha/.chunks/2ea6/2ea607d5a1ad3c4c50dfb9ded6366c26b35bb266cc4fdb48743916191529b4ef"
2025-09-22T22:18:34+01:00: download chunk "/mnt/datastore/pbsha/.chunks/5e10/5e10aa9f5988f129d66bee779ff8e8e77412627e02de9f4d5900d9c5af8484db"
2025-09-22T22:18:34+01:00: download chunk "/mnt/datastore/pbsha/.chunks/92f9/92f96c979c2c8772dd2a43aa4e9264d76faeca6d398810e98c608cac3f153f7a"
2025-09-22T22:18:34+01:00: DEBUG: received
2025-09-22T22:18:34+01:00: GET /chunk
It stalls at 22:18:34, and only after several minutes does PVE fail with "restore failed: connection reset".
Right now, I don't know where to look. Any ideas?
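Unless someone has a better idea, the next things I plan to try are a path-MTU check and a packet capture of the moment the connection drops (interface name is a placeholder; 8007 is the PBS API port):

# verify the path MTU towards the PBS host (assumes a standard 1500 byte MTU)
ping -M do -s 1472 -c 5 192.168.50.106
# capture the reset on the PBS API port during a restore
tcpdump -i eno1 -w restore-reset.pcap host 192.168.50.106 and tcp port 8007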
TLDR:
Restoring a backup from PBS to PVE fails with a "restore failed: connection reset" message. This error happens randomly during the restore process. I don’t know where to look to debug this.