Unexpected backup failures

vaschthestampede · May 27, 2026

Chris said:
How reproducible is the issue?

It's been happening every day lately.

Chris said:
You could try to check if using a local datastore not backed by iSCSI also produces the timeout errors.

I can't; the locally attached disks will be used for other backups.
I'll be able to check if those fail as well.

What I can tell you is that this is not the first time something like this has happened, but the earlier case was on much less powerful hardware, and I blamed that.
I didn't use iSCSI there either; the drives were local.

fiona · May 27, 2026

Hi @vaschthestampede,
do you have IO thread enabled for your VM disks? If not, it's highly recommended to do so. It's also highly recommended to not start backups from all nodes at the very same time, since PBS might get overloaded with handling the initial setup for each at the same time. Is the network for the storage and for PBS separate? It's also recommended to be.

vaschthestampede · May 27, 2026

fiona said:
do you have IO thread enabled for your VM disks?

Yes, it's my practice to have it turned on practically all the time.

fiona said:
Is the network for the storage and for PBS separate?

For PBS I have the network for backup and the one for iSCSI, two separate NICs.
For PVE I only have one Ceph cluster, and it has completely separate networks and NICs.

vaschthestampede · May 27, 2026

fiona said:
It's also highly recommended to not start backups from all nodes at the very same time, since PBS might get overloaded with handling the initial setup for each at the same time.

My requirement is to have all the backups at that time.
In fact, the goal is to have all the backups of all the VMs at the same time!

vaschthestampede · May 27, 2026

I have a new item to analyze: while the backups are failing, the metrics are not being sent.

Chris · May 28, 2026

vaschthestampede said:
It's been happening every day lately.

Since it is reproducible, please install gdb and the debug symbols on the PBS host via apt install gdb proxmox-backup-sever-dbgsym and when the hang appears the next time run gdb --batch --ex 't a a bt' -p $(pidof proxmox-backup-proxy) > proxy.backtrace. Ideally before the timeout on the PVE side. Then attach the backtrace here. That can tell us more about what is going on.

vaschthestampede · May 28, 2026

Chris said:
please install gdb and the debug symbols on the PBS host via apt install gdb proxmox-backup-sever-dbgsym

root@Elefante:~# apt install gdb proxmox-backup-sever-dbgsym
Error: Unable to locate package proxmox-backup-sever-dbgsym

Do I need to add any repositories?

janus57 · May 28, 2026

Hi,

There is a typo; try: proxmox-backup-server-dbgsym

Best regards,
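
With the package name corrected, the capture step Chris asked for could be wrapped roughly as below. This is only a sketch: the process name and the gdb invocation come from the thread, while the PROC and OUT variables and the capture_backtrace helper are names made up here for illustration.

```shell
#!/bin/sh
# Sketch only: dump backtraces of all threads of a running process with gdb.
# PROC and OUT are illustrative variables; their defaults match the thread.
PROC="${PROC:-proxmox-backup-proxy}"
OUT="${OUT:-proxy.backtrace}"

capture_backtrace() {
    # pidof prints the PID of the named process; stop early if it is not running.
    pid="$(pidof "$PROC")" || { echo "$PROC is not running" >&2; return 1; }
    # 't a a bt' is gdb shorthand for 'thread apply all backtrace'.
    gdb --batch --ex 't a a bt' -p "$pid" > "$OUT"
}

# One-time setup on the PBS host (as root):
#   apt install gdb proxmox-backup-server-dbgsym
# Then, while the hang is in progress and before the PVE side times out:
#   capture_backtrace
```

Nothing runs until capture_backtrace is called, so the file can be prepared ahead of time and the function invoked once the hang shows up.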

vaschthestampede · May 28, 2026

I did what you should never do: I copied and pasted without even reading!
Thank you

fiona · May 28, 2026

vaschthestampede said:
My requirement is to have all the backups at that time.

Does the issue occur if you stagger the backups, i.e. start the jobs a few minutes delayed between the nodes? Having all backups start at the very same time might be the cause of your issue.
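
To make the staggering idea concrete, here is a small sketch that prints one start time per node, spaced a fixed number of minutes apart. The node names and the 5-minute step are placeholders, and the stagger helper is hypothetical; how the resulting times are entered into each backup job is left out.

```shell
#!/bin/sh
# Sketch only: print a staggered start time for each node.
# Usage: stagger HOUR MINUTE STEP_MINUTES NODE...
# Pass HOUR and MINUTE without leading zeros (e.g. 8, not 08), because
# shell arithmetic treats a leading zero as octal.
stagger() {
    start_h=$1; start_m=$2; step=$3; shift 3
    i=0
    for node in "$@"; do
        # Minutes since midnight for this node's slot.
        total=$(( start_h * 60 + start_m + i * step ))
        # Wrap around midnight and print as HH:MM.
        printf '%s %02d:%02d\n' "$node" $(( total / 60 % 24 )) $(( total % 60 ))
        i=$(( i + 1 ))
    done
}

# Hypothetical node names, starting at 18:30 with 5 minutes between nodes.
stagger 18 30 5 node1 node2 node3
# node1 18:30
# node2 18:35
# node3 18:40
```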

vaschthestampede · May 28, 2026

fiona said:
Does the issue occur if you stagger the backups, i.e. start the jobs a few minutes delayed between the nodes?

Can't do that.
My requirement is to have all the backups at that time.
In fact, the goal is to have all the backups of all the VMs at the same time.
Then the data can be written even later but the state of the VM to be backed up should be that of 6:30 pm, for each VM.

fiona said:
Having all backups start at the very same time might be the cause of your issue.

But what's the problem since the monitoring doesn't report anything under stress?
If there is no hardware limit (as it seems), then it is a software limit, and that is a big problem. Am I wrong?

Chris · May 28, 2026

vaschthestampede said:
But what's the problem since the monitoring doesn't report anything under stress?

Did you also check your iscsi target? Regardless, I would be interested in a backtrace while there are no metric updates.

vaschthestampede · May 28, 2026

Chris said:
Did you also check your iscsi target?

Yes, that one also shows no problems and the iSCSI connection is stable.

Chris said:
would be interested in a backtrace while there are no metric updates.

Very interesting.
One piece of information that might be useful is that PBS receives backup data (and sends metrics) through a boundary between two interfaces, so I also did everything I could to avoid saturation issues.
The iSCSI has a dedicated port and the connection is direct; there's no switch in between.

fabian · May 28, 2026

vaschthestampede said:
Then the data can be written even later but the state of the VM to be backed up should be that of 6:30 pm, for each VM.

That won't be the case unless you have exactly one VM on each node.

vaschthestampede · May 28, 2026

You're right, but that's a problem for another day.

vaschthestampede · Jun 4, 2026

Chris said:
gdb --batch --ex 't a a bt' -p $(pidof proxmox-backup-proxy) > proxy.backtrace. Ideally before the timeout on the PVE side.

Here.

vaschthestampede · Jun 4, 2026

By the way, I'm noticing something very strange.
The command to create the file took over 3 hours.
In the meantime, all tasks were paused, but the number of failed backups yesterday is the same as the day before.
If it were a pure timeout issue, they should all have failed yesterday, am I wrong?

Chris · Jun 4, 2026

Thanks for the backtraces. At first glance, they point to massive lock contention on chunk insertion into the local datastore cache. It seems that you are using your iSCSI target as the local datastore cache for an S3 datastore; is that correct?

vaschthestampede · Jun 4, 2026

I'm using my iSCSI target as a local datastore.
I'll explain what I did:

1. I installed open-iscsi.
2. I entered the credentials in the /etc/iscsi/iscsid.conf file.
3. I discovered the iSCSI target's IP address.
4. I logged in to the target and made it automatic upon reboot.
5. I created an LVM volume on the target.
6. I created an xfs partition on the LVM volume.
7. I created the /mnt/iscsi folder and mounted the xfs partition there.
8. I created the datastore linked to the /mnt/iscsi folder.

Step 5 was necessary to make the metrics appear on the datastore page.
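
For readers following along, steps like these are commonly carried out with commands along the lines of the sketch below. These are not the exact commands used in this setup: only the /mnt/iscsi path and the iscsid.conf file come from the post, while the portal address, target IQN, disk device, volume group, logical volume and datastore names are placeholders. The script only prints the plan and does not change anything.

```shell
#!/bin/sh
# Sketch only: print the usual commands for attaching an iSCSI LUN and
# using it as a datastore path. All values except MNT are placeholders.
PORTAL="192.0.2.10"                       # hypothetical portal address
IQN="iqn.2001-04.com.example:target0"     # hypothetical target name
DISK="/dev/sdX"                           # hypothetical device of the LUN
VG="vg_pbs"                               # hypothetical volume group
LV="lv_pbs"                               # hypothetical logical volume
MNT="/mnt/iscsi"                          # mount point named in the post
STORE="store1"                            # hypothetical datastore name

print_plan() {
    # CHAP credentials (step 2) are edited by hand in /etc/iscsi/iscsid.conf
    # (the node.session.auth.* settings) and are therefore not listed here.
    cat <<EOF
apt install open-iscsi
iscsiadm -m discovery -t sendtargets -p $PORTAL
iscsiadm -m node -T $IQN -p $PORTAL --login
iscsiadm -m node -T $IQN -p $PORTAL --op update -n node.startup -v automatic
pvcreate $DISK
vgcreate $VG $DISK
lvcreate -l 100%FREE -n $LV $VG
mkfs.xfs /dev/$VG/$LV
mkdir -p $MNT
mount /dev/$VG/$LV $MNT
proxmox-backup-manager datastore create $STORE $MNT
EOF
}

print_plan
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; each line would still have to be checked against the actual target and device names before being used.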

Chris · Jun 4, 2026

Do you also have another, unrelated datastore backed by S3 running on this instance? I'm asking because your backtrace includes proxmox-s3-client code paths, and there was a recent bugfix for a hanging proxy [1], so this might be related (the fix is not packaged yet at the time of writing).

[1] https://git.proxmox.com/?p=proxmox-backup.git;a=commit;h=23400016322c7a6981f111558e8d22666e32ee8c
