Backup vzdump stuck on PBS at 100% progress

Dear all,

After updating proxmox-backup:amd64 from version 3.2.0 to 3.3.0, my backups are stuck with the vzdump progress at 100%: proxmox-backup-proxy {tokio-runtime-w} continuously reads all chunks and then, during the phase "INFO: Waiting for server to finish backup validation...", it cycles through the chunks again with:

"statx(AT_FDCWD, "/mnt/pool-1/.chunks/b0b2/b0b242b49ab4bbe54aa80fa5d5ee367560c3be19673d6f460e47f40818e04d89", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=4166618, ...}) = 0".

I had previously set the tuning parameter sync-level=none.
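
For reference, I believe the option can be set on the PBS side roughly like this (the datastore name "pool-1" is just an example here):

  proxmox-backup-manager datastore update pool-1 --tuning 'sync-level=none'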

Does anyone have an idea of how to work around this problem? Has anyone else reported similar issues?

Here are the package versions on the PVE and PBS side:

pve-manager 8.3.0

proxmox-archive-keyring 3.1
proxmox-backup 3.3.0
proxmox-backup-client 3.3.0-1
proxmox-backup-docs 3.3.0-1
proxmox-backup-server 3.3.0-2
proxmox-default-kernel 1.1.0
proxmox-kernel-6.2 6.2.16-20
proxmox-kernel-6.2.16-20-pve 6.2.16-20
proxmox-kernel-6.5 6.5.13-6
proxmox-kernel-6.5.13-6-pve-signed 6.5.13-6
proxmox-kernel-6.8 6.8.12-4
proxmox-kernel-6.8.12-2-pve-signed 6.8.12-2
proxmox-kernel-6.8.12-4-pve-signed 6.8.12-4
proxmox-kernel-6.8.8-2-pve-signed 6.8.8-2
proxmox-kernel-helper 8.1.0
proxmox-mail-forward 0.3.1
proxmox-mini-journalreader 1.4.0
proxmox-termproxy 1.1.0
proxmox-widget-toolkit 4.3.3
 
Hi,
yes, a change was introduced in proxmox-backup version 3.2.12-1: on backup finish, known chunks which were not re-uploaded by the client are stat-ed in order to detect missing chunks; for details, see [0]. It does not stat all of the chunks again, as you implied.

A few questions:
  • When you say it hangs, how long does the backup job take to finish?
  • What storage are you using for your datastore? Why did you choose sync-level none?
  • How big is the guest you are trying to back up?
  • Can you please provide the full backup task log for a backup of the same guest with 3.2 and with 3.3?

I suspect that your storage simply cannot keep up with the additional I/O from the chunk existence check.
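
If it helps to narrow this down, a rough and purely illustrative way to gauge how fast the datastore can serve per-chunk metadata lookups is to time a stat-only walk over the chunk store (path taken from your strace output; printing the size forces find to stat every file):

  time find /mnt/pool-1/.chunks -type f -printf '%s\n' > /dev/null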

[0] https://git.proxmox.com/?p=proxmox-backup.git;a=commit;h=da11d22610108efa33ce9c61e0f849565c150d03

Edit: According to this thread, you already had performance issues before, as you are using an unsupported storage setup. Did you ever switch away from using mdadm RAID?
 
Hi,

We're still on the mdadm setup for now (we're waiting for the new hardware to arrive).

As a premise, we've also updated all the PVE nodes without rebooting, and it seems that the machines that get rebooted and pick up the new KVM version no longer show the problem (I'll confirm this). The VMs experiencing these issues have between 1 TB and 10 TB of storage. They remain stuck at the end of the backup for hours, depending on their size, unless we work around the problem with a stop/start and/or a migration. I'll send you the logs of the sessions before and after.

Thanks for the quick reply.
--
Best regards,
Luca
 
Hi Chris,

I fixed the issue by rebooting or migrating the VMs that were not yet using the new KVM version. The previous problems were due to VMs being affected by copy-on-write during backups, hence the fleecing option.
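
For anyone else hitting the copy-on-write slowdown: if I remember the syntax correctly, fleecing can be enabled per backup job in the PVE GUI, or node-wide in /etc/vzdump.conf with a line roughly like the following (the storage name is only an example):

  fleecing: enabled=1,storage=local-lvm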

Thank you.
 
Is there any way to make PVE not wait for verification after the backup? I'm asking because, apparently, after the backup is made, PVE waits for PBS to verify the backup, or something along those lines.

In my case, the verify task has to run after the backup window, because I back up multiple VMs, and if verification had to happen during that process my backup window would be exceeded and the backup would still be running at 7 am.

I haven't noticed this in my environments yet, because I haven't updated them, but I'm afraid that updating PVE and PBS will cause this problem and make me lose my backup windows, which aren't very large.
 
Just to clarify, this is not a full chunk verification as performed by a verify job, only a check whether known chunks still exist on backup finish. It is not possible to opt in or out. For further details and to be notified of updates regarding this, let me refer you to this thread: https://forum.proxmox.com/threads/p...g-after-concluding-uploads.158812/post-728575
 
Hi @Chris,

Okay, no problem. In the meantime, I've noticed that I'm still having issues with spot: I've downgraded in anticipation of the new binaries being released.

Regarding the patch that reverts the change, can I refer to https://pbs.proxmox.com/wiki/index.php/Developer_Documentation to compile the binary? I'm new to Rust and don't know much about Cargo.

--
Best regards,
Luca
 
In order to compile the binary with the patch reverting the changes applied, I recommend following the Build section as described in the git repo's README [0]. That will make it much easier to pull in and set up all the required build dependencies.

[0] https://git.proxmox.com/?p=proxmox-...29a761867d98247097f4fdb3f44c00d58fe90;hb=HEAD
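
For a rough idea of what the Build section boils down to (command names and the make target are from memory, so treat this as a sketch and follow the README [0] for the authoritative steps):

  git clone git://git.proxmox.com/git/proxmox-backup.git
  cd proxmox-backup
  # apply the revert patch here, then pull in the build dependencies
  # (mk-build-deps needs the devscripts and equivs packages) and build the .deb packages
  sudo mk-build-deps -ir debian/control
  make deb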
 
