Some backups take too long

Skyfay

Hello everyone.
I have been using PBS for a short time now to save backup storage and to reduce backup time and bandwidth, since not all data has to be copied.
But now I've noticed that certain VMs take an extremely long time and I don't understand why.

The servers are connected via WAN with a VPN connection which reached over 1 Gbps during tests.

Perhaps you will notice something in the report:
VMID | Name | Status | Time | Size | Filename
201 | Firewall | ok | 29s | 45 GiB | vm/201/2024-10-20T00:30:04Z
230 | m312-1 | ok | 31s | 32.004 GiB | vm/230/2024-10-20T00:30:33Z
901 | Firewall | ok | 13min 39s | 50 GiB | vm/901/2024-10-20T00:31:04Z
920 | Web-Server | ok | 1h 30min 59s | 20.001 GiB | vm/920/2024-10-20T00:44:43Z
925 | SkyBlog | ok | 8s | 40.001 GiB | vm/925/2024-10-20T02:15:43Z
926 | SkyDocs | ok | 11s | 80.001 GiB | vm/926/2024-10-20T02:15:52Z
927 | Websites | ok | 9s | 60.001 GiB | vm/927/2024-10-20T02:16:04Z
928 | Share-Send-Paste | ok | 15s | 35.001 GiB | vm/928/2024-10-20T02:16:14Z
936 | Plex | ok | 2h 47min 40s | 230.001 GiB | vm/936/2024-10-20T02:16:30Z
948 | Transfer | ok | 16s | 40.001 GiB | vm/948/2024-10-20T05:04:10Z
950 | TrueNas | ok | 6min 23s | 55 GiB | vm/950/2024-10-20T05:04:26Z
Total running time: 4h 40min 45s
Total size: 687.008 GiB
Logs:

https://skypaste.ch/?15cbb06c754683c2#AMYwY5U8oVvUknyUuubcHp5cQ8t51QX9e46thxaXJ4nR


Another one:
VMID | Name | Status | Time | Size | Filename
201 | Firewall | ok | 42s | 45 GiB | vm/201/2024-10-21T00:30:03Z
230 | m312-1 | ok | 57s | 32.004 GiB | vm/230/2024-10-21T00:30:45Z
901 | Firewall | ok | 9min 28s | 50 GiB | vm/901/2024-10-21T00:31:42Z
920 | Web-Server | ok | 22min 5s | 20.001 GiB | vm/920/2024-10-21T00:41:11Z
925 | SkyBlog | ok | 15s | 40.001 GiB | vm/925/2024-10-21T01:03:16Z
926 | SkyDocs | ok | 15s | 80.001 GiB | vm/926/2024-10-21T01:03:31Z
927 | Websites | ok | 20s | 60.001 GiB | vm/927/2024-10-21T01:03:47Z
928 | Share-Send-Paste | ok | 20s | 35.001 GiB | vm/928/2024-10-21T01:04:07Z
936 | Plex | ok | 5h 15min 29s | 230.001 GiB | vm/936/2024-10-21T01:04:28Z
948 | Transfer | ok | 25s | 40.001 GiB | vm/948/2024-10-21T06:19:57Z
950 | TrueNas | ok | 10min 14s | 55 GiB | vm/950/2024-10-21T06:20:22Z
Total running time: 6h 34s
Total size: 687.008 GiB
Logs:

https://skypaste.ch/?ea989874571948e7#CzdiyN5qAZupavaiHDzvXryKhHHzGTUGQNvDt5LAmUyA
 
Another guess would be the network. You mentioned a transfer speed of 1 Gbps, which should normally allow much higher transfer rates than you are seeing, but I'm wondering whether your tests were for download only. Most WAN connections have much better download than upload speeds.
Can you please give the specs of both sides (CPU, RAM, network card, etc.)?
 
The backup server has a 2.5 Gbps download and the PVE server has a 1 Gbps upload. When I ran the first backups, I had a constant transfer rate of 1 Gbps.
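
As a rough sanity check on these figures, the sketch below estimates how long it would take to push a given amount of data over the stated 1 Gbps upload if every byte had to be sent. It assumes decimal gigabits per second, no protocol or VPN overhead, and a full copy of each disk, which is a worst case because PBS normally transfers only new chunks. The function names are made up for this illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def full_copy_seconds(size_gib: float, link_gbps: float) -> float:
    """Seconds to send size_gib over a link of link_gbps (decimal Gbit/s), ignoring overhead."""
    bits = size_gib * GIB * 8
    return bits / (link_gbps * 1e9)


def fmt(seconds: float) -> str:
    """Format seconds in the same 'Xh Ymin Zs' style as the backup report."""
    minutes, secs = divmod(round(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes}min {secs}s"


# Total size of the nightly job and the largest VM (936, Plex), taken from the reports above.
print(fmt(full_copy_seconds(687.008, 1.0)))
print(fmt(full_copy_seconds(230.001, 1.0)))
```

Under these assumptions a complete copy of all 687 GiB would need roughly 1h 38min, and VM 936 alone about 33 minutes, so runs of 4h 40min or 6h would not be explained by the link speed alone.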
 
Do you have any data and the names of the devices?
Backup Server (VM on a PVE Host):
Intel Core i7-10700
2x 32GB Corsair Vengeance LPX DDR4
AsRock H570M-ITX/ac
Intel SSD (Intel) P4510 8 TB U.2 NVMe PCIe 3.1

PVE Host:
AMD Ryzen 9 3900
4x 32GB Samsung ECC Ram
Asus Pro WS 565-ACE
SAMSUNG SSD MZQL21T9HCJR-00A07

I also do the whole thing in the other direction. On the main server (“PVE Host”), TrueNAS is installed with 8 HDDs, and inside TrueNAS runs a VM with PBS (on hard disks), to which the VMs of the backup server are backed up. And I don't have the problem there either.
I only have a 100 Mbps upload on my backup server, but the backup only takes 15 minutes for 10 VMs.
 
OK, so you didn't enable verification of new backups directly after the sync/backup.
I'm trying to reduce the variables that might play a part. Did you try enabling backup fleecing in the VM options?
 
No, because I scheduled a verify every day. I thought that would also be good.

It is :) I just wanted to verify (pun intended) whether the additional time of a direct verify after the backup is what leads to the longer duration.

I use default settings there:
View attachment 76735

OK, although I still don't understand why your backups take longer, I would try to enable it on one of the problematic VMs. I'm not sure whether it will even help (because the goal of this feature is to help the VM's performance during a backup job, not the backup itself), but as I said: I want to reduce variables. You will need some space on the storage, see:
https://pve.proxmox.com/pve-docs/chapter-vzdump.html#_vm_backup_fleecing

One thing I'm wondering about is that at the beginning your backups were faster than now. If you were using an HDD, then we would have an answer: more backups -> more chunks -> more small files to check, which isn't fun on an HDD. But with an enterprise SSD this shouldn't be such a big issue. Could you try to set up another datastore on your target storage in a different directory (so existing chunks won't be reused) and do a test backup to it? You can remove it after the test to save space. I'm curious whether the performance will be better or not.
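
To put a number on the "more chunks, more small files" guess, one could count the chunk files in the existing and in the fresh datastore. The following is only a minimal sketch: it assumes the chunks live in a `.chunks` subdirectory below the datastore path, and both the function name and the path in the usage note are placeholders, not values from this thread.

```python
import os


def count_chunks(datastore_path: str) -> tuple[int, int]:
    """Return (number of chunk files, total bytes) below <datastore_path>/.chunks."""
    chunk_dir = os.path.join(datastore_path, ".chunks")
    files = 0
    size = 0
    # os.walk yields nothing if the directory does not exist, so the result is (0, 0) then.
    for root, _dirs, names in os.walk(chunk_dir):
        for name in names:
            files += 1
            size += os.path.getsize(os.path.join(root, name))
    return files, size
```

Calling `count_chunks("/mnt/datastore/example")` on the PBS host (with the real datastore path substituted) for both datastores would show how much the chunk count differs. Walking a large datastore creates IO load of its own, so it is better not to run this while a backup is in progress.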
 
So I created a new datastore and did a fresh backup. One VM could not be backed up because it was somehow still stuck in a backup process, but I think you can see that the transfer rate is good.
VMID | Name | Status | Time | Size | Filename
201 | Firewall | ok | 52s | 45 GiB | vm/201/2024-10-25T07:10:31Z
230 | m312-1 | ok | 41s | 32.004 GiB | vm/230/2024-10-25T07:11:23Z
901 | Firewall | ok | 3min 49s | 50 GiB | vm/901/2024-10-25T07:12:04Z
920 | Web-Server | ok | 1min 55s | 20.001 GiB | vm/920/2024-10-25T07:15:53Z
925 | SkyBlog | ok | 34s | 40.001 GiB | vm/925/2024-10-25T07:17:49Z
926 | SkyDocs | ok | 39s | 80.001 GiB | vm/926/2024-10-25T07:18:23Z
927 | Websites | ok | 2min 47s | 60.001 GiB | vm/927/2024-10-25T07:19:02Z
928 | Share-Send-Paste | ok | 34s | 35.001 GiB | vm/928/2024-10-25T07:21:50Z
936 | VM 936 | err | <0.1s | 0 B | null
948 | Transfer | ok | 1min 19s | 40.001 GiB | vm/948/2024-10-25T07:22:25Z
950 | TrueNas | ok | 3min 28s | 55 GiB | vm/950/2024-10-25T07:23:44Z
Total running time: 16min 42s
Total size: 457.007 GiB
Logs: https://skypaste.ch/?591c2207aebc89a1#3oBWNpRYk9Bh7xKmGYsjQbuk49htaNy8zC3QxfjE8Qzd



The backup tonight was a little better, but it still takes too much time. I mean, look at VM 920, which has the smallest disk size...

VMID | Name | Status | Time | Size | Filename
201 | Firewall | ok | 23s | 45 GiB | vm/201/2024-10-25T00:30:06Z
230 | m312-1 | ok | 26s | 32.004 GiB | vm/230/2024-10-25T00:30:29Z
901 | Firewall | ok | 6min 34s | 50 GiB | vm/901/2024-10-25T00:30:55Z
920 | Web-Server | ok | 29min 19s | 20.001 GiB | vm/920/2024-10-25T00:37:29Z
925 | SkyBlog | ok | 5s | 40.001 GiB | vm/925/2024-10-25T01:06:48Z
926 | SkyDocs | ok | 6s | 80.001 GiB | vm/926/2024-10-25T01:06:53Z
927 | Websites | ok | 4s | 60.001 GiB | vm/927/2024-10-25T01:06:59Z
928 | Share-Send-Paste | ok | 8s | 35.001 GiB | vm/928/2024-10-25T01:07:03Z
936 | Plex | ok | 7min 39s | 230.001 GiB | vm/936/2024-10-25T01:07:11Z
948 | Transfer | ok | 16s | 40.001 GiB | vm/948/2024-10-25T01:14:50Z
950 | TrueNas | ok | 19s | 55 GiB | vm/950/2024-10-25T01:15:07Z
Total running time: 45min 20s
Total size: 687.008 GiB
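
To compare the same VM across several reports, it helps to turn the report's duration strings into seconds. The sketch below does that for the VM 920 values posted so far. The helper name is made up for this example, and it only handles the plain `h`/`min`/`s` form used in the reports, not the `<0.1s` shown for the failed VM.

```python
import re

# Seconds per unit as the units appear in the backup report.
_UNITS = {"h": 3600, "min": 60, "s": 1}


def duration_seconds(text: str) -> int:
    """Convert a report duration such as '1h 30min 59s' to whole seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(h|min|s)\b", text):
        total += int(value) * _UNITS[unit]
    return total


# VM 920 (Web-Server, 20.001 GiB) as reported in this thread.
runs = {
    "2024-10-20 nightly": "1h 30min 59s",
    "2024-10-21 nightly": "22min 5s",
    "2024-10-25 nightly": "29min 19s",
    "2024-10-25 fresh datastore": "1min 55s",
}
for label, text in runs.items():
    print(f"{label}: {duration_seconds(text)} s")
```

Seen this way, the same 20 GiB VM varies between about two minutes and about an hour and a half, which suggests a bottleneck that changes from run to run rather than one tied to the size of the VM.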
 
I did not read the whole thread closely, so sorry if it's already stated and I just overlooked it, but is your PBS particularly busy around 00:30? I.e., is GC running then? By default that's started at 00:00, and depending on size and other datastores it might cause some IO pressure that makes the backups slower around that time.

If so, I'd recommend spreading out the GCs between different datastores if they are on the same backing storage, and/or moving the schedule of the backup job itself to a bit later, or to before 00:00. Just check the past GC jobs to see how long they run; that should give you an estimate of when the PBS system will be loaded more.
 
I did not read the whole thread closely, so sorry if it's already stated and I just overlooked it, but is your PBS particularly busy around 00:30? I.e., is GC running then? By default that's started at 00:00, and depending on size and other datastores it might cause some IO pressure that makes the backups slower around that time.
Attachment: 1729842296375.png
My backups run at 02:30 am.


I also make backups in the other direction, and there are no problems there. Although there is only a 100 Mbit/s upload, it is faster.

VMID | Name | Status | Time | Size | Filename
220 | Web-Server | ok | 7min 43s | 20 GiB | vm/220/2024-10-24T00:30:02Z
224 | Password-Server | ok | 41s | 15 GiB | vm/224/2024-10-24T00:37:45Z
225 | Media-Server | ok | 5min 43s | 225.001 GiB | vm/225/2024-10-24T00:38:26Z
226 | SkySend | ok | 10s | 15 GiB | vm/226/2024-10-24T00:44:09Z
227 | Message-Server | ok | 13s | 40 GiB | vm/227/2024-10-24T00:44:20Z
229 | Dashboard-Server | ok | 9s | 15 GiB | vm/229/2024-10-24T00:44:33Z
245 | Wazuh | ok | 36s | 60.001 GiB | vm/245/2024-10-24T00:44:42Z
246 | DNS-DHCP | ok | 9s | 12.001 GiB | vm/246/2024-10-24T00:45:18Z
247 | Tailscale | ok | 5s | 5 GiB | vm/247/2024-10-24T00:45:28Z
248 | Monitoring | ok | 37s | 40 GiB | vm/248/2024-10-24T00:45:33Z
249 | Administration | ok | 10s | 20 GiB | vm/249/2024-10-24T00:46:10Z
260 | Proxmox-Backup-Server | ok | 16s | 40.001 GiB | vm/260/2024-10-24T00:46:20Z
Total running time: 16min 34s
Total size: 507.002 GiB
 
My backups run at 02:30 am.
Ah yeah, my bad, I was getting the 00:30 from the logs, but that's UTC, not your server's time zone.
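
For anyone else tripping over the same thing: the timestamp at the end of a snapshot name ends in `Z`, i.e. it is UTC. A small sketch of the conversion follows. The UTC+2 offset is an assumption chosen because it matches the stated 02:30 am schedule, and the function name is invented for this example.

```python
from datetime import datetime, timedelta, timezone


def snapshot_local_time(snapshot: str, utc_offset_hours: int) -> str:
    """Convert the UTC timestamp at the end of a snapshot path to local wall-clock time."""
    stamp = snapshot.rsplit("/", 1)[-1]  # e.g. "2024-10-20T00:30:04Z"
    utc = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%Y-%m-%d %H:%M:%S")


# Assumed offset: UTC+2. Prints 2024-10-20 02:30:04
print(snapshot_local_time("vm/201/2024-10-20T00:30:04Z", 2))
```

A fixed offset keeps the example free of time zone database lookups; it does not follow daylight saving changes, so the offset has to be adjusted by hand for dates on the other side of a switch.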

A rough check could be to look at the pressure stall information (PSI) during the backups on both PVE and PBS, to see whether one of the big possible bottlenecks (IO, CPU or memory) is (over)loaded. E.g. use:

head /proc/pressure/*

This shows how much time (as a percentage) was spent waiting on CPU/IO/memory over the last 10/60/300 seconds by some or all (=full) processes that could actually run. In the best case those values are zero or close to it.
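
If you would rather log these values during a backup run than watch them by hand, the files are easy to parse. This is a minimal sketch, assuming the usual line format of `some`/`full` followed by `avg10=`, `avg60=`, `avg300=` and `total=` fields. The sample values are invented for illustration, and the function names are not part of any Proxmox tool.

```python
from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse the contents of one /proc/pressure/* file into {'some': {...}, 'full': {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        result[kind] = {key: float(value) for key, value in (f.split("=", 1) for f in fields)}
    return result


def read_pressure(resource: str) -> dict[str, dict[str, float]]:
    """Read /proc/pressure/<resource> (cpu, io or memory); needs a kernel with PSI enabled."""
    return parse_psi(Path("/proc/pressure", resource).read_text())


# Invented sample in the same format, so the parser can be tried without /proc/pressure.
sample = (
    "some avg10=1.50 avg60=0.80 avg300=0.20 total=123456\n"
    "full avg10=0.50 avg60=0.30 avg300=0.10 total=65432\n"
)
print(parse_psi(sample)["some"]["avg10"])  # prints 1.5
```

On the hosts themselves, `read_pressure("io")` could be called every few seconds while the backup job runs and the `avg10` values written to a file, once on PVE and once on PBS, to see which side stalls.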
 
Is there any hardware recommendation? The PBS VM has 4 vCPUs and 2 GB of RAM. I think that should be enough?
 
Last night the backups went as planned, without me changing anything.
VMID | Name | Status | Time | Size | Filename
201 | Firewall | ok | 22s | 45 GiB | vm/201/2024-10-26T00:30:03Z
230 | m312-1 | ok | 16s | 32.004 GiB | vm/230/2024-10-26T00:30:25Z
901 | Firewall | ok | 38s | 50 GiB | vm/901/2024-10-26T00:30:41Z
920 | Web-Server | ok | 1min 49s | 20.001 GiB | vm/920/2024-10-26T00:31:19Z
925 | SkyBlog | ok | 17s | 40.001 GiB | vm/925/2024-10-26T00:33:08Z
926 | SkyDocs | ok | 23s | 80.001 GiB | vm/926/2024-10-26T00:33:25Z
927 | Websites | ok | 40s | 60.001 GiB | vm/927/2024-10-26T00:33:48Z
928 | Share-Send-Paste | ok | 14s | 35.001 GiB | vm/928/2024-10-26T00:34:29Z
936 | Plex | ok | 5min 59s | 230.001 GiB | vm/936/2024-10-26T00:34:43Z
948 | Transfer | ok | 20s | 40.001 GiB | vm/948/2024-10-26T00:40:42Z
950 | TrueNas | ok | 42s | 55 GiB | vm/950/2024-10-26T00:41:02Z
Total running time: 11min 41s
Total size: 687.008 GiB
 
I don't know why, but the last backups all ran without any major problems.
VMID | Name | Status | Time | Size | Filename
201 | Firewall | ok | 10s | 45 GiB | vm/201/2024-10-29T01:30:05Z
230 | m312-1 | ok | 9s | 32.004 GiB | vm/230/2024-10-29T01:30:15Z
901 | Firewall | ok | 15s | 50 GiB | vm/901/2024-10-29T01:30:24Z
920 | Web-Server | ok | 1min 44s | 20.001 GiB | vm/920/2024-10-29T01:30:39Z
925 | SkyBlog | ok | 8s | 40.001 GiB | vm/925/2024-10-29T01:32:23Z
926 | SkyDocs | ok | 6s | 80.001 GiB | vm/926/2024-10-29T01:32:31Z
927 | Websites | ok | 4s | 60.001 GiB | vm/927/2024-10-29T01:32:37Z
928 | Share-Send-Paste | ok | 12s | 35.001 GiB | vm/928/2024-10-29T01:32:41Z
936 | Plex | ok | 1min 46s | 230.001 GiB | vm/936/2024-10-29T01:32:53Z
948 | Transfer | ok | 8s | 40.001 GiB | vm/948/2024-10-29T01:34:39Z
950 | TrueNas | ok | 14s | 55 GiB | vm/950/2024-10-29T01:34:47Z
Total running time: 4min 56s
Total size: 687.008 GiB



VMID | Name | Status | Time | Size | Filename
201 | Firewall | ok | 10s | 45 GiB | vm/201/2024-10-28T01:30:03Z
230 | m312-1 | ok | 5s | 32.004 GiB | vm/230/2024-10-28T01:30:13Z
901 | Firewall | ok | 13s | 50 GiB | vm/901/2024-10-28T01:30:19Z
920 | Web-Server | ok | 1min | 20.001 GiB | vm/920/2024-10-28T01:30:32Z
925 | SkyBlog | ok | 8s | 40.001 GiB | vm/925/2024-10-28T01:31:32Z
926 | SkyDocs | ok | 9s | 80.001 GiB | vm/926/2024-10-28T01:31:40Z
927 | Websites | ok | 9s | 60.001 GiB | vm/927/2024-10-28T01:31:49Z
928 | Share-Send-Paste | ok | 11s | 35.001 GiB | vm/928/2024-10-28T01:31:58Z
936 | Plex | ok | 2min 53s | 230.001 GiB | vm/936/2024-10-28T01:32:09Z
948 | Transfer | ok | 10s | 40.001 GiB | vm/948/2024-10-28T01:35:02Z
950 | TrueNas | ok | 16s | 55 GiB | vm/950/2024-10-28T01:35:12Z
Total running time: 5min 26s
Total size: 687.008 GiB
 