We have been testing Proxmox Backup Server because it has a lot of really strong features. In principle, I really like it.
In practice, it's causing some pretty serious problems.
On VMs with large disks, the backups are positively glacial.
Example:
INFO: Starting Backup of VM 1914 (qemu)
INFO: Backup started at 2023-08-29 01:34:47
INFO: status = running
INFO: VM Name: fs14
INFO: include disk 'scsi0' 'rbd-nyc1:vm-1914-disk-0' 16G
INFO: include disk 'scsi1' 'rbd-nyc1:vm-1914-disk-1' 1T
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/1914/2023-08-29T01:34:47Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '5af03282-3c7c-49e2-9883-c31b274c7c49'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (228.0 MiB of 16.0 GiB dirty)
INFO: scsi1: dirty-bitmap status: OK (45.5 GiB of 1.0 TiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 45.7 GiB dirty of 1.0 TiB total
INFO: 0% (108.0 MiB of 45.7 GiB) in 3s, read: 36.0 MiB/s, write: 36.0 MiB/s
INFO: 1% (472.0 MiB of 45.7 GiB) in 5m 35s, read: 1.1 MiB/s, write: 1.1 MiB/s
INFO: 2% (948.0 MiB of 45.7 GiB) in 9m 49s, read: 1.9 MiB/s, write: 1.9 MiB/s
INFO: 3% (1.4 GiB of 45.7 GiB) in 12m 55s, read: 2.5 MiB/s, write: 2.5 MiB/s
INFO: 4% (1.8 GiB of 45.7 GiB) in 18m 4s, read: 1.5 MiB/s, write: 1.5 MiB/s
INFO: 5% (2.3 GiB of 45.7 GiB) in 26m 22s, read: 970.5 KiB/s, write: 970.5 KiB/s
INFO: 6% (2.7 GiB of 45.7 GiB) in 33m 56s, read: 1.0 MiB/s, write: 1.0 MiB/s
INFO: 7% (3.2 GiB of 45.7 GiB) in 36m 38s, read: 2.9 MiB/s, write: 2.9 MiB/s
INFO: 8% (3.7 GiB of 45.7 GiB) in 40m 8s, read: 2.2 MiB/s, write: 2.2 MiB/s
INFO: 9% (4.1 GiB of 45.7 GiB) in 45m 52s, read: 1.4 MiB/s, write: 1.4 MiB/s
INFO: 10% (4.7 GiB of 45.7 GiB) in 47m 19s, read: 6.4 MiB/s, write: 6.4 MiB/s
INFO: 11% (5.0 GiB of 45.7 GiB) in 51m 46s, read: 1.3 MiB/s, write: 1.3 MiB/s
INFO: 12% (5.5 GiB of 45.7 GiB) in 59m 14s, read: 1.0 MiB/s, write: 1.0 MiB/s
INFO: 13% (5.9 GiB of 45.7 GiB) in 1h 7m, read: 1.0 MiB/s, write: 1.0 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 1914 failed - interrupted by signal
INFO: Failed at 2023-08-29 02:47:01
ERROR: Backup job failed - interrupted by signal
TASK ERROR: interrupted by signal
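To put "glacial" in numbers, the progress lines above can be turned into an average throughput and a rough time-to-finish. The short Python sketch below does that. The regex, the helper names (`to_mib`, `to_seconds`, `summarize`) and the estimate itself are my own illustration, not part of any Proxmox tool. Because the log rounds sizes to 0.1 GiB and drops seconds once it reaches hours, the result is only approximate.

```python
import re

# A few progress lines copied verbatim from the task log above.
LOG = """\
INFO: 0% (108.0 MiB of 45.7 GiB) in 3s, read: 36.0 MiB/s, write: 36.0 MiB/s
INFO: 1% (472.0 MiB of 45.7 GiB) in 5m 35s, read: 1.1 MiB/s, write: 1.1 MiB/s
INFO: 13% (5.9 GiB of 45.7 GiB) in 1h 7m, read: 1.0 MiB/s, write: 1.0 MiB/s
"""

# Conversion factors to MiB for the units that appear in the log.
UNIT = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Matches "<pct>% (<done> <unit> of <total> <unit>) in <elapsed>,".
PROGRESS = re.compile(
    r"INFO: (\d+)% \(([\d.]+) (KiB|MiB|GiB) of ([\d.]+) (KiB|MiB|GiB)\) in ([^,]+),"
)


def to_mib(value: str, unit: str) -> float:
    return float(value) * UNIT[unit]


def to_seconds(elapsed: str) -> int:
    # "1h 7m" -> 4020, "5m 35s" -> 335, "3s" -> 3
    factors = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * factors[u] for n, u in re.findall(r"(\d+)([hms])", elapsed))


def summarize(log: str) -> tuple[float, float]:
    """Return (average MiB/s so far, estimated hours remaining) from the last progress line."""
    samples = [
        (to_seconds(m.group(6)), to_mib(m.group(2), m.group(3)), to_mib(m.group(4), m.group(5)))
        for m in PROGRESS.finditer(log)
    ]
    elapsed, done, total = samples[-1]
    rate = done / elapsed  # MiB per second, averaged over the whole run
    eta_hours = (total - done) / rate / 3600
    return rate, eta_hours


if __name__ == "__main__":
    rate, eta = summarize(LOG)
    print(f"average {rate:.2f} MiB/s, ~{eta:.1f} h remaining")  # average 1.50 MiB/s, ~7.5 h remaining
```

In other words, at roughly 1.5 MiB/s the incremental 45.7 GiB would have needed about 8.5 hours in total, which is why the job was still running when it got interrupted.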
What's worse, the backups seem to block guest IO intermittently for periods of up to several minutes at a time, rendering the guest VM completely unusable.
This is on an NVMe Ceph filestore that performs very well, so I don't think that's the issue. All nodes are running Proxmox VE 8.0.3.
Is there anything obvious that might be wrong here? I'm reasonably sure it's not supposed to be like this.
Thanks for any advice!