PBS incredibly slow, guests hang

jdw · Aug 29, 2023

We have been testing Proxmox Backup Server, because it's got a lot of really strong features. In principle, I really like it.

In practice, it's causing some pretty serious problems.

On VMs with large disks, the backups are positively glacial.

Example:

Code:

INFO: Starting Backup of VM 1914 (qemu)
INFO: Backup started at 2023-08-29 01:34:47
INFO: status = running
INFO: VM Name: fs14
INFO: include disk 'scsi0' 'rbd-nyc1:vm-1914-disk-0' 16G
INFO: include disk 'scsi1' 'rbd-nyc1:vm-1914-disk-1' 1T
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/1914/2023-08-29T01:34:47Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '5af03282-3c7c-49e2-9883-c31b274c7c49'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (228.0 MiB of 16.0 GiB dirty)
INFO: scsi1: dirty-bitmap status: OK (45.5 GiB of 1.0 TiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 45.7 GiB dirty of 1.0 TiB total
INFO:   0% (108.0 MiB of 45.7 GiB) in 3s, read: 36.0 MiB/s, write: 36.0 MiB/s
INFO:   1% (472.0 MiB of 45.7 GiB) in 5m 35s, read: 1.1 MiB/s, write: 1.1 MiB/s
INFO:   2% (948.0 MiB of 45.7 GiB) in 9m 49s, read: 1.9 MiB/s, write: 1.9 MiB/s
INFO:   3% (1.4 GiB of 45.7 GiB) in 12m 55s, read: 2.5 MiB/s, write: 2.5 MiB/s
INFO:   4% (1.8 GiB of 45.7 GiB) in 18m 4s, read: 1.5 MiB/s, write: 1.5 MiB/s
INFO:   5% (2.3 GiB of 45.7 GiB) in 26m 22s, read: 970.5 KiB/s, write: 970.5 KiB/s
INFO:   6% (2.7 GiB of 45.7 GiB) in 33m 56s, read: 1.0 MiB/s, write: 1.0 MiB/s
INFO:   7% (3.2 GiB of 45.7 GiB) in 36m 38s, read: 2.9 MiB/s, write: 2.9 MiB/s
INFO:   8% (3.7 GiB of 45.7 GiB) in 40m 8s, read: 2.2 MiB/s, write: 2.2 MiB/s
INFO:   9% (4.1 GiB of 45.7 GiB) in 45m 52s, read: 1.4 MiB/s, write: 1.4 MiB/s
INFO:  10% (4.7 GiB of 45.7 GiB) in 47m 19s, read: 6.4 MiB/s, write: 6.4 MiB/s
INFO:  11% (5.0 GiB of 45.7 GiB) in 51m 46s, read: 1.3 MiB/s, write: 1.3 MiB/s
INFO:  12% (5.5 GiB of 45.7 GiB) in 59m 14s, read: 1.0 MiB/s, write: 1.0 MiB/s
INFO:  13% (5.9 GiB of 45.7 GiB) in 1h 7m, read: 1.0 MiB/s, write: 1.0 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 1914 failed - interrupted by signal
INFO: Failed at 2023-08-29 02:47:01
ERROR: Backup job failed - interrupted by signal
TASK ERROR: interrupted by signal

What's worse, the backups seem to block guest IO intermittently for periods of up to several minutes at a time, rendering the guest VM completely unusable.

This is on an NVMe Ceph filestore that performs very well, so I don't think that's the issue. Proxmox is 8.0.3 on all nodes.

Is there anything obvious that might be wrong here? I'm reasonably sure it's not supposed to be like this.

Thanks for any advice!

scyto · Aug 29, 2023

Where are you running PBS? Consider running it on a separate machine, not in a VM or LXC in your cluster.

What are you using as your datastore? Don't use a path on your Ceph or on disks used for VM data.

jdw · Aug 29, 2023

PBS runs on a dedicated 16-core server with 256 GiB of RAM and a directly attached ZFS storage array.

It isn't PBS that's being slow; it has plenty of juice. Whatever the problem is, it appears to be on the Proxmox VE side.

scyto · Aug 29, 2023

Yeah, that is super slow. Your PBS is way faster than my Synology VM, and my backups are much, much faster.

I have seen some folks reporting slow backups on PVE 8 and issues with Realtek cards, though I'm not sure I saw definitive answers. What happens if you back up to a basic local volume (i.e., can you eliminate the network)? Heck, run a PBS in a VM on the cluster and see what happens. It might help you isolate the cause.

jdw · Aug 29, 2023

The Proxmox VE machines don't have enough non-ceph storage in them to do a 1TB local backup. But there's no Realtek here and, as far as I know, the network is working fine:

Code:

Connecting to host 192.168.19.56, port 5201
[  5] local 192.168.19.54 port 36446 connected to 192.168.19.56 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.20 GBytes  18.9 Gbits/sec  436   2.01 MBytes       
[  5]   1.00-2.00   sec  2.17 GBytes  18.6 Gbits/sec    0   2.01 MBytes       
[  5]   2.00-3.00   sec  2.16 GBytes  18.6 Gbits/sec    0   2.01 MBytes       
[  5]   3.00-4.00   sec  2.20 GBytes  18.9 Gbits/sec    0   2.01 MBytes       
[  5]   4.00-5.00   sec  2.23 GBytes  19.1 Gbits/sec    0   2.01 MBytes       
[  5]   5.00-6.00   sec  2.23 GBytes  19.2 Gbits/sec   51   1.01 MBytes       
[  5]   6.00-7.00   sec  2.17 GBytes  18.7 Gbits/sec    0   1.01 MBytes       
[  5]   7.00-8.00   sec  2.21 GBytes  19.0 Gbits/sec    0   1.01 MBytes       
[  5]   8.00-9.00   sec  2.18 GBytes  18.8 Gbits/sec    0   1.01 MBytes       
[  5]   9.00-10.00  sec  2.18 GBytes  18.7 Gbits/sec    0   1.01 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  21.9 GBytes  18.8 Gbits/sec  487             sender
[  5]   0.00-10.00  sec  21.9 GBytes  18.8 Gbits/sec                  receiver

scyto · Aug 29, 2023

Your slow job failed on a 45 GB backup. Can you create a 128 GB VM and try that (a) remotely, to see whether you have the issue, and then, if you do, (b) locally? At least then you will know whether the local backup process is the issue or not.

Chris · Aug 29, 2023

Hi,
according to the backup task logs:

jdw said:
INFO:   0% (108.0 MiB of 45.7 GiB) in 3s, read: 36.0 MiB/s, write: 36.0 MiB/s
INFO:   1% (472.0 MiB of 45.7 GiB) in 5m 35s, read: 1.1 MiB/s, write: 1.1 MiB/s
INFO:   2% (948.0 MiB of 45.7 GiB) in 9m 49s, read: 1.9 MiB/s, write: 1.9 MiB/s
INFO:   3% (1.4 GiB of 45.7 GiB) in 12m 55s, read: 2.5 MiB/s, write: 2.5 MiB/s
INFO:   4% (1.8 GiB of 45.7 GiB) in 18m 4s, read: 1.5 MiB/s, write: 1.5 MiB/s
INFO:   5% (2.3 GiB of 45.7 GiB) in 26m 22s, read: 970.5 KiB/s, write: 970.5 KiB/s
INFO:   6% (2.7 GiB of 45.7 GiB) in 33m 56s, read: 1.0 MiB/s, write: 1.0 MiB/s
INFO:   7% (3.2 GiB of 45.7 GiB) in 36m 38s, read: 2.9 MiB/s, write: 2.9 MiB/s
INFO:   8% (3.7 GiB of 45.7 GiB) in 40m 8s, read: 2.2 MiB/s, write: 2.2 MiB/s
INFO:   9% (4.1 GiB of 45.7 GiB) in 45m 52s, read: 1.4 MiB/s, write: 1.4 MiB/s
INFO:  10% (4.7 GiB of 45.7 GiB) in 47m 19s, read: 6.4 MiB/s, write: 6.4 MiB/s
INFO:  11% (5.0 GiB of 45.7 GiB) in 51m 46s, read: 1.3 MiB/s, write: 1.3 MiB/s
INFO:  12% (5.5 GiB of 45.7 GiB) in 59m 14s, read: 1.0 MiB/s, write: 1.0 MiB/s
INFO:  13% (5.9 GiB of 45.7 GiB) in 1h 7m, read: 1.0 MiB/s, write: 1.0 MiB/s

So as you can see, you are limited by reading from the source, not by the upload to the Proxmox Backup Server. Check your Ceph health and configuration and test your storage speed.
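For a rough sense of scale, the progress lines in the quoted log can be reduced to a single average figure. The following is a minimal Python sketch and not part of the original posts; the regular expression and the helper names (`parse_progress`, `average_rate`) are assumptions made for illustration, based only on the log format shown in this thread.

```python
import re

# Progress-line format as it appears in the vzdump task log quoted in this thread.
PROGRESS = re.compile(
    r"INFO:\s+(\d+)% \(([\d.]+) (KiB|MiB|GiB|TiB) of .*?\) in ([^,]+),"
)
UNIT_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0, "TiB": 1024.0 ** 2}
TIME_S = {"h": 3600, "m": 60, "s": 1}


def parse_progress(line):
    """Return (MiB transferred, elapsed seconds) for a progress line, else None."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    mib = float(match.group(2)) * UNIT_MIB[match.group(3)]
    # Elapsed time looks like "3s", "5m 35s" or "1h 7m".
    seconds = sum(
        int(value) * TIME_S[unit]
        for value, unit in re.findall(r"(\d+)([hms])", match.group(4))
    )
    return mib, seconds


def average_rate(lines):
    """Average MiB/s between the first and last progress lines."""
    points = [p for p in map(parse_progress, lines) if p is not None]
    (mib0, t0), (mib1, t1) = points[0], points[-1]
    return (mib1 - mib0) / (t1 - t0)


log = [
    "INFO:   0% (108.0 MiB of 45.7 GiB) in 3s, read: 36.0 MiB/s, write: 36.0 MiB/s",
    "INFO:   1% (472.0 MiB of 45.7 GiB) in 5m 35s, read: 1.1 MiB/s, write: 1.1 MiB/s",
    "INFO:  13% (5.9 GiB of 45.7 GiB) in 1h 7m, read: 1.0 MiB/s, write: 1.0 MiB/s",
]
rate = average_rate(log)
print(f"{rate:.2f} MiB/s")
```

Fed the first and last progress lines of the log, this gives an average of roughly 1.5 MiB/s over the 67 minutes before the job was aborted.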

The slow storage performance might also explain why the guest hangs during the backup: the `fs-thaw` has already been performed, so the guest filesystem is unfrozen.

Also, please post the VM config (`qm config <VMID>`) and the storage config (`cat /etc/pve/storage.cfg`).

Edit: Also check your Proxmox VE host performance with `proxmox-backup-client benchmark`; this might give some further clues as to where the bottleneck is.

jdw · Aug 29, 2023

Chris said:
So as you can see, you are limited by reading from the source, not by the upload to the Proxmox Backup Server.

This is absolutely not the case. The storage performance only becomes poor on these specific VMs, only while the backup is running. The storage, which is all U.2 NVMe drives hyperconverged on the Proxmox VE nodes, is very fast.

Chris said:
Check your Ceph health and configuration and test your storage speed.

The cluster is healthy, and performs quite well.

4k random read performance (rados bench with the full workload running as usual, plus five backups):

Code:

Total time run:       30.0002
Total reads made:     1948644
Read size:            4096
Object size:          4096
Bandwidth (MB/sec):   253.728
Average IOPS:         64954
Stddev IOPS:          1313.98
Max IOPS:             67512
Min IOPS:             62544
Average Latency(s):   0.00024351
Max latency(s):       0.0150301
Min latency(s):       6.139e-05

4M random read results (same conditions):

Code:

Total time run:       30.0228
Total reads made:     25518
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   3399.81
Average IOPS:         849
Stddev IOPS:          22.6193
Max IOPS:             906
Min IOPS:             815
Average Latency(s):   0.0185685
Max latency(s):       0.243001
Min latency(s):       0.00308846

Now, four out of five of those backups were f'd and so not doing very much. Here's what ceph I/O looks like:

Code:

    client:   58 MiB/s rd, 8.7 MiB/s wr, 18 op/s rd, 493 op/s wr
    client:   48 MiB/s rd, 7.6 MiB/s wr, 15 op/s rd, 432 op/s wr
    client:   55 MiB/s rd, 9.9 MiB/s wr, 18 op/s rd, 553 op/s wr
    client:   35 MiB/s rd, 7.0 MiB/s wr, 11 op/s rd, 393 op/s wr
    client:   35 MiB/s rd, 7.0 MiB/s wr, 11 op/s rd, 393 op/s wr
    client:   43 MiB/s rd, 8.1 MiB/s wr, 15 op/s rd, 448 op/s wr
    client:   43 MiB/s rd, 8.1 MiB/s wr, 15 op/s rd, 448 op/s wr
    client:   44 MiB/s rd, 7.9 MiB/s wr, 15 op/s rd, 444 op/s wr
    client:   44 MiB/s rd, 7.9 MiB/s wr, 15 op/s rd, 444 op/s wr
    client:   39 MiB/s rd, 6.8 MiB/s wr, 14 op/s rd, 384 op/s wr
    client:   47 MiB/s rd, 9.2 MiB/s wr, 17 op/s rd, 528 op/s wr
    client:   47 MiB/s rd, 9.2 MiB/s wr, 17 op/s rd, 528 op/s wr
    client:   41 MiB/s rd, 7.0 MiB/s wr, 13 op/s rd, 397 op/s wr
    client:   49 MiB/s rd, 8.3 MiB/s wr, 16 op/s rd, 467 op/s wr
    client:   49 MiB/s rd, 8.3 MiB/s wr, 16 op/s rd, 467 op/s wr
    client:   45 MiB/s rd, 7.9 MiB/s wr, 14 op/s rd, 452 op/s wr
    client:   45 MiB/s rd, 7.9 MiB/s wr, 14 op/s rd, 452 op/s wr
    client:   36 MiB/s rd, 7.1 MiB/s wr, 11 op/s rd, 401 op/s wr
    client:   36 MiB/s rd, 7.1 MiB/s wr, 11 op/s rd, 401 op/s wr

Compared to how it looks while rados bench is running for 4k reads:

Code:

    client:   225 MiB/s rd, 7.7 MiB/s wr, 52.79k op/s rd, 440 op/s wr
    client:   295 MiB/s rd, 9.1 MiB/s wr, 69.74k op/s rd, 521 op/s wr
    client:   295 MiB/s rd, 9.1 MiB/s wr, 69.74k op/s rd, 521 op/s wr
    client:   259 MiB/s rd, 8.6 MiB/s wr, 60.47k op/s rd, 468 op/s wr
    client:   259 MiB/s rd, 8.6 MiB/s wr, 60.47k op/s rd, 468 op/s wr
    client:   247 MiB/s rd, 7.2 MiB/s wr, 58.03k op/s rd, 390 op/s wr
    client:   324 MiB/s rd, 9.8 MiB/s wr, 76.41k op/s rd, 534 op/s wr
    client:   324 MiB/s rd, 9.8 MiB/s wr, 76.41k op/s rd, 534 op/s wr
    client:   244 MiB/s rd, 6.8 MiB/s wr, 58.16k op/s rd, 376 op/s wr
    client:   244 MiB/s rd, 6.8 MiB/s wr, 58.16k op/s rd, 376 op/s wr
    client:   308 MiB/s rd, 8.5 MiB/s wr, 73.50k op/s rd, 470 op/s wr
    client:   262 MiB/s rd, 8.0 MiB/s wr, 61.83k op/s rd, 437 op/s wr
    client:   262 MiB/s rd, 8.0 MiB/s wr, 61.83k op/s rd, 437 op/s wr
    client:   246 MiB/s rd, 6.7 MiB/s wr, 58.36k op/s rd, 372 op/s wr
    client:   246 MiB/s rd, 6.7 MiB/s wr, 58.36k op/s rd, 372 op/s wr
    client:   342 MiB/s rd, 9.3 MiB/s wr, 76.35k op/s rd, 523 op/s wr

and 4M random reads:

Code:

    client:   3.1 GiB/s rd, 8.1 MiB/s wr, 807 op/s rd, 461 op/s wr
    client:   3.1 GiB/s rd, 8.1 MiB/s wr, 807 op/s rd, 461 op/s wr
    client:   3.3 GiB/s rd, 7.6 MiB/s wr, 853 op/s rd, 439 op/s wr
    client:   3.3 GiB/s rd, 7.6 MiB/s wr, 853 op/s rd, 439 op/s wr
    client:   3.6 GiB/s rd, 9.4 MiB/s wr, 922 op/s rd, 534 op/s wr
    client:   2.9 GiB/s rd, 7.6 MiB/s wr, 758 op/s rd, 443 op/s wr
    client:   2.9 GiB/s rd, 7.6 MiB/s wr, 758 op/s rd, 443 op/s wr
    client:   3.8 GiB/s rd, 9.3 MiB/s wr, 966 op/s rd, 539 op/s wr
    client:   3.1 GiB/s rd, 8.6 MiB/s wr, 793 op/s rd, 497 op/s wr
    client:   3.1 GiB/s rd, 8.6 MiB/s wr, 793 op/s rd, 497 op/s wr
    client:   3.3 GiB/s rd, 7.9 MiB/s wr, 846 op/s rd, 465 op/s wr
    client:   3.3 GiB/s rd, 7.9 MiB/s wr, 846 op/s rd, 465 op/s wr
    client:   3.6 GiB/s rd, 9.7 MiB/s wr, 914 op/s rd, 570 op/s wr
    client:   2.9 GiB/s rd, 7.7 MiB/s wr, 743 op/s rd, 461 op/s wr
    client:   2.9 GiB/s rd, 7.7 MiB/s wr, 743 op/s rd, 461 op/s wr
    client:   3.0 GiB/s rd, 9.3 MiB/s wr, 770 op/s rd, 547 op/s wr

The example I initially posted is a MariaDB server. It replicates another MariaDB server. That's all it does. When I aborted the backup at 67 minutes, it was 58 minutes behind.

After I stopped the backup, it caught up (from 2,000 miles away) in 4 minutes.

IMO, none of this tends to implicate the storage pool.

Chris said:
The slow storage performance might also explain why the guest hangs during the backup: the `fs-thaw` has already been performed, so the guest filesystem is unfrozen.

iostat from inside a guest being backed up:

Code:

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

0 reads/sec. 0 writes/sec. 100% utilization. If you say the guest filesystem is unfrozen, I believe you (the backup log also says so), but it sure acts frozen.

That happens intermittently throughout the backup for 10 seconds to 10 minutes at a time.

Chris said:
Also, please post the VM config (`qm config <VMID>`)

For the iostat host above:

Code:

agent: 1
boot: order=scsi0
cores: 4
cpu: EPYC-IBPB
memory: 32768
meta: creation-qemu=8.0.2,ctime=1692983416
name: fs11
net0: virtio=CA:6A:34:44:57:EE,bridge=vmbr1,firewall=1,tag=13
net1: virtio=56:14:50:97:8D:E8,bridge=vmbr1,firewall=1,tag=12
net2: virtio=86:C4:B5:9E:62:5B,bridge=vmbr1,firewall=1,tag=57
numa: 1
ostype: l26
scsi0: rbd-nyc1:vm-1911-disk-0,cache=writeback,discard=on,iothread=1,size=16G,ssd=1
scsi1: rbd-nyc1:vm-1911-disk-1,cache=writeback,discard=on,iothread=1,size=1T,ssd=1
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=171b63f9-d6ef-4989-826e-4ca95a02471f
sockets: 2
vga: serial0
vmgenid: 9843d7ae-73e2-49da-941f-67dbac966b12

Chris said:
and the storage config (`cat /etc/pve/storage.cfg`).

Code:

dir: local
    path /var/lib/vz
    content iso,vztmpl,backup

zfspool: local-zfs
    pool rpool/data
    content rootdir,images
    sparse 1

rbd: rbd-nyc1
    content rootdir,images
    krbd 0
    pool rbd-nyc1

pbs: buppie
    datastore data
    server [redacted]
    content backup
    fingerprint [redacted]
    namespace [redacted]
    prune-backups keep-all=1
    username proxmox@pbs

Chris said:
Edit: Also check your Proxmox VE host performance with `proxmox-backup-client benchmark`; this might give some further clues as to where the bottleneck is.

Code:

SHA256 speed: 1834.98 MB/s
Compression speed: 668.30 MB/s
Decompress speed: 912.65 MB/s
AES256/GCM speed: 2081.77 MB/s
Verify speed: 604.75 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ not tested         │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 1834.98 MB/s (91%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 668.30 MB/s (89%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 912.65 MB/s (76%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 604.75 MB/s (80%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 2081.77 MB/s (57%) │
└───────────────────────────────────┴────────────────────┘

Chris · Aug 30, 2023

jdw said:
Now, four out of five of those backups were f'd and so not doing very much

Please post the config for the VM running without the speed issue as well as the backup job task log for that VM's last backup; maybe this gives a clue.
Also, I assume you have already checked the journal for errors from around the time of the backups on both the Proxmox VE and Proxmox Backup Server sides.

Do you have any bandwidth limits configured for some hosts on the Proxmox Backup Server side? Or any limits for the backup jobs in `/etc/vzdump.conf`?

The rest looks fine, apart from the speed and the 100% util in iostat of course. Please also post your Proxmox VE version (`pveversion -v`) and the Proxmox Backup Server version (`proxmox-backup-manager versions --verbose`).

jdw · Aug 30, 2023

Chris said:
Please post the config for the VM running without the speed issue as well as the backup job task log for that VM's last backup; maybe this gives a clue.

It doesn't. The fifth backup was running on a server that does not have any VMs with large disks. Those consistently work fine. The VM configs are basically the same, except that the virtual disks are 16-32 GiB, tops, instead of 1 TB. The backup process bebops through about 15 of those in about the way I would expect it to work everywhere.

Chris said:
Also, I assume you have already checked the journal for errors from around the time of the backups on both the Proxmox VE and Proxmox Backup Server sides.

I'm not sure what you mean by "journal" in this context. If you mean either the systemd journal or the journal of output from the backup process, no. No errors.

Although I do see this a lot:

INFO: scsi0: dirty-bitmap status: created new

And then it does a full backup, rather than an incremental one.

Chris said:
Do you have any bandwidth limits configured for some hosts on the Proxmox Backup Server side? Or any limits for the backup jobs in `/etc/vzdump.conf`?

No.

Chris said:
Please also post your Proxmox VE version (`pveversion -v`)

Code:

proxmox-ve: 8.0.2 (running kernel: 6.2.16-6-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
proxmox-kernel-6.2: 6.2.16-7
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: 0.8.41
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.6
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.4
libpve-storage-perl: 8.0.2
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.1-1
proxmox-backup-file-restore: 3.0.1-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.2
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

Chris said:
and the Proxmox Backup Server version (`proxmox-backup-manager versions --verbose`).

Code:

proxmox-backup                3.0.0        running kernel: 6.2.16-3-pve
proxmox-backup-server         3.0.1-1      running version: 3.0.1      
pve-kernel-6.2                8.0.2                                    
pve-kernel-6.2.16-3-pve       6.2.16-3                                 
ifupdown2                     3.2.0-1+pmx3                             
libjs-extjs                   7.0.0-3                                  
proxmox-backup-docs           3.0.1-1                                  
proxmox-backup-client         3.0.1-1                                  
proxmox-mail-forward          0.2.0                                    
proxmox-mini-journalreader    1.4.0                                    
proxmox-offline-mirror-helper unknown                                  
proxmox-widget-toolkit        4.0.6                                    
pve-xtermjs                   4.16.0-3                                 
smartmontools                 7.3-pve1                                 
zfsutils-linux                2.1.12-pve1

I do think I have a lead.

Here is iperf between a PVE and the PBS:

Code:

[  5] local 192.168.52.24 port 48588 connected to 192.168.202.48 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  16.8 MBytes   141 Mbits/sec  304    225 KBytes       
[  5]   1.00-2.00   sec  15.9 MBytes   133 Mbits/sec   17    307 KBytes       
[  5]   2.00-3.00   sec  15.1 MBytes   127 Mbits/sec  151    232 KBytes       
[  5]   3.00-4.00   sec  15.8 MBytes   133 Mbits/sec   50    307 KBytes       
[  5]   4.00-5.00   sec  12.3 MBytes   103 Mbits/sec   26    215 KBytes       
[  5]   5.00-6.00   sec  15.8 MBytes   133 Mbits/sec   16    301 KBytes       
[  5]   6.00-7.00   sec  15.2 MBytes   127 Mbits/sec   25    236 KBytes       
[  5]   7.00-8.00   sec  15.8 MBytes   133 Mbits/sec   38    314 KBytes       
[  5]   8.00-9.00   sec  14.9 MBytes   125 Mbits/sec   49    242 KBytes       
[  5]   9.00-10.00  sec  15.8 MBytes   133 Mbits/sec   30    315 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   154 MBytes   129 Mbits/sec  706             sender
[  5]   0.00-10.03  sec   152 MBytes   127 Mbits/sec                  receiver

That is... terrible. It is routed and goes to a different building, but it is a 10Gbps fiber link, so it should not perform like that.

I will definitely look into that.
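For scale, the iperf figure can be converted into the units used by the backup log. This is a minimal Python sketch, not something posted in the thread; the helper name `mbit_to_mib_per_s` is hypothetical, and the calculation ignores protocol overhead, compression, and deduplication.

```python
def mbit_to_mib_per_s(mbit_per_s):
    """Convert decimal megabits per second (iperf 'Mbits/sec') to MiB/s."""
    return mbit_per_s * 1_000_000 / 8 / 2**20


link_mib_s = mbit_to_mib_per_s(129)   # iperf sender average from the run above
dirty_mib = 45.7 * 1024               # dirty data reported by the backup log
best_case_minutes = dirty_mib / link_mib_s / 60

print(f"link: {link_mib_s:.1f} MiB/s")
print(f"best case for 45.7 GiB: {best_case_minutes:.0f} min")
```

Under those assumptions the link tops out at about 15 MiB/s, so moving the 45.7 GiB of dirty data would take on the order of 50 minutes even if nothing else slowed the transfer down.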

However, why would a slow connection to the backup server disrupt the guest? If anything, I would expect that the slower the backup goes, the less impact there could be on the guest.

While I look into that, is there any way to get a scheduled backup to go through the Proxmox VE hosts in a cluster one at a time rather than firing all the jobs off simultaneously? I.e., just as it does only one VM per host at a time, do only one host in the cluster at a time.

Chris · Aug 30, 2023

jdw said:
INFO: scsi0: dirty-bitmap status: created new

And then it does a full backup, rather than an incremental one.

This is expected if the VM has been shut down in between backup runs. The dirty bitmap can only be used if the guest remains running in between backups; otherwise the backup will be a full backup again.

jdw said:
I do think I have a lead.

Okay, yes sounds like a networking issue.

jdw said:
However, why would a slow connection to the backup server disrupt the guest? If anything, I would expect that the slower the backup goes, the less impact there could be on the guest.

This has to do with the snapshot type of backup. The backup has to capture the state of the disk as it was when the fs-freeze/fs-thaw ran, so each NEW guest write first has to write the old state of the block to the backup target in order to obtain a consistent state. You will find a more detailed explanation here: https://bugzilla.proxmox.com/show_bug.cgi?id=3231#c4.
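The copy-before-write behaviour described here can be illustrated with a toy model. This is a minimal Python sketch under simplifying assumptions, not QEMU's or Proxmox's actual implementation: the class and method names are hypothetical, blocks are plain list entries, and there is no buffering or network, so every forced copy happens synchronously in the guest's write path.

```python
class CopyBeforeWriteBackup:
    """Toy model of a snapshot-mode backup (illustrative only, not QEMU code)."""

    def __init__(self, disk):
        self.disk = list(disk)                     # live guest disk, one entry per block
        self.pending = set(range(len(self.disk)))  # blocks not yet sent to the target
        self.target = {}                           # block index -> data at backup start
        self.forced_copies = 0                     # copies triggered by guest writes

    def _send(self, idx):
        # Stands in for the (possibly slow) upload to the backup target.
        self.target[idx] = self.disk[idx]
        self.pending.discard(idx)

    def guest_write(self, idx, data):
        # The old contents must reach the target before they are overwritten.
        if idx in self.pending:
            self._send(idx)
            self.forced_copies += 1
        self.disk[idx] = data

    def background_step(self):
        # The backup job itself copies the lowest pending block, if any.
        if self.pending:
            self._send(min(self.pending))


backup = CopyBeforeWriteBackup(["a", "b", "c", "d"])
backup.background_step()       # block 0 copied by the backup job
backup.guest_write(2, "C2")    # block 2 still pending: old "c" is sent first
backup.guest_write(0, "A2")    # block 0 already saved: no extra copy
while backup.pending:
    backup.background_step()

print(sorted(backup.target.items()))
print(backup.disk, backup.forced_copies)
```

The backup ends up holding the disk exactly as it was at the start, while the guest's new data lives only on the live disk. In this model the guest write to block 2 cannot complete until `_send` returns, which is the point at which a slow link or slow backup storage would be felt inside the guest.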

jdw said:
While I look into that, is there any way to get a scheduled backup to go through the Proxmox VE hosts in a cluster one at a time rather than firing all the jobs off simultaneously? I.e., just as it does only one VM per host at a time, do only one host in the cluster at a time.

You could create multiple backup jobs, limiting each job to a different host.

jdw · Aug 30, 2023

Chris said:
This is expected if the VM has been shut down in between backup runs. The dirty bitmap can only be used if the guest remains running in between backups; otherwise the backup will be a full backup again.

In these cases, the VMs have not been shut down.

Chris said:
This has to do with the snapshot type of backup. The backup has to capture the state of the disk as it was when the fs-freeze/fs-thaw ran, so each NEW guest write first has to write the old state of the block to the backup target in order to obtain a consistent state. You will find a more detailed explanation here: https://bugzilla.proxmox.com/show_bug.cgi?id=3231#c4.

Oh. Yikes. So, if I understand it correctly, one should expect guest IO performance during a snapshot backup to be based entirely on the backup server's storage pool, and not its own. So, in our case, to run backups on every PVE host at once, the backup server would basically need to equal nearly the performance of the entire NVMe ceph cluster. Even without networking issues (I have it up to 3Gbps now), that's probably impossible.

If so, that's quite different than, e.g., a ZFS snapshot or a Ceph RBD snapshot, where I can make the snapshot, start sending it to you, and then keep making changes to the live fs with impunity.

Chris · Aug 30, 2023

jdw said:
In these cases, the VMs have not been shut down.

So probably there was no full backup for that VM yet?

jdw said:
Oh. Yikes. So, if I understand it correctly, one should expect guest IO performance during a snapshot backup to be based entirely on the backup server's storage pool, and not its own. So, in our case, to run backups on every PVE host at once, the backup server would basically need to equal nearly the performance of the entire NVMe ceph cluster. Even without networking issues (I have it up to 3Gbps now), that's probably impossible.

If so, that's quite different than, e.g., a ZFS snapshot or a Ceph RBD snapshot, where I can make the snapshot, start sending it to you, and then keep making changes to the live fs with impunity.

Well, you only have to directly write blocks which are being written to in the VM and have not yet been sent to the backup storage, and these writes are buffered, so this is not really a problem under normal circumstances. Do you have such high IO in that one VM? But yes, if the backup storage/network is slow and you run out of buffer, you will see IO performance decrease in your guest during an ongoing backup.

jdw · Aug 30, 2023

Chris said:
So probably there was no full backup for that VM yet?

There are over a dozen.

Chris said:
Well, you only have to directly write blocks which are being written to in the VM and have not yet been sent to the backup storage

I think the point (for me) is that PBS introduces a new single point of failure that randomly bounces around between VMs. As described in the thread you linked, problems with the backup can lead to stalls, outright hang of the VM requiring a reboot, or write errors. Those are not desirable properties for a backup system.

Reading through that reminded me that we did in fact experience failed writes at one point. Fortunately, the guest VM filesystem was ZFS so the writes that didn't happen didn't corrupt the filesystem.

I think PBS is a great idea with enormous potential, and has a lot of neat features. I also think that the current drawbacks rule it out for us. I hope you all are able to take advantage of storage native snapshot capabilities (as described in the thread) to take it out of the running VM's critical path. If so, I would really like to reconsider it at that time.

Thanks!

_gabriel · Sep 4, 2023

jdw said:
There are over a dozen.

Fast incremental backup (= reading only the blocks changed in the guest), provided by QEMU's dirty-bitmap feature, works with only one destination. If a vzdump backup is done between PBS backups, the dirty bitmap is lost. It is also lost if you back up to two different PBS instances.

jdw · Sep 4, 2023

_gabriel said:
Fast incremental backup (= reading only the blocks changed in the guest), provided by QEMU's dirty-bitmap feature, works with only one destination. If a vzdump backup is done between PBS backups, the dirty bitmap is lost. It is also lost if you back up to two different PBS instances.

Neither of those circumstances applied.

jamarsa · Sep 5, 2023

Regarding the network speed, I had a similar issue in a cluster of servers with mixed 1GbE/10GbE cards, all connected to the same switch (but in different IP ranges for each speed). I discovered that, depending on the sender/receiver, the OS sometimes chose to connect via interfaces that were *not* in the IP range I was using, and used a 1GbE port instead of the intended 10GbE port. The explanation is that the kernel *sees* the MAC address on all of these interfaces and selects one without regard to the best (as in fastest) card or the assigned IP range.

Apparently this behaviour is intended to serve as a fault-tolerant solution, but it was a nuisance in my case. I prevented it by enabling arp_ignore via sysctl, either on all the cards or only on the necessary ones. For example:

net.ipv4.conf.bond13.arp_ignore=1

I don't know if your situation is the same as mine, and I see that your speed is way lower than what you would expect even from a basic 1GbE interface (which is the minimum expectation nowadays). But I wanted to mention it just in case.

Have you also checked for IP conflicts? Your speed is probably too high (and too steady, with no interruptions) for that to be the cause, but I always check for it as a first rule with network issues.

jdw · Sep 5, 2023

In our case, the poor network speeds were caused by the PVE servers incorrectly routing to the PBS server through a software router that is only supposed to be used as an out-of-band backup. We were eventually able to get the bandwidth up to about 4Gbps, which is reasonable, given other traffic on the inter-building link.
