Proxmox backup - from 2% to 500% CPU during backup tasks

Jul 6, 2024
Hello,
First, I want to clarify that this is not unique to one VM or one server; it happens on all of them, some more, some less. Sometimes the CPU only goes to 70%, but most of the time it reaches 200-500%.

The issue is that when I back up a VM (snapshot mode), the VM's CPU usage goes very high. In this case it went from 2% to 559%.
Only the VM has CPU issues; neither the PBS nor the Proxmox hypervisor does.

[Attached image: cpu usage proxmox.png]


I've tried a lot of things to fix this.

I've tried fleecing storage (RBD) and backing up to a fast SSD datastore on the PBS.
The backup server is on the same network as the rest, and all hosts are connected via ConnectX-5/6 25 GbE. The latency between the PBS and Proxmox is 0.032 ms on a ping -f.
The backup isn't slow; it writes at roughly 300-600 MiB/s:
Code:
INFO:   3% (1.8 GiB of 50.0 GiB) in 3s, read: 614.7 MiB/s, write: 321.3 MiB/s
INFO:   7% (3.8 GiB of 50.0 GiB) in 6s, read: 684.0 MiB/s, write: 344.0 MiB/s
INFO:   9% (4.8 GiB of 50.0 GiB) in 9s, read: 330.7 MiB/s, write: 329.3 MiB/s
INFO:  12% (6.0 GiB of 50.0 GiB) in 12s, read: 434.7 MiB/s, write: 421.3 MiB/s
INFO:  15% (7.6 GiB of 50.0 GiB) in 15s, read: 516.0 MiB/s, write: 498.7 MiB/s
INFO:  18% (9.2 GiB of 50.0 GiB) in 18s, read: 577.3 MiB/s, write: 557.3 MiB/s
INFO:  21% (11.0 GiB of 50.0 GiB) in 21s, read: 586.7 MiB/s, write: 586.7 MiB/s
INFO:  25% (12.8 GiB of 50.0 GiB) in 24s, read: 613.3 MiB/s, write: 593.3 MiB/s
INFO:  29% (14.6 GiB of 50.0 GiB) in 27s, read: 610.7 MiB/s, write: 590.7 MiB/s
INFO:  32% (16.4 GiB of 50.0 GiB) in 30s, read: 617.3 MiB/s, write: 597.3 MiB/s
INFO:  35% (17.6 GiB of 50.0 GiB) in 33s, read: 409.3 MiB/s, write: 389.3 MiB/s
INFO:  38% (19.3 GiB of 50.0 GiB) in 36s, read: 609.3 MiB/s, write: 589.3 MiB/s
INFO:  42% (21.2 GiB of 50.0 GiB) in 39s, read: 617.3 MiB/s, write: 597.3 MiB/s
INFO:  45% (22.9 GiB of 50.0 GiB) in 42s, read: 612.0 MiB/s, write: 612.0 MiB/s
INFO:  49% (24.8 GiB of 50.0 GiB) in 45s, read: 617.3 MiB/s, write: 597.3 MiB/s
INFO:  53% (26.6 GiB of 50.0 GiB) in 48s, read: 617.3 MiB/s, write: 597.3 MiB/s
INFO:  56% (28.4 GiB of 50.0 GiB) in 51s, read: 620.0 MiB/s, write: 600.0 MiB/s
INFO:  60% (30.2 GiB of 50.0 GiB) in 54s, read: 621.3 MiB/s, write: 601.3 MiB/s
INFO:  64% (32.0 GiB of 50.0 GiB) in 57s, read: 622.7 MiB/s, write: 602.7 MiB/s
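To put a single number on the log above, here is a minimal Python sketch that parses the progress lines and derives an overall rate from the cumulative progress. The regular expression is an assumption based only on the lines quoted in this thread, and `parse_progress` is a hypothetical helper name, not part of any Proxmox tool.

```python
import re

# Pattern assumed from the progress lines quoted above; other vzdump
# versions may format these lines differently.
PROGRESS_RE = re.compile(
    r"INFO:\s+(?P<pct>\d+)% \((?P<done>[\d.]+) GiB of (?P<total>[\d.]+) GiB\) "
    r"in (?P<secs>\d+)s, read: (?P<read>[\d.]+) MiB/s, write: (?P<write>[\d.]+) MiB/s"
)

def parse_progress(log_text):
    """Return one dict per progress line found in log_text."""
    rows = []
    for match in PROGRESS_RE.finditer(log_text):
        rows.append({
            "pct": int(match["pct"]),
            "done_gib": float(match["done"]),
            "secs": int(match["secs"]),
            "read_mib_s": float(match["read"]),
            "write_mib_s": float(match["write"]),
        })
    return rows

# First and last progress lines from the log above.
sample = """\
INFO:   3% (1.8 GiB of 50.0 GiB) in 3s, read: 614.7 MiB/s, write: 321.3 MiB/s
INFO:  64% (32.0 GiB of 50.0 GiB) in 57s, read: 622.7 MiB/s, write: 602.7 MiB/s
"""

rows = parse_progress(sample)
last = rows[-1]
# Overall rate from cumulative progress: GiB done so far / elapsed seconds.
overall_mib_s = last["done_gib"] * 1024 / last["secs"]
print(f"{len(rows)} lines parsed, overall ~{overall_mib_s:.0f} MiB/s")
```

Feeding the full task log into `parse_progress` would also make it easy to line up the per-interval rates with the CPU graph of the VM.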

I'm out of ideas on this one!

Specs:
The CPU in my PBS is quite weak; it's a Xeon Silver 4110.
My Proxmox server has an AMD EPYC 9654 CPU.

Oh, and this one is 100% reproducible, no matter the VM or the application running inside it.

Code:
 proxmox-backup-client benchmark --repository backup
Uploaded 507 chunks in 5 seconds.
Time per request: 9918 microseconds.
TLS speed: 422.87 MB/s
SHA256 speed: 375.29 MB/s
Compression speed: 330.82 MB/s
Decompress speed: 448.62 MB/s
AES256/GCM speed: 2515.94 MB/s
Verify speed: 206.35 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 422.87 MB/s (34%)  │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 375.29 MB/s (19%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 330.82 MB/s (44%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 448.62 MB/s (37%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 206.35 MB/s (27%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 2515.94 MB/s (69%) │
└───────────────────────────────────┴────────────────────┘
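As a rough way to read the benchmark above, the following Python sketch extracts the `... speed: N MB/s` lines and reports the slowest stage. The line format is assumed from the output pasted in this thread, `parse_benchmark` is a hypothetical helper, and the slowest figure is only a hint, since the real backup pipeline does not necessarily run these stages one after another on a single core.

```python
import re

# Matches the plain "Name speed: 123.45 MB/s" lines, not the table rows.
# Format assumed from the benchmark output pasted above.
SPEED_RE = re.compile(
    r"^[ \t]*(?P<name>[\w/ ]+?) speed: (?P<mbs>[\d.]+) MB/s", re.MULTILINE
)

def parse_benchmark(text):
    """Map each benchmark stage name to its speed in MB/s."""
    return {m["name"]: float(m["mbs"]) for m in SPEED_RE.finditer(text)}

output = """\
TLS speed: 422.87 MB/s
SHA256 speed: 375.29 MB/s
Compression speed: 330.82 MB/s
Decompress speed: 448.62 MB/s
AES256/GCM speed: 2515.94 MB/s
Verify speed: 206.35 MB/s
"""

speeds = parse_benchmark(output)
slowest = min(speeds, key=speeds.get)
print(f"slowest stage: {slowest} at {speeds[slowest]} MB/s")
```

The same function can be run over the benchmark output of each host to compare them stage by stage.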
 
Please read:

https://pbs.proxmox.com/docs/installation.html#recommended-server-system-requirements

Add more CPU cores, and if you use ZFS you must have AES on board.

I have a simple, dedicated Proxmox BS with an Intel® Core™ i3-10105 processor (4C/8T), 32 GB DDR RAM, and 2 NICs: 1 GBit/s and 2.5 GBit/s. Everything runs on ZFS with 4x SSD, as 2 mirrored vdevs of 2 SSDs each.
I could buy something like an Intel Xeon Gold 6254 (SRF92, 3.1 GHz, 18C/36T), yes, but I thought the current 8C/16T CPU was enough for PBS.

I tried both ZFS and a single SSD, but both give the same result.
 
On the simple Proxmox BS (Intel® Core™ i3-10105 processor, 4 cores and 4 GB RAM), I ran the benchmark with:
Code:
proxmox-backup-client benchmark
SHA256 speed: 567.86 MB/s  
Compression speed: 612.49 MB/s  
Decompress speed: 1001.96 MB/s  
AES256/GCM speed: 4351.57 MB/s  
Verify speed: 359.04 MB/s  
┌───────────────────────────────────┬─────────────────────┐
│ Name                              │ Value               │
╞═══════════════════════════════════╪═════════════════════╡
│ TLS (maximal backup upload speed) │ not tested          │
├───────────────────────────────────┼─────────────────────┤
│ SHA256 checksum computation speed │ 567.86 MB/s (28%)   │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 compression speed    │ 612.49 MB/s (81%)   │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 decompression speed  │ 1001.96 MB/s (84%)  │
├───────────────────────────────────┼─────────────────────┤
│ Chunk verification speed          │ 359.04 MB/s (47%)   │
├───────────────────────────────────┼─────────────────────┤
│ AES256 GCM encryption speed       │ 4351.57 MB/s (119%) │
└───────────────────────────────────┴─────────────────────┘
 
Could you post the VM config?
 
Yes, of course. Here is the VM config:
I redacted some sensitive details.

Code:
/etc/pve/qemu-server# cat 1003.conf
#IP, IPv6
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0
cipassword: REDACTED
ciuser: root
cores: 2
cpu: host
cpuunits: 2000
ide0: ceph-storage-vm:vm-1003-cloudinit,media=cdrom
ide2: none,media=cdrom
ipconfig0: ip=10.0.0.4/24,gw=10.0.0.1
memory: 16384
meta: creation-qemu=9.0.2,ctime=1724565322
name: Redacted
nameserver: 1.0.0.1
net0: virtio=98:dc:3c:42:da:87,bridge=vmbr0,firewall=1,rate=10000
numa: 0
onboot: 1
ostype: l26
scsi0: ceph-storage-vm:vm-1003-disk-0,aio=threads,cache=writeback,discard=on,iothread=1,mbps_rd=500,mbps_wr=500,size=100G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=8bf39e4c-4a06-43a1-8e10-29231992a31e
sockets: 1
vmgenid: 594289f0-0571-4075-8266-345ad1066983
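When bisecting a problem like this, it can help to see the disk tunables of the config above at a glance. The sketch below splits the `scsi0` value into the volume and its `key=value` options. `parse_drive` is a hypothetical helper written for this post, not a Proxmox API, and whether any of these options (for example `aio`, `cache`, `iothread`, or the `mbps_rd`/`mbps_wr` limits) is related to the CPU spike is not established anywhere in this thread.

```python
def parse_drive(value):
    """Split a drive line value into (volume, options dict).

    Assumes the first comma-separated field is the volume and every
    remaining field has the form key=value, as in the config above.
    """
    volume, *parts = value.split(",")
    options = dict(part.split("=", 1) for part in parts)
    return volume, options

# The scsi0 value from the VM config above.
scsi0 = (
    "ceph-storage-vm:vm-1003-disk-0,aio=threads,cache=writeback,discard=on,"
    "iothread=1,mbps_rd=500,mbps_wr=500,size=100G,ssd=1"
)

volume, opts = parse_drive(scsi0)
read_limit = float(opts["mbps_rd"])  # configured read limit for this disk

print(volume)
for key in ("aio", "cache", "iothread", "mbps_rd", "mbps_wr"):
    print(f"  {key} = {opts[key]}")
```

Changing one of these options at a time on a test VM and repeating the backup would show whether any of them affects the spike.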



If I run the benchmark on the Proxmox host itself (not on the backup server), here is the result:
Code:
# proxmox-backup-client benchmark
SHA256 speed: 1517.24 MB/s 
Compression speed: 375.41 MB/s 
Decompress speed: 508.24 MB/s 
AES256/GCM speed: 3295.55 MB/s 
Verify speed: 379.58 MB/s 
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ not tested         │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 1517.24 MB/s (75%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 375.41 MB/s (50%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 508.24 MB/s (42%)  │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 379.58 MB/s (50%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 3295.55 MB/s (90%) │
└───────────────────────────────────┴────────────────────┘

I've ordered the Intel Xeon Gold 6254 (3.1 GHz base, 4.0 GHz boost, 18C/36T) to replace the Silver CPU in my PBS, just to be sure. Based on benchmarks, it performs similarly to the EPYC Rome 7402P.

Since we last spoke, I also tried out ZFS with a special vdev for blocks of 256K and below.

It reaches about 3k IOPS on 4k random writes and about 1.8 GiB/s throughput at a 1M block size. In theory this should also be fast enough, because the VM only does <1k IOPS, but the result is the same as with the standard SSD backup, which makes me believe this is not the issue. Besides, fleecing should have shown an improvement if it was IOPS-related, right?

Code:
fio --name=smallfile --rw=randwrite --bs=4k --iodepth=32     --direct=1 --size=4G --numjobs=4 --group_reporting     --filename=/backup/smallfile.dat
smallfile: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=32

bs: 4 (f=4): [w(4)][0.6%][w=12.8MiB/s][w=3274 IOPS][eta 23m:00s]

Code:
fio --name=sequential-write-mq-test --ioengine=libaio --rw=write --bs=1M --direct=1 --size=5G --numjobs=1 --group_reporting --filename=/backup/fio-sequential-mq.da
Jobs: 1 (f=1): [W(1)][-.-%][w=1851MiB/s][w=1850 IOPS][eta 00m:00s]
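As a sanity check on the two fio results above, throughput is simply IOPS times block size. The Python sketch below redoes that arithmetic with the figures from the output lines; nothing here is measured, and `throughput_mib_s` is a hypothetical helper name.

```python
KIB = 1024
MIB = 1024 * KIB

def throughput_mib_s(iops, block_size_bytes):
    """Throughput in MiB/s implied by an IOPS figure and a block size."""
    return iops * block_size_bytes / MIB

# 4k random write run: fio reported 3274 IOPS and 12.8MiB/s.
random_4k = throughput_mib_s(3274, 4 * KIB)

# 1M sequential write run: fio reported 1850 IOPS and 1851MiB/s.
seq_1m = throughput_mib_s(1850, 1 * MIB)

print(f"4k random:     {random_4k:.1f} MiB/s")
print(f"1M sequential: {seq_1m:.0f} MiB/s ({seq_1m / 1024:.2f} GiB/s)")
```

Both values agree with what fio printed, so the quoted IOPS and bandwidth figures are consistent with each other.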
 
You ordered a Gold Xeon to replace the Silver Xeon in the PBS server, while your actual problem is CPU usage inside nearly all of your VMs, which run on AMD EPYC, during a backup (which itself runs fine at 300-600 MB/s)! So where is the sense in that? The PBS could then go even faster, but your VMs would not benefit at all ... or what do you think?
 
Well, the CPU in the PBS is quite weak. While I think it is enough, the transfer speed bottleneck is likely in TLS, and getting a faster CPU (better single-thread performance) like the Gold one will make backups slightly faster.
https://www.cpubenchmark.net/compare/3106vs3482/Intel-Xeon-Silver-4110-vs-Intel-Xeon-Gold-6254

Hence we may reduce the time the VPS spends at 500% CPU usage by 1/3 or something like that.
At the same time, there's less chance of a "don't bother me" answer, like saying the CPU on the PBS is too weak (e.g. the first reply xd).
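The "by 1/3" estimate above can be made concrete with a little arithmetic: backup time is size divided by rate, so cutting the time by a third needs a 1.5x faster transfer. The sketch below uses the 50 GiB disk and the roughly 600 MiB/s rate from the log earlier in the thread. The 1.5x speedup is purely hypothetical, and the sketch assumes the CPU spike lasts exactly as long as the transfer.

```python
def backup_seconds(size_gib, rate_mib_s):
    """Time to move size_gib at a steady rate_mib_s."""
    return size_gib * 1024 / rate_mib_s

SIZE_GIB = 50          # disk size shown in the backup log
CURRENT_RATE = 600.0   # MiB/s, roughly the steady rate in the log
SPEEDUP = 1.5          # hypothetical gain from a faster PBS CPU

base = backup_seconds(SIZE_GIB, CURRENT_RATE)
faster = backup_seconds(SIZE_GIB, CURRENT_RATE * SPEEDUP)
saved = 1 - faster / base  # fraction of the backup time saved

print(f"now: {base:.0f} s, with speedup: {faster:.0f} s, saved: {saved:.0%}")
```

This only shortens the spike; it says nothing about why the VM's CPU usage rises in the first place.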
 
Sorry, don't misunderstand me; I'm just trying to understand your VM problem, as I have never seen it. I must say, though, that I have never done backups with PVE or PBS (we did them by VM/LXC freeze and reflink, then later cp'd that to another host, compressed and deduplicated it, so there was absolutely no VM load from the backup).
Otherwise, a Xeon Silver isn't bad, as the job is mostly I/O, and a fast CPU would mostly sit waiting if there isn't enough I/O power to feed it.