We had some issues with our existing PBS setup, which ran on all spinning drives and wasn't set up for optimal performance.
We borrowed some lab equipment to get onto more performant hardware until we can properly right-size our backup environment. We're overkill on CPUs right now, so we will likely be changing some of that out.
Here is our overall production setup:
PVE Cluster - 4 nodes - Roughly 50-60 VMs (and growing) all Linux based
Proxmox VE 7.4-16 / Ceph 17.2.6
Supermicro SSG-110P-NTR10
10G Intel X710s (these will be swapped out for 100G Mellanox cards in the next few weeks)
Intel Gold 6312U - Single socket
512GB of RAM
6x 3.84TB Kioxia NVMe drives
PBS Target - New
PBS 3.0-2
Supermicro SYS-120U-TNR
Intel Gold 6348 x2 (we will be right-sizing this; I just didn't want to fuss with the CPUs and left these in)
100G Mellanox ConnectX-5 configured for LACP (bond config sketch below the spec list)
128GB of RAM
Supermicro AOC card with mirrored 512GB drives for OS boot
6x7.68TB Samsung PM9A3 NVMe drives
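As referenced in the NIC line above, here's roughly what the LACP bond looks like in /etc/network/interfaces. This is a sketch: the interface names are placeholders, the address is the PBS repo IP from the benchmarks below, and the switch side needs a matching LACP port-channel.
Code:
# Sketch only -- enp1s0f0/enp1s0f1 are placeholder NIC names
iface enp1s0f0 inet manual
iface enp1s0f1 inet manual

auto bond0
iface bond0 inet static
    address 192.168.0.32/24
    bond-slaves enp1s0f0 enp1s0f1
    bond-mode 802.3ad
    bond-miimon 100
    bond-xmit-hash-policy layer3+4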
The PBS NVMe drives are in a ZFS pool of striped mirrors (RAID10-style) for the best performance.
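For anyone wanting to replicate it, the pool would have been created along these lines (a sketch; device names are placeholders, and ashift=12 is what it defaulted to anyway):
Code:
# Three mirrored pairs striped together (RAID10-style); device names are placeholders
zpool create -o ashift=12 pool1 \
    mirror /dev/nvme0n1 /dev/nvme1n1 \
    mirror /dev/nvme2n1 /dev/nvme3n1 \
    mirror /dev/nvme4n1 /dev/nvme5n1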
I'm not going to talk about the "old" setup other than to say it used 3x 10TB Seagate spinners in RAID-5. Not optimal, but it's what I inherited. We rarely saw above 50 MB/s on an actual backup.
I figured this setup would shift the bottleneck to the hypervisors until I get their cards swapped out, and that's exactly what happened. Here are benchmarks I took last week from one of the HV nodes against the old repo and the new repo.
Old setup:
Code:
root@hypervisor04:~# proxmox-backup-client benchmark --repository root@pam@192.168.0.11:localbackups
Password for "root@pam": ****************
Uploaded 64 chunks in 5 seconds.
Time per request: 86543 microseconds.
TLS speed: 48.46 MB/s
SHA256 speed: 1246.77 MB/s
Compression speed: 516.65 MB/s
Decompress speed: 802.80 MB/s
AES256/GCM speed: 1975.93 MB/s
Verify speed: 479.39 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name │ Value │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 48.46 MB/s (4%) │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 1246.77 MB/s (62%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed │ 516.65 MB/s (69%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed │ 802.80 MB/s (67%) │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed │ 479.39 MB/s (63%) │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed │ 1975.93 MB/s (54%) │
└───────────────────────────────────┴────────────────────┘
New shiny setup:
Code:
root@hypervisor04:~# proxmox-backup-client benchmark --repository root@pam@192.168.0.32:pool1
Password for "root@pam": *********
Uploaded 1351 chunks in 5 seconds.
Time per request: 3706 microseconds.
TLS speed: 1131.65 MB/s
SHA256 speed: 1227.23 MB/s
Compression speed: 518.51 MB/s
Decompress speed: 795.62 MB/s
AES256/GCM speed: 1947.87 MB/s
Verify speed: 481.29 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name │ Value │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 1131.65 MB/s (92%) │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 1227.23 MB/s (61%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed │ 518.51 MB/s (69%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed │ 795.62 MB/s (66%) │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed │ 481.29 MB/s (63%) │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed │ 1947.87 MB/s (53%) │
└───────────────────────────────────┴────────────────────┘
A pretty significant difference, and it's bumping up against the 10G maximum.
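Quick math on that: 1131.65 MB/s × 8 ≈ 9.05 Gbit/s, which is roughly 90% of 10GbE line rate before protocol overhead, so the 10G NIC is effectively saturated.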
This morning I kicked off our very first backup, and I'm pretty pleased with the overall results. Here's a single sample from a random VM.
Code:
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2023-08-15 12:46:22
INFO: status = running
INFO: VM Name: myvm01
INFO: include disk 'scsi0' 'ceph-vm:vm-103-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/103/2023-08-15T12:46:22Z'
INFO: started backup task '3470134b-da97-46e8-8907-015b73271f38'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 3% (996.0 MiB of 32.0 GiB) in 3s, read: 332.0 MiB/s, write: 321.3 MiB/s
INFO: 6% (2.0 GiB of 32.0 GiB) in 6s, read: 350.7 MiB/s, write: 338.7 MiB/s
INFO: 10% (3.2 GiB of 32.0 GiB) in 9s, read: 416.0 MiB/s, write: 382.7 MiB/s
INFO: 24% (7.9 GiB of 32.0 GiB) in 12s, read: 1.6 GiB/s, write: 289.3 MiB/s
INFO: 27% (8.9 GiB of 32.0 GiB) in 15s, read: 350.7 MiB/s, write: 313.3 MiB/s
INFO: 30% (9.9 GiB of 32.0 GiB) in 18s, read: 329.3 MiB/s, write: 280.0 MiB/s
INFO: 34% (10.9 GiB of 32.0 GiB) in 21s, read: 354.7 MiB/s, write: 234.7 MiB/s
INFO: 37% (11.9 GiB of 32.0 GiB) in 24s, read: 318.7 MiB/s, write: 257.3 MiB/s
INFO: 39% (12.7 GiB of 32.0 GiB) in 27s, read: 297.3 MiB/s, write: 246.7 MiB/s
INFO: 43% (13.9 GiB of 32.0 GiB) in 30s, read: 392.0 MiB/s, write: 340.0 MiB/s
INFO: 46% (14.9 GiB of 32.0 GiB) in 33s, read: 338.7 MiB/s, write: 329.3 MiB/s
INFO: 50% (16.0 GiB of 32.0 GiB) in 36s, read: 385.3 MiB/s, write: 377.3 MiB/s
INFO: 52% (16.9 GiB of 32.0 GiB) in 39s, read: 320.0 MiB/s, write: 320.0 MiB/s
INFO: 56% (17.9 GiB of 32.0 GiB) in 42s, read: 341.3 MiB/s, write: 317.3 MiB/s
INFO: 58% (18.8 GiB of 32.0 GiB) in 45s, read: 288.0 MiB/s, write: 288.0 MiB/s
INFO: 61% (19.7 GiB of 32.0 GiB) in 48s, read: 312.0 MiB/s, write: 297.3 MiB/s
INFO: 64% (20.6 GiB of 32.0 GiB) in 51s, read: 313.3 MiB/s, write: 313.3 MiB/s
INFO: 67% (21.7 GiB of 32.0 GiB) in 54s, read: 368.0 MiB/s, write: 341.3 MiB/s
INFO: 70% (22.7 GiB of 32.0 GiB) in 57s, read: 346.7 MiB/s, write: 346.7 MiB/s
INFO: 73% (23.7 GiB of 32.0 GiB) in 1m, read: 321.3 MiB/s, write: 302.7 MiB/s
INFO: 76% (24.6 GiB of 32.0 GiB) in 1m 3s, read: 326.7 MiB/s, write: 317.3 MiB/s
INFO: 79% (25.5 GiB of 32.0 GiB) in 1m 6s, read: 308.0 MiB/s, write: 297.3 MiB/s
INFO: 83% (26.6 GiB of 32.0 GiB) in 1m 9s, read: 384.0 MiB/s, write: 381.3 MiB/s
INFO: 86% (27.6 GiB of 32.0 GiB) in 1m 12s, read: 330.7 MiB/s, write: 321.3 MiB/s
INFO: 89% (28.5 GiB of 32.0 GiB) in 1m 15s, read: 314.7 MiB/s, write: 310.7 MiB/s
INFO: 91% (29.4 GiB of 32.0 GiB) in 1m 18s, read: 305.3 MiB/s, write: 288.0 MiB/s
INFO: 94% (30.3 GiB of 32.0 GiB) in 1m 21s, read: 282.7 MiB/s, write: 280.0 MiB/s
INFO: 98% (31.4 GiB of 32.0 GiB) in 1m 24s, read: 384.0 MiB/s, write: 358.7 MiB/s
INFO: 100% (32.0 GiB of 32.0 GiB) in 1m 27s, read: 212.0 MiB/s, write: 204.0 MiB/s
INFO: backup is sparse: 4.66 GiB (14%) total zero data
INFO: backup was done incrementally, reused 5.64 GiB (17%)
INFO: transferred 32.00 GiB in 87 seconds (376.6 MiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 103 (00:01:28)
The PBS host at the beginning of the backup was ingesting 2.75 GiB/s per our monitoring.
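If anyone wants to cross-check ingest from the PBS side, watching the pool directly works too. Note that device-level writes won't match logical ingest exactly because of mirroring overhead:
Code:
# Per-vdev throughput, sampled every 5 seconds while a backup runs
zpool iostat -v pool1 5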
Pretty impressed. It just goes to show that if you give PBS the right equipment, it can be quite performant out of the box.
Is there any additional performance tuning to be had? I left the ZFS ashift at the default of 12. I'm trying to decide what other tweaks I can make to squeeze more out of it, though I'm not too worried about this until I get the HVs migrated over to 100G.
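For what it's worth, these are the knobs I've seen commonly suggested for PBS datastores on ZFS. Untested on this box so far, so treat them as starting points rather than a recipe:
Code:
# atime updates add write overhead on a chunk store with huge file counts
zfs set atime=off pool1
# PBS compresses chunks client-side with zstd, so pool-level compression
# mostly costs CPU; lz4 is cheap enough to leave on if unsure
zfs set compression=lz4 pool1
# larger records can suit PBS's multi-MiB chunk files
zfs set recordsize=1M pool1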
Thanks!