Hi, I've been trying really hard to solve this myself, but I'm stuck. Please help.
I'm setting up a fresh install of Proxmox 8.1.3 on a Dell R720, and ZSTD backups are running much slower than I expected. There is currently only one container on it (idle), and nothing else is running.
A 34 GiB container backs up LOCALLY at 31 MiB/s, which takes over 18 minutes.
There is a barely noticeable rise in CPU utilization (the pic below shows the last minutes before the backup finishes) and minimal IO delay (~1.5% max).
The backup direction / CT image location does not matter: SATA -> NVMe, NVMe -> SATA and NVMe -> NVMe all give the same result.
Hardware:
- Dell R720
- CPU: 2 x Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz (2 sockets = 20 cores, 40 threads)
- RAM: 128 GB DDR3
- Storage:
  - 2x consumer-grade 120 GB SATA SSD
    - Proxmox disk (but I also tested storing containers here and backing up from here)
    - ZFS RAID1 (ashift = 12; UPDATE: I'm wondering whether this could be causing problems)
    - plugged in at the front, on the H710 Mini D1 with LSI IT mode firmware
  - 2x consumer-grade 1 TB M.2 NVMe
    - ZFS RAID1 (ashift = 12; UPDATE: I'm wondering whether this could be causing problems)
    - storage for CT/VM
    - plugged in inside the server, in a PCIe adapter
Code:
# proxmox-backup-client benchmark
SHA256 speed: 251.26 MB/s
Compression speed: 350.34 MB/s
Decompress speed: 463.52 MB/s
AES256/GCM speed: 700.60 MB/s
Verify speed: 186.05 MB/s
The results below are lower than I'm used to, but they are still nowhere near as low as 30 MB/s.
Code:
# hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 15672 MB in 1.99 seconds = 7856.58 MB/sec
Timing buffered disk reads: 602 MB in 3.00 seconds = 200.37 MB/sec
Code:
# hdparm -Tt /dev/nvme0n1
/dev/nvme0n1:
Timing cached reads: 14980 MB in 1.99 seconds = 7543.94 MB/sec
Timing buffered disk reads: 2164 MB in 3.00 seconds = 721.25 MB/sec
Some fio benchmarks (which I don't fully understand):
FIO SATA SSD:
Code:
sync_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
WRITE: bw=1031KiB/s (1056kB/s), 1031KiB/s-1031KiB/s (1056kB/s-1056kB/s), io=1024MiB (1074MB), run=1016772-1016772msec
sync_randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
READ: bw=181MiB/s (190MB/s), 181MiB/s-181MiB/s (190MB/s-190MB/s), io=10.0GiB (10.7GB), run=56551-56551msec
sync_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
WRITE: bw=26.2MiB/s (27.5MB/s), 26.2MiB/s-26.2MiB/s (27.5MB/s-27.5MB/s), io=5120MiB (5369MB), run=195570-195570msec
sync_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
READ: bw=6020MiB/s (6312MB/s), 6020MiB/s-6020MiB/s (6312MB/s-6312MB/s), io=10.0GiB (10.7GB), run=1701-1701msec
async_uncached_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
WRITE: bw=111MiB/s (116MB/s), 111MiB/s-111MiB/s (116MB/s-116MB/s), io=4096MiB (4295MB), run=36882-36882msec
async_cached_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
WRITE: bw=9279KiB/s (9501kB/s), 9279KiB/s-9279KiB/s (9501kB/s-9501kB/s), io=4096MiB (4295MB), run=452035-452035msec
async_uncached_randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
READ: bw=1259MiB/s (1321MB/s), 1259MiB/s-1259MiB/s (1321MB/s-1321MB/s), io=40.0GiB (42.9GB), run=32525-32525msec
async_cached_randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
READ: bw=938MiB/s (984MB/s), 938MiB/s-938MiB/s (984MB/s-984MB/s), io=40.0GiB (42.9GB), run=43652-43652msec
async_uncached_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
WRITE: bw=277MiB/s (290MB/s), 277MiB/s-277MiB/s (290MB/s-290MB/s), io=8192MiB (8590MB), run=29627-29627msec
async_cached_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
WRITE: bw=83.3MiB/s (87.4MB/s), 83.3MiB/s-83.3MiB/s (87.4MB/s-87.4MB/s), io=8192MiB (8590MB), run=98336-98336msec
async_uncached_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
READ: bw=9731MiB/s (10.2GB/s), 9731MiB/s-9731MiB/s (10.2GB/s-10.2GB/s), io=200GiB (215GB), run=21046-21046msec
async_cached_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
READ: bw=9375MiB/s (9831MB/s), 9375MiB/s-9375MiB/s (9831MB/s-9831MB/s), io=200GiB (215GB), run=21845-21845msec
FIO NVME:
Code:
sync_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
WRITE: bw=708KiB/s (725kB/s), 708KiB/s-708KiB/s (725kB/s-725kB/s), io=1024MiB (1074MB), run=1480555-1480555msec
sync_randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
READ: bw=549MiB/s (575MB/s), 549MiB/s-549MiB/s (575MB/s-575MB/s), io=10.0GiB (10.7GB), run=18658-18658msec
sync_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
WRITE: bw=16.6MiB/s (17.4MB/s), 16.6MiB/s-16.6MiB/s (17.4MB/s-17.4MB/s), io=5120MiB (5369MB), run=308033-308033msec
sync_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
READ: bw=5133MiB/s (5382MB/s), 5133MiB/s-5133MiB/s (5382MB/s-5382MB/s), io=10.0GiB (10.7GB), run=1995-1995msec
async_uncached_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
WRITE: bw=54.2MiB/s (56.9MB/s), 54.2MiB/s-54.2MiB/s (56.9MB/s-56.9MB/s), io=4096MiB (4295MB), run=75534-75534msec
async_cached_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
WRITE: bw=8156KiB/s (8351kB/s), 8156KiB/s-8156KiB/s (8351kB/s-8351kB/s), io=4096MiB (4295MB), run=514290-514290msec
async_uncached_randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
READ: bw=1280MiB/s (1342MB/s), 1280MiB/s-1280MiB/s (1342MB/s-1342MB/s), io=40.0GiB (42.9GB), run=32010-32010msec
async_cached_randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
READ: bw=1985MiB/s (2081MB/s), 1985MiB/s-1985MiB/s (2081MB/s-2081MB/s), io=40.0GiB (42.9GB), run=20637-20637msec
async_uncached_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
WRITE: bw=101MiB/s (106MB/s), 101MiB/s-101MiB/s (106MB/s-106MB/s), io=8192MiB (8590MB), run=81137-81137msec
async_cached_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
WRITE: bw=88.9MiB/s (93.2MB/s), 88.9MiB/s-88.9MiB/s (93.2MB/s-93.2MB/s), io=8192MiB (8590MB), run=92120-92120msec
async_uncached_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
READ: bw=10.8GiB/s (11.6GB/s), 10.8GiB/s-10.8GiB/s (11.6GB/s-11.6GB/s), io=200GiB (215GB), run=18550-18550msec
async_cached_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
READ: bw=10.1GiB/s (10.9GB/s), 10.1GiB/s-10.1GiB/s (10.9GB/s-10.9GB/s), io=200GiB (215GB), run=19768-19768msec
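To make the fio output above easier to compare, here is a minimal Python sketch that pulls the job name and bandwidth out of pasted fio summary lines and converts everything to MiB/s. The function name `parse_fio` and the regular expressions are my own illustration (not part of fio), and they only assume the two-line "job header, then READ/WRITE summary" layout shown above. The sample uses three of the NVMe results.

```python
import re

# Job header line, e.g. "sync_seqwrite: (g=0): rw=write, ..."
HEADER = re.compile(r"^\s*(\w+): \(g=\d+\)")
# Group summary line, e.g. "WRITE: bw=26.2MiB/s (27.5MB/s), ..."
SUMMARY = re.compile(r"^\s*(READ|WRITE): bw=([\d.]+)(KiB|MiB|GiB)/s")

# Conversion factors to MiB/s for the units fio printed above.
TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_fio(text):
    """Hypothetical helper: return {job_name: bandwidth in MiB/s}."""
    results = {}
    job = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            job = header.group(1)
            continue
        summary = SUMMARY.match(line)
        if summary and job is not None:
            value, unit = float(summary.group(2)), summary.group(3)
            results[job] = value * TO_MIB[unit]
            job = None
    return results


sample = """\
sync_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
WRITE: bw=16.6MiB/s (17.4MB/s), 16.6MiB/s-16.6MiB/s (17.4MB/s-17.4MB/s), io=5120MiB (5369MB), run=308033-308033msec
sync_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
WRITE: bw=708KiB/s (725kB/s), 708KiB/s-708KiB/s (725kB/s-725kB/s), io=1024MiB (1074MB), run=1480555-1480555msec
async_uncached_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
READ: bw=10.8GiB/s (11.6GB/s), 10.8GiB/s-10.8GiB/s (11.6GB/s-11.6GB/s), io=200GiB (215GB), run=18550-18550msec
"""

nvme = parse_fio(sample)
# Print slowest first so the weak spots stand out.
for name, mib_s in sorted(nvme.items(), key=lambda kv: kv[1]):
    print(f"{name:28s} {mib_s:10.2f} MiB/s")
```

Sorted this way, the sync write jobs end up at the bottom of the range on both pools (16.6 MiB/s sequential on NVMe, 26.2 MiB/s on SATA), which is in the same ballpark as the 31 MiB/s backup speed. I can't say whether that is related.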
TBH I don't know what this one tells me (it benchmarks compression/decompression on a sample dictionary file?); all I can see is that it's NOT 30 MB/s.
Code:
root@doe:~# zstd -b0 -e5 -T0 webster
0#webster : 41458703 -> 12133011 (x3.417), 394.9 MB/s 292.0 MB/s
1#webster : 41458703 -> 13669544 (x3.033), 1357.0 MB/s, 611.7 MB/s
2#webster : 41458703 -> 12824426 (x3.233), 817.2 MB/s 475.9 MB/s
3#webster : 41458703 -> 12133011 (x3.417), 307.2 MB/s, 254.3 MB/s
4#webster : 41458703 -> 11950087 (x3.469), 275.4 MB/s, 311.1 MB/s
5#webster : 41458703 -> 11221373 (x3.695), 123.3 MB/s, 451.3 MB/s
I also have 2 small local servers running on Dell Optiplex 5050s (below), each with 1x SATA SSD and 1x NVMe SSD, and they back up over 10 times faster, even though their CPU has 1/4 of the performance on paper.
32.00 GiB CT in 96 seconds (341.3 MiB/s) (NVMe -> SSD)
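As a sanity check on the numbers quoted in this post, here is a small Python sketch of the arithmetic. The helper names `throughput_mib_s` and `duration_s` are made up for illustration; the inputs are the sizes, times and speeds reported above.

```python
MIB_PER_GIB = 1024


def throughput_mib_s(size_gib, seconds):
    """Hypothetical helper: average speed in MiB/s for a backup of size_gib."""
    return size_gib * MIB_PER_GIB / seconds


def duration_s(size_gib, mib_per_s):
    """Hypothetical helper: seconds needed to move size_gib at mib_per_s."""
    return size_gib * MIB_PER_GIB / mib_per_s


# R720: 34 GiB container at the observed 31 MiB/s.
r720_seconds = duration_s(34, 31)

# Optiplex 5050: 32 GiB container in 96 seconds.
optiplex_speed = throughput_mib_s(32, 96)

# How much faster the Optiplex backup runs than the R720 one.
ratio = optiplex_speed / 31

print(f"R720: 34 GiB at 31 MiB/s takes {r720_seconds / 60:.1f} min")
print(f"Optiplex: 32 GiB in 96 s is {optiplex_speed:.1f} MiB/s")
print(f"Optiplex is {ratio:.1f}x faster")
```

This matches what I see: a bit over 18 minutes on the R720, 341.3 MiB/s on the Optiplex, and a gap of roughly 11x.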