Benchmark ideas for a ghetto build

Ok, I will try ;)

Why does an external 2.5" HDD perform as well as an NVMe drive for reusing data?

Expectation:
Poor performance, because PBS has to read many small files/metadata/hashes to compare against the backup task. I thought that is why Proxmox officially recommends a special vdev, because metadata performance is so important.

Reality:
Turns out even a very bad HDD performs as well as an NVMe for that task.
 
Oh, and here is the hardware and the runs I am comparing:

From PVE to PBS, ext4 on an external USB 3.0 2.5" HDD
vs.
From PVE to PBS, ext4 on an INTEL SSDPEKKW128G7
 
Is nobody able to answer this?

Why does an external 2.5" HDD perform as well as an NVMe drive for reusing data?


Otherwise I can't really plan for what my bottleneck would be in a real setup.
 
I have two PBS systems.
One uses a 2,5" HDD.
The other a half decent NVME SSD.
When I run the backup tasks twice (aka no new data, aka 100% data reused), why is there no performance difference?
 
Compare a Verify Job, then you'll see the difference.
PBS can't write at max sequential disk speed because it doesn't write sequentially.
The point of PBS is forever-incremental, deduplicated backups, which let you skip full backups each time, a must for daily backups.
 
Compare a Verify Job, then you'll see the difference.
That is not that important to me, as long as the verify job can finish within 6 days ;)
No but seriously, I mostly do my backups irregularly. My VMs keep their non-OS and important data mostly on NFS shares.
So I don't back them up regularly, only during maintenance. Changes are small, and I only care about backup performance because I don't want to wait multiple minutes.

If there is no new data, the backup disks aren't used.
Interesting. I thought that PVE somehow needs to know what has changed and what it has to resend, and thus reads some information from PBS. Apparently this is not very read-intensive on PBS, but more so on PVE.
 
Interesting. I thought that PVE somehow needs to know what has changed and what it has to resend, and thus reads some information from PBS. Apparently this is not very read-intensive on PBS, but more so on PVE.
PVE sends everything to PBS. So there is a full read on PVE. PBS then deduplicates on disk what has already been stored.
 
PVE sends everything to PBS. So there is a full read on PVE. PBS then deduplicates on disk what has already been stored.
Ah, that makes sense. So basically both disks were not the bottleneck and were able to receive roughly 100 MB/s, and the actual bottleneck was the PVE TLS speed of 117.54 MB/s?
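To sanity-check that guess with rough arithmetic (my own numbers, not official figures): 1 GbE tops out at 125 MB/s, so a TLS benchmark result of 117.54 MB/s sits just below gigabit line rate, and either one caps a full send of the 32 GB example VM at roughly four and a half minutes:

```python
# Rough sanity check (illustrative arithmetic only).
gbe_line_rate = 1000 / 8   # 1 Gbit/s in MB/s -> 125.0
tls_speed = 117.54         # MB/s, from `proxmox-backup-client benchmark`

cap = min(gbe_line_rate, tls_speed)
vm_mb = 32 * 1000          # the 32 GB example VM, in MB

print(f"effective cap : {cap} MB/s")
print(f"full transfer : ~{vm_mb / cap:.0f} s (~{vm_mb / cap / 60:.1f} min)")
```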
 
That is very interesting.
I somehow, without any knowledge to back it up, don't trust live backups.
Yeah, I know that QEMU should handle that, but to me it feels like added complexity in something where I want to reduce the chances for errors as much as possible.

Snapshot mode is probably also not applicable to me, since I use ZFS and you are not talking about ZFS snapshots, but qcow2 snapshots.

An official ZFS snapshot sending mechanism from Proxmox would be pretty cool :)

But until then, I will use PBS with shutdown backups and try to figure out the bottleneck of my workload.
Currently, for my use case, I think it is the PVE TLS speed. Would anyone agree with that?
 
Of course "Snapshot" mode works with ZFS and LVM-thin.
Sorry, I wasn't precise with my words.

Sure, I can create a backup in snapshot mode, but at least with an NFS share as the backup destination it will create another full backup and not only send the changes. In the ZFS world (zfs send), for a snapshot you only send the delta.
 
PVE sends everything to PBS. So there is a full read on PVE. PBS then deduplicates on disk what has already been stored.
Slightly different: PVE reads everything (piece by piece) and calculates the checksum of each chunk. This checksum is sent to PBS. The actual data is only transmitted if that checksum is not already known.

(Bitmaps are already mentioned in #31)
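To make that concrete, here is a minimal sketch of the idea (illustrative only: `known_digests` and `send_chunk` are hypothetical stand-ins, not the actual PBS API, and the real client also compresses and encrypts each new chunk):

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB fixed-size chunks, the PBS default for VM images

def backup_image(path, known_digests, send_chunk):
    """Read an image chunk by chunk; upload only chunks the server doesn't know.

    known_digests: set of SHA-256 digests already on the server (assumed).
    send_chunk:    callable that ships one chunk to the server (assumed).
    """
    sent = skipped = 0
    with open(path, "rb") as img:
        while chunk := img.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in known_digests:   # server already has it: reuse
                skipped += 1
            else:                         # new data: actually transmit it
                send_chunk(digest, chunk)
                known_digests.add(digest)
                sent += 1
    print(f"uploaded {sent} chunks, reused {skipped}")
```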
 
Slightly different: PVE reads everything (piece by piece) and calculates the checksum of each chunk. This checksum is sent to PBS. The actual data is only transmitted if that checksum is not already known.

(Bitmaps are already mentioned in #31)
Ah, that makes sense. That probably also explains why the random metadata read speed on PBS is unimportant in my scenario and both disks offer similar results.

I know this could come off as tedious, but I am interested in how the backup process works; otherwise, I can't really plan what hardware I should use. The documentation on the steps involved is a little scarce, and the hardware requirements are pretty vague.

So for my workload, would you agree that these are the steps involved?

1. Reading data on PVE. Probably sequential? Not in the benchmark.
2. Calculating the checksum for a chunk. Probably the SHA256 value from the benchmark.
3. Since I back up VMs, PVE and PBS use fixed-size chunks, 4 MiB by default. Since I shut down VMs for backup, dirty bitmaps are not an option. So to back up the example 32 GB VM, there will be roughly 8,200 chunks to compare against PBS.
4. Comparing these values is a sequential (?) but rather easy small-file read operation on PBS. Does this also depend on network latency? Probably not, since my guess is that the bitmap is also stored on PBS, so there is no need to compare 8,200 SHA256 values over the internet. Not in the benchmark.
5. If the chunk is new, compress it on PVE. This is the compression speed from the benchmark.
6. Encrypt it and send it over TLS to PBS. This is the TLS speed from the benchmark. (A rough sketch of this pipeline follows below.)
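Here is a back-of-the-envelope sketch of those steps; the per-stage speeds are made-up stand-ins for real benchmark output, and only the 117.54 MB/s TLS figure comes from this thread:

```python
CHUNK = 4 * 1024**2        # 4 MiB fixed-size chunks (PBS default for VM images)
VM_BYTES = 32 * 1024**3    # the 32 GiB example VM
print(f"chunks to compare: {VM_BYTES // CHUNK}")  # -> 8192, i.e. "roughly 8200"

# Hypothetical per-stage throughputs in MB/s; only the TLS figure is from
# this thread, the rest are placeholders for real benchmark numbers.
stages = {
    "1. read source disk": 450.0,   # assumed sequential read speed on PVE
    "2. sha256 checksum":  500.0,   # benchmark: SHA256 speed
    "5. compress":         600.0,   # benchmark: compression speed
    "6. encrypt + TLS":    117.54,  # benchmark: TLS speed
}

vm_mb = VM_BYTES / 1_000_000
for name, speed in stages.items():
    print(f"{name:22s} ~{vm_mb / speed:5.0f} s at {speed} MB/s")

# Steps 1-2 run over the full data set every time; steps 5-6 only run for
# chunks PBS doesn't already know. With 100% reused data the slow TLS stage
# drops out entirely, which matches the HDD-vs-NVMe observation above.
```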
 
So for my workload, would you agree that these are the steps involved?
Looks good. Note that I am not a developer, just a user:
1. Reading data on PVE. Probably sequential?
Sequential with regard to the "top view", be it a .qcow2 file or an LVM/ZFS block device. Note that the actual data may (will!) be scattered across the physical device. That's why IOPS are often more relevant than bandwidth.
4. Comparing these values is a sequential (?) but rather easy small-file read operation on PBS. Does this also depend on network latency? Not in the benchmark.
Sure! As soon as you need to go over a wire for each and every chunk (possibly multiple times!), you introduce relevant latency. Local storage (NVMe) may give you hundreds of thousands of IOPS, while TCP/IP over 1/10/100 GBit/s will always yield only a fraction of that. That's one reason why caches are often involved, introducing their own problems...

Whether this is really dramatic/important or whether it can just be ignored depends on your actual use case and requirements. I cannot judge that.
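To put rough numbers on the latency point (hypothetical round-trip times, not measurements):

```python
# Hypothetical per-lookup times; real values depend on your setup.
lookup_time = {
    "local NVMe (~100k IOPS)": 10e-6,   # ~10 us per random read
    "LAN round trip (1 GbE)":  0.3e-3,  # ~0.3 ms
    "WAN round trip":          20e-3,   # ~20 ms
}

chunks = 8192  # the 32 GiB / 4 MiB example from above

for name, t in lookup_time.items():
    print(f"{name:26s} ~{1 / t:>9,.0f}/s "
          f"-> {chunks * t:6.1f} s for one lookup per chunk")
```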
 