[TUTORIAL] Datastore Performance Tester for PBS

This is a performance tester for datastores of your PBS.

-> Intended to be run before you set up a production PBS. <-

Bash:
apt-get update
apt-get install git
git clone https://github.com/egandro/pbs-storage-perf-test.git
cd pbs-storage-perf-test
# replace datastore-dir with your own directory
./create_random_chunks.py /datastore-dir
# clean up the generated test files afterwards
rm -rf /datastore-dir/dummy-chunks

- Read how, what, and why we test this way: https://github.com/egandro/pbs-storage-perf-test/blob/main/README.md
- Our results: https://github.com/egandro/pbs-storage-perf-test/blob/main/results.md
- Our conclusion: https://github.com/egandro/pbs-storage-perf-test/blob/main/conclusion.md
 
I like to hear this. I've got one of these.
"it is ok to have your PBS installed as VM and put the virtual datastore disk (the .qcow2 file in Proxmox) on nfs"

And this was obvious, but now it's been proven.
"avoid nfs and samba like the plague"

And yes, I built one of these out of a pile of junk. Much more challenging than it sounds! Good to hear that I could have done worse somehow.
"usb is not super bad (in contrast to nfs, smb)"


I don't really do Python, so I'm going to dig through the code. I want to repurpose it as a before-and-after test for disk tuning.
 
Concerning "give some reasons why zfs should be preferred over ext4, our numbers don't show any benefit in zfs so far":
ZFS is about data integrity and not performance. Ext4 is way simpler and therefore usually performs better. Same for mdadm as software raid.
ZFS for PBS is useful to:
- integrity checks of non-chunk files like of index/catalog files (as the built-in verify jobs will only verify the chunks)
-the ability to fix corrupted chunks (PBS verify tasks can only detect but not correct corrupted chunks...PBS will try to upload that chunk again once it is marked as corrupted...but you are screwed if that data doesn't exist anymore on the PVE or another PBS...especially as a single corrupted chunk could make ALL of the backup snapshots of a VM non-restorable...fro mthe backup done an hour ago down to the first backup snapshot you did 4 years ago...or even the backup snapshots of multiple VMs. Thats the problem with deduplication. If nothing is ever stored twice, you are screwed if that single copy is lost or gets damaged. Another reason why a single PBS isn't a good idea and why it's a good thing to have multiple synced PBSs at different locations)
- snapshots (search the forums how often people ask if an accidentally deleted backup snapshot could be restored)
- way faster GC tasks in case you use some SSDs as special devices or L2ARC to store all the metadata on fast SSDs and only the data on HDDs (so many people don't want to spend for SSD-only)
- software raid
- some people also like to abuse the PBSs ZFS pool as a target for replication for backups reasons outside of PVE ("zfs send | zfs recv")

So performance-wise ZFS only helps to boost HDD performance using SSDs or by combining lots of smaller disks into a raid array for better bandwidth/IOPS instead of few bigger disks.
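
If you want to try the "metadata on SSD" layout from the list above, a rough sketch of such a pool could look like this. The device names are hypothetical - adapt them to your hardware, and note that losing the special vdev loses the whole pool, so mirror it:

Bash:
# hypothetical devices: sda/sdb = HDDs for data, nvme0n1/nvme1n1 = SSDs for metadata
zpool create tank mirror /dev/sda /dev/sdb \
    special mirror /dev/nvme0n1 /dev/nvme1n1
# optionally also store small blocks (not only metadata) on the special vdev
zfs set special_small_blocks=4K tank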
 
Concerning "give some reasons why zfs should be preferred over ext4, our numbers don't show any benefit in zfs so far":
ZFS is about data integrity, not performance. Ext4 is way simpler and therefore usually performs better; the same goes for mdadm as software raid.
ZFS for PBS is useful for:
We ran a speed test. Do you have numbers showing that it performs better?

I can't - probably on raid it would - I covered that, but I didn't test it.

Anything else about ZFS - yes, true - but this is a filesystem speed test, not a feature test.
 
Anything else about ZFS - yes, true - but this is a filesystem speed test, not a feature test.
I would count "HDD for data + SSD for metadata" vs "HDD for data as well as metadata" as an important part of a filesystem speed test, not just a feature test. It results in magnitudes faster GC tasks and is a cheap option for people who aren't willing to pay for SSD-only, as the SSDs only need to be 1-2% of the capacity of the HDDs.
We ran a speed test. Do you have numbers showing that it performs better?
There are multiple threads in this forum where people benchmarked ZFS raid vs mdadm raid. From what I remember, mdadm always performed better. Same for ext4 on LVM vs a ZFS dataset.
 
I would count that "HDD for data + SSD for metadata" vs "HDD for data as well as metadata" as a important part of a filesystem speedtest, not just for a feature test.

Feel free to send a PR request with the data you collect. I am happy to put that in the result set. I consider that > not < important. If ext4 vs. zfs for a large e.g. n=100.000 only has 1-5 sec of a difference it's not relevant. There is no significant gain in using that.

If you have the time and the resources to proove that lvm (what was never tested!) is 3 seconds faster then zfs raid - feel free to send a PR.

For my tests - that acutally you can read > I < could not find a benefit in zfs.

Probably in the features there is...

There are multiple threads in this forum where people benchmarked ZFS raid vs mdadm raid. From what I remember mdadm was always better performing. Same for ext4 on LVM vs ZFS dataset.

I didn't run multiple hardware / ...

My point I wanted to prove - avoid nfs / smb ... if you are forced to use a remote fs (which is bad!) use sshfs. That's the fastes of the worse.


Feel free to send PR! I am happy to add them and provide them for the public.

I coudn't prove for 500.000 files - that given the same (single drive) hardware - ext4 was better or worse then zfs.
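
For anyone who ends up forced onto a remote filesystem and wants to try the sshfs route mentioned above, mounting looks roughly like this. Hostname and paths are placeholders, not from the test setup:

Bash:
apt-get install sshfs
# mount the remote directory at a placeholder datastore path
mkdir -p /mnt/pbs-datastore
sshfs backupuser@storagehost:/export/pbs /mnt/pbs-datastore -o reconnect,ServerAliveInterval=15
# unmount again when done
fusermount -u /mnt/pbs-datastore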
 
For comparison, a test on an R730xd, 2x Xeon E5-2683 v4 @ 2.10GHz, 9x 8TB HDD raid5:
root@hxnode02:hxfs# 111#: ./create_random_chunks.py pbs
target dir: /hxfs/pbs/dummy-chunks
filesystem detected by stat(1): xfs
files to write: 500000
files to read/stat: 50000
buckets: 65536
sha256_name_generation: 0.64s
create_buckets: 2.17s
create_random_files: 33.21s
create_random_files_no_buckets: 34.50s
read_file_content_by_id: 19.25s
read_file_content_by_id_no_buckets: 13.94s
stat_file_by_id: 5.62s
stat_file_by_id_no_buckets: 1.20s
find_all_files: 14.96s
find_all_files_no_buckets: 0.43s
root@hxnode02:hxfs# 112#: df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/md125 62T 15T 47T 24% /hxfs

The question is: why does the create_random_chunks.py script use drop_caches at all? It doesn't affect the ZFS ARC - you would have to export and re-import the pool to get the same effect there?!
Maybe someone even has zfs raidz or mirror numbers with HDDs ... :)

Edit: this is just a test filesystem to gain one more HDD of capacity, as we don't use raid5 for any production data.
 
The question is: why does the create_random_chunks.py script use drop_caches at all? It doesn't affect the ZFS ARC - you would have to export and re-import the pool to get the same effect there?!
Maybe someone even has zfs raidz or mirror numbers with HDDs ... :)

drop_caches is used to simulate a worst-case scenario.

We drop the caches to avoid, by all means, the operating system caching the filesystem interaction in the computer's RAM.

On a real PBS server that happens naturally: when you write 1TB of data, it won't all fit in the cache.

By the nature of how I create the test data, the dummy files contain only simple IDs - each file is only as long as a sha256 checksum string.

That is why we have to clear the cache: so we don't just measure RAM speed (even 500,000 such files won't saturate a 16-32GB machine).
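
For context, the standard kernel interface for this is the vm.drop_caches sysctl; a minimal sketch of what the script's cache-drop step presumably boils down to (not necessarily its exact code):

Bash:
# flush dirty pages first so the drop actually takes effect
sync
# 3 = free the page cache plus dentries and inodes
echo 3 > /proc/sys/vm/drop_caches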


Edit: the numbers are great. I didn't see any difference between ZFS and ext4 on native SSD disks either. ZFS might have advantages - other than filesystem speed - that you might want to use.


NFS and Samba are "the worst" you can have. Just don't use them.
 
I understand your drop_caches behaviour, given that the test only creates less than 5GB in total, but it isn't useful when comparing against ZFS.

Now take a second, identical server connected to the first one over 10Gb NFS(v3) (just like a Synology NAS ...) and you get:
root@hxnode03:mnt2# 111#: ./create_random_chunks.py pbs
target dir: pbs/dummy-chunks
filesystem detected by stat(1): nfs # remote
files to write: 500000
files to read/stat: 50000
buckets: 65536
sha256_name_generation: 0.65s
create_buckets: 26.31s
create_random_files: 461.45s
create_random_files_no_buckets: 489.65s
read_file_content_by_id: 24.32s
read_file_content_by_id_no_buckets: 18.84s
stat_file_by_id: 16.85s
stat_file_by_id_no_buckets: 10.35s
find_all_files: 74.44s
find_all_files_no_buckets: 0.52s
root@hxnode03:mnt2# 112#: df -h .
Filesystem Size Used Avail Use% Mounted on
hxnode02:/hxfs 62T 15T 47T 24% /mnt2 # remote
 
I understand your drop_caches behaviour, given that the test only creates less than 5GB in total, but it isn't useful when comparing against ZFS.

You need to read about that in the Linux kernel source code.

I am calling a Linux kernel /proc/sys hook (vm.drop_caches) to drop all caches.

As far as I understand it, this goes to the VFS layer (which does the RAM caching in front of the filesystem implementation) - at least that has been the case for 20+ years.

Probably ZFS bypasses that - but you would need to check that in its source code. I highly doubt that ZFS has any "magic" here; it wouldn't make sense.

To be honest, I am not even sure whether tmpfs does a VFS bypass (that is the only FS where it would make the most sense).

Having it in the code enforces it (maybe it's not needed) - as far as I understand, bonnie++ doesn't do it; they just make sure to use files that are larger than the physical RAM.
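
If someone wants to check how the ZFS ARC reacts to this, a quick before/after look at the OpenZFS kstats should tell you. This is just a sketch; the arcstats path is the standard ZFS-on-Linux location:

Bash:
# ARC size before dropping the caches
grep '^size' /proc/spl/kstat/zfs/arcstats
sync && echo 3 > /proc/sys/vm/drop_caches
# ARC size after - the ARC lives outside the page cache, so it typically does not shrink here
grep '^size' /proc/spl/kstat/zfs/arcstats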
 
I would prefer to create real PBS 4MB (?) chunk files for a total amount of 4.5TB (instead of 6-byte files totalling 4.5GB) and allow the Linux fs cache and ARC to work.
Yes, it would take much longer, but it would show the effectiveness of the filesystem and its caching mechanisms - maybe implement it as a second test case. :)
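
A rough sketch of what such a second test case could look like (file count and target directory are made up for illustration):

Bash:
# hypothetical: write 1000 files of 4 MiB random data each, roughly the size of real PBS chunks
mkdir -p /datastore-dir/big-dummy-chunks
for i in $(seq 1 1000); do
    dd if=/dev/urandom of=/datastore-dir/big-dummy-chunks/chunk_${i}.bin bs=4M count=1 status=none
done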
 
I would prefer to create real PBS 4MB (?) chunk files for a total amount of 4.5TB (instead of 6-byte files totalling 4.5GB) and allow the Linux fs cache and ARC to work.
Yes, it would take much longer, but it would show the effectiveness of the filesystem and its caching mechanisms - maybe implement it as a second test case. :)

It doesn't make sense.

I wrote this script to compare filesystems:

Ext4, ZFS, NFS, Samba, SSHFS, iSCSI

My conclusion is: writing 500,000 (tiny) files takes 500 seconds via NFS/Samba and 10 seconds on local ext4.

Which is insane(!) - 50x slower - and on exfat it's even 500x slower.

(Again, the conditions are artificial PBS server tasks - I am not making a "general" statement about filesystems.)

-> My takeaway: avoid nfs / smb (and of course exfat) like the plague.

-> My takeaway 2: sshfs had surprisingly good performance in the (artificial PBS) tasks I tested.

-> My takeaway 3: use iSCSI if you need network storage.

....

Of course ... instead of the tiny sha256-sized content of each of the 500,000 files, feel free to create a 5GB random file


... if you have the storage, time and fun to do so.

10 sec vs. 500 sec vs. 5000 sec (the proportions might not change) - even if they get worse - what's the point?

If you want to test ext2 vs. ZFS - feel free to do so. With 500,000 files I found no difference worth dealing with.



A hypothesis: ZFS might be faster if you use a raid with 4-6-8 disks where you mix striping and mirroring.

That is beyond any practical use - you hardly write backups that create 500,000 chunks. At least I don't have such backups.

So if you drop your 500,000 writes/reads from 30 sec to 25 sec - you "might" have proven a point without any practical use.

And it still might not be the ZFS - because LVM with ext4 might keep up with that.

(I am not talking - check every post I made - about the other advantages of ZFS vs. any other FS - this is about performance.)



Not using very sh*tty nfs and smb - and telling the world that any tutorial like "how to use my Fritzbox router's SMB USB stick with ExtFS" as a PBS storage is total garbage - that was the point of this script.
 
Very old hardware

- A super bad old SSD
- A super old CPU

Still, ext4 vs. ZFS doesn't matter (we are creating / reading 500,000 files):

67 sec vs 85 sec
25 sec vs 25 sec

(the no-buckets runs are not a PBS test - just added for pure fun)
45 sec vs 34 sec
14 sec vs 5 sec

(we are creating / reading 500.000 files)

Now have a look at the nfs
67 sec vs 6591 sec

Now have a look at the smbfs
67 sec vs 136370 sec

^^^ nfs / smbfs is a pain


 
Haha, this is a kind of local- and remote-filesystem access benchmark, yes, and it shows the expected results (local fast, remote protocol slow),
but even then it should model the reality of PBS chunks.
In an engineering world 5TB of data per day is not that much ...
The ZFS ARC is good for a daily working set of up to about 3 times the RAM; beyond that ZFS is helpless searching for data.
One fileserver has to clamav-scan 50M files / 60TB of data every day, which kills the ARC on a daily basis and shows ZFS's bad inode handling once the ARC (only 192GB of RAM) can't cover it any more - again one of the gladly forgotten ZFS features.
 
Haha, this is a kind of local- and remote-filesystem access benchmark, yes, and it shows the expected results (local fast, remote protocol slow)

Dude. I'll stop here.

You are wasting my time.

The introduction to my results contains a very clear statement that the nfs, smb, iscsi, ... shares were mounted on localhost ... and are NOT running over a real network connection.
 
