Is it safe to run multiple re-verification jobs on the same PBS datastore?

daan99

Renowned Member
Nov 20, 2015
We have verification enabled for all new snapshots, but we want to re-verify existing backups more often.
The problem is that a verification job processes snapshots one by one. Can we do it in parallel by running multiple jobs on the same datastore?

The PBS storage below is XFS on mdraid with NVMe drives, with reads of about 80 GB/s, and the PBS server has 64 cores.
By the way, we tested ZFS many times, but mdraid with RAID6 on 24 drives is much faster than ZFS and uses far less CPU.
Bit rot is not a problem, because we use NVMe drives built to OCP standards, which have protection and error correction along the entire data path.
We want this for eventual ransomware detection, and for that the backups must be verified.
 
Are you sure about that, and did you actually measure it rather than the filesystem cache's memory bandwidth? Take a look at "iostat -xm 1" while reading your data; the throughput should hold for at least 30 seconds to be considered stable.
And if you really do get that, you could run multiple verification jobs at the same time, but I'm fairly sure you would then hit a CPU/core performance limit.
 
For ransomware protection I would simply lock down PBS with restrictive ACLs. Don't use root for access unless you have to. I've set up my PVE hosts with a backup-only role on PBS, so they can do the usual backups but cannot delete any of the VMs or modify previous backups.

You can also sync the backups to another PBS with extra-long retention policies for additional protection.

Verification jobs on the PBS are just going to take too long to give you any kind of timely warning, especially if ransomware somehow compromised the PBS server itself.

Tape backups wouldn't hurt either.
 
Are you sure about that, and did you actually measure it rather than the filesystem cache's memory bandwidth? Take a look at "iostat -xm 1" while reading your data; the throughput should hold for at least 30 seconds to be considered stable.
And if you really do get that, you could run multiple verification jobs at the same time, but I'm fairly sure you would then hit a CPU/core performance limit.
The fio test files on XFS are 32x larger than the server's memory.
When we ran the tests, this is what we saw in iostat; the fio runs lasted 5 minutes and more.
Multiple processes can reach 80 GB/s with random 1-4M reads. A single process gets up to 15 GB/s, depending on which flags are used.

[attachment: iostat output during the fio run]

Writes are more than 10x slower, but this is RAID6, even with the stripe cache and thread-count optimizations.
There are other things to optimize, like the RAID chunk and stripe size with 24 drives, but write performance is not the concern here.

So the question was: is it safe to run multiple verification jobs on the same PBS storage? One job saturates 4 cores at about 90-100% but reads only about 0.8 GB/s, so there are resources left over.
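From those numbers, a quick back-of-the-envelope estimate of the headroom (using the figures above; actual limits depend on the workload):

```python
# Rough headroom estimate for parallel verify jobs, using the figures
# quoted in this post (64 cores, ~4 cores and ~0.8 GB/s per job,
# ~80 GB/s measured aggregate read bandwidth).
cores_total = 64
cores_per_job = 4          # one verify job saturates ~4 cores
read_per_job_gbs = 0.8     # observed read rate per job, GB/s
storage_read_gbs = 80.0    # measured aggregate read bandwidth, GB/s

max_jobs_by_cpu = cores_total // cores_per_job        # 16 jobs
aggregate_read = max_jobs_by_cpu * read_per_job_gbs   # 12.8 GB/s

# CPU runs out long before the storage does (12.8 GB/s << 80 GB/s),
# so verification here is CPU-bound, not I/O-bound.
print(max_jobs_by_cpu, aggregate_read)
```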
 
For ransomware protection I would simply lock down PBS with restrictive ACLs. Don't use root for access unless you have to. I've set up my PVE hosts with a backup-only role on PBS, so they can do the usual backups but cannot delete any of the VMs or modify previous backups.

You can also sync the backups to another PBS with extra-long retention policies for additional protection.

Verification jobs on the PBS are just going to take too long to give you any kind of timely warning, especially if ransomware somehow compromised the PBS server itself.

Tape backups wouldn't hurt either.
Yes, tapes, ACLs, firewall, manual checks and the other measures are obvious. But if we can verify backups more frequently, why not do it? :)
 
The problem is that a verification job processes snapshots one by one. Can we do it in parallel by running multiple jobs on the same datastore?
You could, but it won't help in this case unless you use namespaces.

When a verify job runs, it does not set a lock on the snapshot it is currently verifying, so if you start two verify jobs at once, both will verify the same snapshots at the same time. You'll use double the resources for the same task and it will probably take even longer. If you use namespaces, you can set up a verify job for each one and potentially reduce the time needed to check all snapshots.
 
You could, but it won't help in this case unless you use namespaces.

When a verify job runs, it does not set a lock on the snapshot it is currently verifying, so if you start two verify jobs at once, both will verify the same snapshots at the same time. You'll use double the resources for the same task and it will probably take even longer. If you use namespaces, you can set up a verify job for each one and potentially reduce the time needed to check all snapshots.
Hi, thanks for the information about locking. I'll look into how to run verify on a single snapshot/backup, and then write a script to run this in parallel without overlap.
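A rough sketch of what such a script could look like, assuming the per-snapshot verify endpoint from the PBS API. The host, datastore, token, and request parameter names here are placeholders and assumptions, to be checked against the API docs:

```python
# Sketch of a parallel per-snapshot verify driver for the PBS API.
# Assumptions (check against the PBS API viewer): API-token auth via the
# "Authorization: PBSAPIToken=..." header, and that POST .../verify accepts
# backup-type/backup-id/backup-time to scope the task to one snapshot.
# Host, datastore, and token values are hypothetical.
import json
import ssl
import urllib.request
from concurrent.futures import ThreadPoolExecutor

HOST = "https://pbs.example.com:8007"   # hypothetical PBS host
TOKEN = "user@pbs!verify:SECRET-UUID"   # hypothetical API token
STORE = "nvme-store"                    # hypothetical datastore name

def api_post(path, payload):
    """POST a JSON payload to the PBS API and return the decoded reply."""
    req = urllib.request.Request(
        f"{HOST}/api2/json{path}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"PBSAPIToken={TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    ctx = ssl.create_default_context()  # use your real CA setup in production
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)

def verify_snapshot(snap):
    """Start a verify task scoped to one snapshot (parameter names assumed)."""
    return api_post(f"/admin/datastore/{STORE}/verify", {
        "backup-type": snap["backup-type"],
        "backup-id": snap["backup-id"],
        "backup-time": snap["backup-time"],
    })

def shard(items, n):
    """Split a snapshot list into n non-overlapping round-robin batches."""
    return [items[i::n] for i in range(n)]

if __name__ == "__main__":
    # The snapshot list would come from GET /admin/datastore/{store}/snapshots;
    # fetching is omitted here. The pool hands each snapshot to exactly one
    # worker, so verifies never overlap; shard() is an alternative if you
    # prefer one long-lived worker per fixed batch.
    snapshots = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(verify_snapshot, snapshots))
```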
 
AFAIK it can be done using the API [1], but I haven't found how to do it with proxmox-backup-manager. Keep in mind that verifying each snapshot in a separate verify task will cause shared chunks to be verified multiple times (once per verify task). Each running verify task keeps track of the chunks it has already verified, so that any given chunk covered by that task is checked only once, reducing the total processing time.

[1] https://pbs.proxmox.com/docs/api-viewer/index.html#/admin/datastore/{store}/verify
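The shared-chunk effect can be illustrated with a toy model (made-up snapshot and chunk names, not actual PBS code; real PBS chunks are content-addressed):

```python
# Toy model of chunk verification cost (hypothetical data, not PBS code).
# PBS snapshots reference deduplicated chunks; a single verify task
# remembers which chunks it has already checked, while independent
# per-snapshot tasks each keep their own memory and re-read shared chunks.

snapshots = {
    "vm/100/2024-01-01": {"a", "b", "c"},
    "vm/100/2024-01-02": {"a", "b", "d"},  # shares a, b with the day before
    "vm/101/2024-01-01": {"a", "e"},       # shares a across guests
}

def chunk_reads_single_task(snaps):
    """One task: every distinct chunk is read and hashed exactly once."""
    seen = set()
    for chunks in snaps.values():
        seen |= chunks
    return len(seen)

def chunk_reads_per_snapshot_tasks(snaps):
    """One task per snapshot: shared chunks are re-verified by every task."""
    return sum(len(chunks) for chunks in snaps.values())

print(chunk_reads_single_task(snapshots))         # 5 distinct chunks
print(chunk_reads_per_snapshot_tasks(snapshots))  # 8 chunk reads
```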
 
