PBS with a volume >100 TB — how do you handle verify?

biowan1

New Member
Sep 11, 2025
Hello everyone,


I’m a new PBS user and would like some feedback based on your experience. I have a large backup storage system (RAID6) using a Dell PERC H965i RAID controller. One of my largest VMs is over 10 TB, and the verify duration has already exceeded 24 hours for this VM alone.

Is verify absolutely necessary on RAID6 storage, especially considering that the retention period is less than 6 months? How do you handle this on your side?

Thanks in advance for your feedback.
Best regards
b
 
Is verify absolutely necessary on RAID6 storage, especially considering that the retention period is less than 6 months? How do you handle this on your side?

HW RAID or ZFS RAIDZ won't tell you whether a backup is still consistent. So yes: IMHO it's absolutely necessary.
What kind of storage are you using? HDDs, SSDs, or a fusion pool (HDDs with SSDs as a special device in ZFS)?

If you can't afford faster storage, I would put the large VMs in their own namespace and change the verify schedule. For example, normal VMs might get verified every day and the large VMs every week or two.
Edit: As UdoB said, a mirror or striped mirror (aka RAID1/RAID10) of HDDs together with an SSD-based special device mirror should speed up the process at least a little bit.
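The split-schedule idea could be sketched in `/etc/proxmox-backup/verification.cfg` roughly like this (a hedged example: the datastore name `tank`, the `bigvms` namespace and the job IDs are made up, and property names may differ between PBS versions, so compare against the entries the GUI generates for you):

```
verification: verify-normal
	store tank
	schedule daily

verification: verify-bigvms
	store tank
	ns bigvms
	schedule sat 02:00
```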
 
Assuming HDDs:
You mentioned RAID6. This gives you the IOPS of a single disk. A verify needs to read the actual data, calculate the checksum and compare it with the stored original checksum. There is no single, large file to be read. Instead it needs to read tens of thousands of chunks, possibly distributed across millions of separately stored sectors (because of fragmentation), and that takes... a long time. Remember: there is physical head movement involved.
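As a back-of-envelope sanity check of the point above (all figures are assumptions, not measurements: 4 MiB fixed chunks for VM images, ~200 random-read IOPS for the whole RAID6 array, a few head seeks per chunk due to fragmentation):

```shell
# Rough estimate of seek time alone for verifying a 10 TiB VM
# on RAID6 HDDs. All numbers are illustrative assumptions.
SIZE_TIB=10          # VM disk size
CHUNK_MIB=4          # PBS fixed chunk size for VM images
IOPS=200             # random-read IOPS of the array (~ one HDD)
SEEKS_PER_CHUNK=4    # extra seeks caused by fragmentation

CHUNKS=$(( SIZE_TIB * 1024 * 1024 / CHUNK_MIB ))
SEEK_SECONDS=$(( CHUNKS * SEEKS_PER_CHUNK / IOPS ))

echo "chunks to read:   $CHUNKS"
echo "hours of seeking: $(( SEEK_SECONDS / 3600 ))"
```

With these (hypothetical) numbers that is roughly 2.6 million chunks and about 14 hours spent on head movement alone, before any actual reading or checksumming - which makes a >24 h verify for a single 10 TB VM entirely plausible.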

Technically a verify is not necessary. But you can only be sure the data is still readable if you actually read it once in a while. My personal choice is to re-verify every few months.

From my own (definitely limited) perspective the only acceptable construct uses pairs of mirrors and a fast + reliable(!) special device for the metadata - as I use ZFS everywhere ;-)

Of course, your mileage may vary. But the massive duration is... expected - if my assumptions are right.
 
Let me share my story with quite large PBS datastores. We have two PBS servers, one of about 120 TiB and one of 150 TiB. They are physical servers with 20 NVMe disks (15 TB each) without any hardware RAID; they run ZFS RAIDZ2.

In the past few months Proxmox has improved the backup verification process. Now, if you click Advanced, you can tune how many threads are used for reading and verifying. I have pushed this number to 16/32.

This approach improved speed significantly (from 36 hours to about 12 hours). Nevertheless, I noticed that speed started decreasing a bit in the past few weeks, disproportionately to the amount of added backups. With help from Google/Gemini I came to the idea of adding a dedicated disk used just for the L2ARC cache, which offloads a lot of metadata from RAM. Oh yes, one appliance has 1 TB of RAM and the other 768 GB.

The caching disk is a super fast 384 GB Optane disk that had been sitting on a shelf doing nothing for a while. I used one Optane per PBS, and use it only for metadata.

The GC and verification times went down.
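For reference, a metadata-only L2ARC like the one described can be set up with two standard ZFS commands (a sketch: `tank` and the device path are placeholders for your pool and Optane disk):

```
# Add the Optane as an L2ARC (cache) vdev. Unlike a special device,
# losing a cache device does not endanger the pool.
zpool add tank cache /dev/disk/by-id/nvme-optane-example

# Serve only metadata from the L2ARC, so chunk lookups during
# GC/verify hit flash instead of the main vdevs.
zfs set secondarycache=metadata tank
```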
 
The caching disk is a super fast 384 GB Optane disk that had been sitting on a shelf doing nothing for a while. I used one Optane per PBS, and use it only for metadata.
You are aware that if the metadata-carrying special device in a ZFS pool gets lost, the whole pool is gone? For that reason the special device (aka metadata store) should be set up as a mirror.
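For completeness, adding a special device as a mirror would look like this (a sketch with placeholder pool and device names):

```
# The special vdev carries pool metadata; losing it loses the pool,
# hence two SSDs in a mirror rather than a single disk.
zpool add tank special mirror /dev/disk/by-id/ssd-a /dev/disk/by-id/ssd-b
```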
 
Ah sorry, I didn't get the part about the caching. If this works for you, that's great. I would still expect that a special device would give a further speedup.
 
Ah sorry, I didn't get the part about the caching. If this works for you, that's great. I would still expect that a special device would give a further speedup.
Yes, it would give even better performance. In that case I would need two disks, though, and I am running low on empty disk slots. So for now I will use a single disk as a caching disk.
 
Thank you for sharing all your feedback, experience, and proposed solutions. It’s reassuring to know that I’m not alone in this situation.

b
 
You can also configure the verify job so that already-verified backups only get re-verified after a certain time, e.g. after 7/14/30/more days. This way new backups still get verified, but you don't re-verify old data in every run. If the backups contain business-critical data, I would set the schedule so that everything is re-verified at least once a month or week, depending on the risk one is willing to take.
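On the CLI this corresponds to the verify job's `ignore-verified`/`outdated-after` options; a sketch (the job ID `daily-verify` is made up - check `proxmox-backup-manager verify-job update --help` for the exact option names on your version):

```
# Skip snapshots whose last successful verification is
# younger than 30 days; everything older is queued again.
proxmox-backup-manager verify-job update daily-verify \
	--ignore-verified true --outdated-after 30
```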
 
Hi Johannes S,

Thank you for your reply.

It seems to me that when running a verify + re-verify job, PBS lists all non-verified snapshots plus all snapshots whose verification has expired and queues them all for verification — not just the latest one. That’s at least what I’m observing.

In other words, if no verification runs during the week and the job is scheduled for the weekend, the verify job will end up verifying all snapshots created during the week.

Is there some special configuration option to change this behavior?

My understanding was that if you want to schedule verification of a specific snapshot, this can only be done via a system script — using proxmox-backup-manager and proxmox-backup-client to select a snapshot matching date-based criteria, retrieve its ID, and then verify it.

If there is a simpler built-in solution, I’d be very interested. Thanks
b
 
Nothing I'm aware of. In theory you could try to script something, but I doubt it's worth it.

The manual recommends at least a monthly verification to ensure that bit rot etc. didn't alter the backups. So it makes sense to use a setting like re-verify after 30 days, so that everything older than 30 days gets added to the queue. The manual also describes how to change the thread settings mentioned by robertlukan:
https://pbs.proxmox.com/docs/maintenance.html#verification
 