Race between "Prune & GC" and new backups leads to corrupt dumps

NdK73

Renowned Member
Jul 19, 2012
97
6
73
Bologna, Italy
www.csshl.net
Hello.

It seems the prune & GC job does not account for data generated by concurrent backups.

In my case GC takes 17 days (on 24 16 TB SAS disks configured as a single pool containing 3 RAIDZ3 vdevs - it's simply not economically feasible to have the same amount of SSD storage just for backups!), and all the backups run during that interval are corrupted (IIUC GC removes some chunks that those backups still see as available).
Even worse: backups run AFTER the GC completes are often corrupted too, unless I delete the last corrupted backup.

Please consider adding an option to avoid chunking entirely, leaving only ZFS block-level deduplication (if the user keeps it enabled... I really don't like the idea of deduplicating something that exists precisely to provide redundancy!). Call it "pseudo-tape mode"? :)
 
Hi,
garbage collection takes the oldest active writer instance into account when calculating the cutoff time before which chunks become eligible for deletion.

So please do share (a sketch of the commands to gather this follows below):
  • The output of proxmox-backup-manager version --verbose
  • The full task logs of the concurrent backups
  • The full task log of the garbage collection run
  • Also, if you are using the fast incremental mode, make sure the previous backup is fully verified before running a new one
See also: https://bugzilla.proxmox.com/show_bug.cgi?id=5710
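
A minimal sketch of how the requested information can be collected on the PBS host (the UPID below is just a placeholder for one taken from the task list):

Code:
# package versions
proxmox-backup-manager version --verbose

# list recent tasks to find the UPIDs of the backup and GC tasks
proxmox-backup-manager task list

# dump the full log of a specific task (replace the UPID with a real one)
proxmox-backup-manager task log 'UPID:pbs:...'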
 
Here it is. Note that packages got updated yesterday, after I fixed the backups.

Code:
root@pbs:~# proxmox-backup-manager version --verbose
proxmox-backup                     3.2.0        running kernel: 5.15.131-2-pve
proxmox-backup-server              3.2.7-1      running version: 3.2.7       
proxmox-kernel-helper              8.1.0                                     
pve-kernel-5.15                    7.4-15                                     
proxmox-kernel-6.8                 6.8.12-2                                   
proxmox-kernel-6.8.12-2-pve-signed 6.8.12-2                                   
pve-kernel-5.15.158-2-pve          5.15.158-2                                 
pve-kernel-5.15.131-2-pve          5.15.131-3                                 
pve-kernel-5.15.126-1-pve          5.15.126-1                                 
pve-kernel-5.15.116-1-pve          5.15.116-1                                 
pve-kernel-5.15.108-1-pve          5.15.108-2                                 
pve-kernel-5.15.107-2-pve          5.15.107-2                                 
pve-kernel-5.15.83-1-pve           5.15.83-1                                 
pve-kernel-5.15.74-1-pve           5.15.74-1                                 
ifupdown2                          3.2.0-1+pmx9                               
libjs-extjs                        7.0.0-4                                   
proxmox-backup-docs                3.2.7-1                                   
proxmox-backup-client              3.2.7-1                                   
proxmox-mail-forward               0.2.3                                     
proxmox-mini-journalreader         1.4.0                                     
proxmox-offline-mirror-helper      unknown                                   
proxmox-widget-toolkit             4.2.3                                     
pve-xtermjs                        5.3.0-3                                   
smartmontools                      7.3-pve1                                   
zfsutils-linux                     2.2.6-pve1

This is the log of one of the backups that failed verification:
Code:
2024-09-29T01:01:37+02:00: starting new backup on datastore 'difa-bck': "ns/OPH/vm/156/2024-09-28T23:01:36Z"
2024-09-29T01:01:37+02:00: download 'index.json.blob' from previous backup.
2024-09-29T01:01:37+02:00: register chunks in 'drive-scsi0.img.fidx' from previous backup.
2024-09-29T01:01:37+02:00: download 'drive-scsi0.img.fidx' from previous backup.
2024-09-29T01:01:37+02:00: created new fixed index 1 ("ns/OPH/vm/156/2024-09-28T23:01:36Z/drive-scsi0.img.fidx")
2024-09-29T01:01:37+02:00: register chunks in 'drive-scsi1.img.fidx' from previous backup.
2024-09-29T01:01:37+02:00: download 'drive-scsi1.img.fidx' from previous backup.
2024-09-29T01:01:37+02:00: created new fixed index 2 ("ns/OPH/vm/156/2024-09-28T23:01:36Z/drive-scsi1.img.fidx")
2024-09-29T01:01:37+02:00: add blob "/mnt/datastore/difa-bck/ns/OPH/vm/156/2024-09-28T23:01:36Z/qemu-server.conf.blob" (568 bytes, comp: 568)
2024-09-29T01:01:37+02:00: add blob "/mnt/datastore/difa-bck/ns/OPH/vm/156/2024-09-28T23:01:36Z/fw.conf.blob" (42 bytes, comp: 42)
2024-09-29T01:01:40+02:00: Upload statistics for 'drive-scsi1.img.fidx'
2024-09-29T01:01:40+02:00: UUID: c20468f3429a431a897fed7ebf00f72a
2024-09-29T01:01:40+02:00: Checksum: 503f391ca4255b307dc7c42474effe92780e8425535f5385599b31c524c36274
2024-09-29T01:01:40+02:00: Size: 41943040
2024-09-29T01:01:40+02:00: Chunk count: 10
2024-09-29T01:01:40+02:00: Upload size: 46137344 (110%)
2024-09-29T01:01:40+02:00: Duplicates: 0+1 (10%)
2024-09-29T01:01:40+02:00: Compression: 70%
2024-09-29T01:01:40+02:00: successfully closed fixed index 2
2024-09-29T01:01:43+02:00: Upload statistics for 'drive-scsi0.img.fidx'
2024-09-29T01:01:43+02:00: UUID: 3cfad824cc124b2480d4856e37b45e78
2024-09-29T01:01:43+02:00: Checksum: d5eb49224c89d8f9d9e798734531ae43e710429dcdf2880297600dfb659f637b
2024-09-29T01:01:43+02:00: Size: 415236096
2024-09-29T01:01:43+02:00: Chunk count: 99
2024-09-29T01:01:43+02:00: Upload size: 406847488 (97%)
2024-09-29T01:01:43+02:00: Duplicates: 2+1 (3%)
2024-09-29T01:01:43+02:00: Compression: 25%
2024-09-29T01:01:43+02:00: successfully closed fixed index 1
2024-09-29T01:01:43+02:00: add blob "/mnt/datastore/difa-bck/ns/OPH/vm/156/2024-09-28T23:01:36Z/index.json.blob" (428 bytes, comp: 428)
2024-09-29T01:01:44+02:00: successfully finished backup
2024-09-29T01:01:44+02:00: backup finished successfully
2024-09-29T01:01:44+02:00: TASK OK

Last GC log is:
Code:
2024-09-07T02:00:00+02:00: starting garbage collection on store difa-bck
2024-09-07T02:00:00+02:00: task triggered by schedule 'sat *-1..7 02:00'
2024-09-07T02:00:00+02:00: Start GC phase1 (mark used chunks)
2024-09-08T08:52:34+02:00: marked 1% (7 of 643 index files)
2024-09-10T07:10:21+02:00: marked 2% (13 of 643 index files)
[...]
2024-09-24T11:06:36+02:00: marked 98% (631 of 643 index files)
2024-09-24T11:06:37+02:00: marked 99% (637 of 643 index files)
2024-09-24T11:06:39+02:00: marked 100% (643 of 643 index files)
2024-09-24T11:06:39+02:00: Start GC phase2 (sweep unused chunks)
2024-09-24T11:12:17+02:00: processed 1% (327924 chunks)
2024-09-24T11:17:04+02:00: processed 2% (654706 chunks)
2024-09-24T11:21:28+02:00: processed 3% (981548 chunks)
2024-09-24T11:25:28+02:00: processed 4% (1308214 chunks)
[...]
2024-09-24T16:20:15+02:00: processed 97% (31729290 chunks)
2024-09-24T16:23:26+02:00: processed 98% (32057492 chunks)
2024-09-24T16:26:36+02:00: processed 99% (32384733 chunks)
2024-09-24T16:29:46+02:00: Removed garbage: 602.015 GiB
2024-09-24T16:29:46+02:00: Removed chunks: 448780
2024-09-24T16:29:46+02:00: Pending removals: 15.978 GiB (in 12557 chunks)
2024-09-24T16:29:46+02:00: Original data usage: 824.648 TiB
2024-09-24T16:29:46+02:00: On-Disk usage: 74.832 TiB (9.07%)
2024-09-24T16:29:46+02:00: On-Disk chunks: 32250873
2024-09-24T16:29:46+02:00: Deduplication factor: 11.02
2024-09-24T16:29:46+02:00: Average chunk size: 2.433 MiB
2024-09-24T16:29:46+02:00: TASK OK

The log of the last verification is too big to be attached :(
 
So the backup uses the fast incremental mode. Please do a full verification of at least the last backup snapshot of each backup group (better yet, of all of them, without the skip-verified flag), as I suspect that the previous backup used by the fast incremental mode is already corrupt.

If the previous snapshot is marked as corrupt after verification, the next backup run will not perform an incremental backup but rather re-upload all chunks to the server. That might help to "heal" some of the corrupted snapshots, which can then be checked by a subsequent verification of the previously corrupt backup snapshots.
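
For reference, a minimal sketch of triggering such a verification from the CLI, using the datastore name from the logs above (a verify job in the GUI with "Skip Verified" unchecked achieves the same):

Code:
# verify all snapshots in the 'difa-bck' datastore
proxmox-backup-manager verify difa-bck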
 
I sometimes click the 'v' next to the pencil for the latest backup and it gets marked 'ok'. At the next GC I'll see whether the verified ones become corrupt or not.
But it seems PBS does not handle big backups well (like the one containing a 32 TB disk): nearly 3 days for a complete backup, and way more for verification :(
 
I would HEAVILY recommend a special vdev for the metadata on your pool. RAIDZ is really slow by itself, and GC jobs are basically just metadata operations, so they can leverage the special vdev's SSDs.
 
I tried looking into it, but IIUC I'd need 3 new vdevs with the same RAIDZ3 layout just for metadata... One SSD can probably be added, a second could even be squeezed in, but 3 or more is not possible, and 18 would be plain ridiculous.
 
IIUC your setup, you just need one vdev made of a mirror of 2 (or maybe 3) SSDs, added as a special device to your single pool of 3 RAIDZ3 vdevs; a rough command sketch follows after the list below. Those SSDs should be big enough for all the metadata to fit. As a rough estimate, 1 TB would be barely enough (a ~66 TB PBS datastore uses ~192 GB with special_small_blocks=4K and ~252 GB with special_small_blocks=16K), so I would definitely go for 2 TB disks.

A few things to consider:
- If your special device fills up, small blocks will spill over to your HDDs.
- Only newly written small blocks go to the special device, so to make use of it you will have to rewrite all existing data (i.e. add another datastore in the same or in a new dataset of the same zpool and use a PBS sync job to move backups to the new datastore. I know, that's a lot of work for already overloaded disks...).
- A special device only benefits GC, not verify, and barely improves backup/restore performance. It does improve browsing the datastore, though.
- With a special device your GC will take hours, not days. Averaged across my PBS installs with very varied hardware setups, it's roughly ~15 minutes per 10 TB in the datastore. Take that with a grain of salt, as many factors affect the time needed.
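
As an illustration only, a rough sketch of what adding such a special device could look like; the pool/dataset names and disk IDs are placeholders, so double-check device paths and redundancy before touching a production pool:

Code:
# add a mirrored pair of SSDs as a special (allocation class) vdev
# (zpool may ask for -f because the mirror's redundancy differs from the RAIDZ3 vdevs)
zpool add difa-pool special mirror /dev/disk/by-id/ssd-A /dev/disk/by-id/ssd-B

# metadata lands on the special vdev automatically; this additionally sends
# data blocks of up to 4K in the datastore's dataset there
zfs set special_small_blocks=4K difa-pool/datastore

# confirm the new vdev is present
zpool status difa-pool

And as noted above, only newly written blocks benefit: existing chunks stay on the HDDs until they are rewritten.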
 