Full datastore: Errors after 2nd GC even though all chunks were restored

vtr

Hi,

I ran into the issue already described in other posts here: I forgot to run garbage collection (I thought it was included in the prune process) and ended up with a completely filled datastore. So full, in fact, that I could not even run GC anymore.

Since I had no space to free up, I followed the advice of moving several chunk folders to another disk, running GC (which took over a day - HDDs), and then, as soon as GC had freed some space, moving the chunks back while it was still running. The process finished with 96 warnings.

I was under the impression that a second GC run should finish without errors, so I ran it right after the first GC was done. But it still complained about missing chunks; only the number of warnings changed, from 96 to 24.

That obviously worries me. Does this mean my backups are corrupted? Any advice on how to proceed from here?

PBS 4.0.14
 
I'm doing that. It's also a slow process, but it's already generating a lot of errors:

Code:
2026-04-23T11:07:55+02:00:   check drive-scsi3.img.fidx
2026-04-23T11:07:55+02:00: chunk 88957b02383f5ab530e65854611c9ad22d5f1694efe9e0adf5e814319b7bd755 was marked as corrupt
2026-04-23T11:07:55+02:00:   verified 0.00/0.00 MiB in 0.01 seconds, speed 0.00/0.00 MiB/s (1 errors)
2026-04-23T11:07:55+02:00: verify bpool:vm/100/2026-03-15T02:00:08Z/drive-scsi3.img.fidx failed: chunks could not be verified
2026-04-23T11:07:55+02:00:   check drive-scsi1.img.fidx
2026-04-23T11:07:55+02:00: chunk 88950ea9bd3844a2ea13966dbd736d180c307c9983d68fda7522f36a0207b0b8 was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 88954b524d6544faba15c648abf1d003793478757799135ef22069dabf3faf1c was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 88952f08381f42b9d0e0039768ce3d89bd1a9df7541843dfa58004f3dfb8d4ff was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 88952a2268f49d6a501ba7fbab3dd4e3a831e4cd9b3da78a0adacc1fb2a36683 was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 8895546e4ddbfa7e5948756b2f93c939c565ef71187caed9d7a62a610e16ddf4 was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 88955ddfa01c52830560ab9940067c6b55d1122534baa0940bed26ccd9f2505b was marked as corrupt
2026-04-23T11:07:55+02:00:   verified 0.00/0.00 MiB in 0.04 seconds, speed 0.00/0.00 MiB/s (6 errors)
2026-04-23T11:07:55+02:00: verify bpool:vm/100/2026-03-15T02:00:08Z/drive-scsi1.img.fidx failed: chunks could not be verified
 
was marked as corrupt just means that the real error happened earlier in the verification. Can you show those earlier errors as well?

One thing I'd check is whether you copied the files back with the right ownership. Maybe something like ls -l | grep -v "backup backup" in the .chunks directory to see whether any of the directories are not owned by that user and group.
 
Here is the whole log, up to and including the lines I provided before:

Code:
2026-04-23T10:08:54+02:00: Starting datastore verify job 'bpool:v-d50b17af-7536'
2026-04-23T10:08:54+02:00: verify datastore bpool
2026-04-23T10:08:54+02:00: found 32 groups
2026-04-23T10:08:54+02:00: verify group bpool:vm/100 (4 snapshots)
2026-04-23T10:08:54+02:00: verify bpool:vm/100/2026-04-23T01:00:10Z
2026-04-23T10:08:54+02:00:   check qemu-server.conf.blob
2026-04-23T10:08:54+02:00:   check fw.conf.blob
2026-04-23T10:08:54+02:00:   check drive-scsi3.img.fidx
2026-04-23T10:20:02+02:00: "can't verify chunk, load failed - store 'bpool', unable to load chunk '88957b02383f5ab530e65854611c9ad22d5f1694efe9e0adf5e814319b7bd755' - No such file or directory (os error 2)"
2026-04-23T10:20:02+02:00: failed to get s3 backend while trying to rename bad chunk: 88957b02383f5ab530e65854611c9ad22d5f1694efe9e0adf5e814319b7bd755
2026-04-23T10:20:02+02:00:   verified 57026.41/64192.00 MiB in 667.79 seconds, speed 85.40/96.13 MiB/s (1 errors)
2026-04-23T10:20:02+02:00: verify bpool:vm/100/2026-04-23T01:00:10Z/drive-scsi3.img.fidx failed: chunks could not be verified
2026-04-23T10:20:02+02:00:   check drive-scsi1.img.fidx
2026-04-23T11:06:27+02:00: "can't verify chunk, load failed - store 'bpool', unable to load chunk '8895546e4ddbfa7e5948756b2f93c939c565ef71187caed9d7a62a610e16ddf4' - No such file or directory (os error 2)"
2026-04-23T11:06:27+02:00: failed to get s3 backend while trying to rename bad chunk: 8895546e4ddbfa7e5948756b2f93c939c565ef71187caed9d7a62a610e16ddf4
2026-04-23T11:06:27+02:00: "can't verify chunk, load failed - store 'bpool', unable to load chunk '88950ea9bd3844a2ea13966dbd736d180c307c9983d68fda7522f36a0207b0b8' - No such file or directory (os error 2)"
2026-04-23T11:06:27+02:00: failed to get s3 backend while trying to rename bad chunk: 88950ea9bd3844a2ea13966dbd736d180c307c9983d68fda7522f36a0207b0b8
2026-04-23T11:06:27+02:00: "can't verify chunk, load failed - store 'bpool', unable to load chunk '88955ddfa01c52830560ab9940067c6b55d1122534baa0940bed26ccd9f2505b' - No such file or directory (os error 2)"
2026-04-23T11:06:27+02:00: failed to get s3 backend while trying to rename bad chunk: 88955ddfa01c52830560ab9940067c6b55d1122534baa0940bed26ccd9f2505b
2026-04-23T11:06:27+02:00: "can't verify chunk, load failed - store 'bpool', unable to load chunk '88952a2268f49d6a501ba7fbab3dd4e3a831e4cd9b3da78a0adacc1fb2a36683' - No such file or directory (os error 2)"
2026-04-23T11:06:27+02:00: failed to get s3 backend while trying to rename bad chunk: 88952a2268f49d6a501ba7fbab3dd4e3a831e4cd9b3da78a0adacc1fb2a36683
2026-04-23T11:06:27+02:00: "can't verify chunk, load failed - store 'bpool', unable to load chunk '88954b524d6544faba15c648abf1d003793478757799135ef22069dabf3faf1c' - No such file or directory (os error 2)"
2026-04-23T11:06:27+02:00: failed to get s3 backend while trying to rename bad chunk: 88954b524d6544faba15c648abf1d003793478757799135ef22069dabf3faf1c
2026-04-23T11:06:27+02:00: "can't verify chunk, load failed - store 'bpool', unable to load chunk '88952f08381f42b9d0e0039768ce3d89bd1a9df7541843dfa58004f3dfb8d4ff' - No such file or directory (os error 2)"
2026-04-23T11:06:27+02:00: failed to get s3 backend while trying to rename bad chunk: 88952f08381f42b9d0e0039768ce3d89bd1a9df7541843dfa58004f3dfb8d4ff
2026-04-23T11:06:27+02:00:   verified 276386.46/375888.00 MiB in 2785.23 seconds, speed 99.23/134.96 MiB/s (6 errors)
2026-04-23T11:06:27+02:00: verify bpool:vm/100/2026-04-23T01:00:10Z/drive-scsi1.img.fidx failed: chunks could not be verified
2026-04-23T11:06:27+02:00:   check drive-scsi0.img.fidx
2026-04-23T11:07:55+02:00:   verified 8835.86/15748.00 MiB in 87.95 seconds, speed 100.47/179.06 MiB/s (0 errors)
2026-04-23T11:07:55+02:00: percentage done: 0.78% (0/32 groups, 1/4 snapshots in group #1)
2026-04-23T11:07:55+02:00: verify bpool:vm/100/2026-04-22T01:00:03Z
2026-04-23T11:07:55+02:00:   check qemu-server.conf.blob
2026-04-23T11:07:55+02:00:   check fw.conf.blob
2026-04-23T11:07:55+02:00:   check drive-scsi3.img.fidx
2026-04-23T11:07:55+02:00: chunk 88957b02383f5ab530e65854611c9ad22d5f1694efe9e0adf5e814319b7bd755 was marked as corrupt
2026-04-23T11:07:55+02:00:   verified 0.00/0.00 MiB in 0.01 seconds, speed 0.00/0.00 MiB/s (1 errors)
2026-04-23T11:07:55+02:00: verify bpool:vm/100/2026-04-22T01:00:03Z/drive-scsi3.img.fidx failed: chunks could not be verified
2026-04-23T11:07:55+02:00:   check drive-scsi1.img.fidx
2026-04-23T11:07:55+02:00: chunk 88950ea9bd3844a2ea13966dbd736d180c307c9983d68fda7522f36a0207b0b8 was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 88954b524d6544faba15c648abf1d003793478757799135ef22069dabf3faf1c was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 88952f08381f42b9d0e0039768ce3d89bd1a9df7541843dfa58004f3dfb8d4ff was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 88952a2268f49d6a501ba7fbab3dd4e3a831e4cd9b3da78a0adacc1fb2a36683 was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 8895546e4ddbfa7e5948756b2f93c939c565ef71187caed9d7a62a610e16ddf4 was marked as corrupt
2026-04-23T11:07:55+02:00: chunk 88955ddfa01c52830560ab9940067c6b55d1122534baa0940bed26ccd9f2505b was marked as corrupt
2026-04-23T11:07:55+02:00:   verified 0.00/0.00 MiB in 0.04 seconds, speed 0.00/0.00 MiB/s (6 errors)
2026-04-23T11:07:55+02:00: verify bpool:vm/100/2026-04-22T01:00:03Z/drive-scsi1.img.fidx failed: chunks could not be verified

And here is the ls -l output:

Code:
root@ppbss2:/mnt/datastore/bpool/.chunks# ls -l | grep -v "backup backup"
total 2082992

# even tried
root@ppbss2:/mnt/datastore/bpool/.chunks# find . \! \( -user backup -a -group backup \)

So, there should be no permission issues.
 
Check (below the .chunks directory, in the subdirectory whose name matches the beginning of the chunk name) whether those files are present. They might also have gotten a ".bad" suffix.
It happened to me that some chunks were renamed to ....bad (in my case due to RAM errors, which was only revealed later), and manually renaming them back to their original names (removing the .bad suffix) fixed the problem.
If they really are corrupted, the next verification will mark them as corrupt again.
 
Could you provide the output of proxmox-backup-manager datastore show ...?

If this is an s3 datastore then you should also verify that the backend is working with proxmox-backup-manager s3 check ...

You are running an older version and those error messages have changed a bit in the meantime. Updating is probably a good idea, but if those chunks are really missing, then your backups are partially corrupt.

If there are .bad files you should be able to verify the checksum with proxmox-backup-debug inspect chunk, as long as you specify --decode [1]

[1] https://forum.proxmox.com/threads/bad-chunks-that-arent-bad.127677/
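For reference, a guarded sketch of that check. The .bad path below is hypothetical (a digest from this thread plus a made-up ".0.bad" suffix), and the guard keeps the snippet harmless on machines without PBS installed; consult proxmox-backup-debug's help output on your host for the exact --decode semantics in your version.

```shell
# Hypothetical chunk path; adjust to a real ".N.bad" file on your datastore.
BAD="/mnt/datastore/bpool/.chunks/fa6b/fa6b4b4d9503177907b622b6a877a5fadba2214bf9f1c7b7f6d20ea3be926517.0.bad"
OUT=$(mktemp)   # --decode needs somewhere to write the decoded data

if command -v proxmox-backup-debug >/dev/null 2>&1 && [ -e "$BAD" ]; then
    # --decode forces a full decode, which recomputes and checks the digest
    proxmox-backup-debug inspect chunk "$BAD" --decode "$OUT"
else
    echo "nothing to check here (tool or chunk file not present)"
fi
```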
 
@vtr In my previous post I forgot that the suffix also contains a ".number" part to the left of the ".bad".

Anyway, you may want to keep the backups even if some chunks are corrupted.
If you later need to restore some files from those backups, there is a good chance that the needed files are in uncorrupted chunks, so you'll still be able to restore them. I know this from experience :) .
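The renaming can also be scripted. Below is a sketch that strips a trailing ".<number>.bad" suffix, demonstrated on a throwaway directory with a made-up digest; point CHUNKS at the real .chunks path (ideally with no GC or verify running) before using it for real.

```shell
set -eu
CHUNKS=$(mktemp -d)              # stand-in for /mnt/datastore/bpool/.chunks
mkdir -p "$CHUNKS/fa6b"
touch "$CHUNKS/fa6b/fa6bdeadbeef.0.bad"   # made-up digest for the demo

find "$CHUNKS" -name '*.bad' | while read -r bad; do
    orig=${bad%.*.bad}           # drop the trailing ".<number>.bad"
    if [ -e "$orig" ]; then
        echo "skipping, target already exists: $orig"
    else
        mv "$bad" "$orig"
    fi
done
```

The existence check matters: if verification has since re-created or re-uploaded a chunk with the original name, the .bad copy should not overwrite it.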
 
Could you provide the output of proxmox-backup-manager datastore show ...?
Code:
┌─────────────┬──────────────────────┐
│ Name        │ Value                │
╞═════════════╪══════════════════════╡
│ name        │ bpool                │
├─────────────┼──────────────────────┤
│ path        │ /mnt/datastore/bpool │
├─────────────┼──────────────────────┤
│ gc-schedule │ 05:00                │
└─────────────┴──────────────────────┘

If this is an s3 datastore
It's not.

You are running an older version
PBS 4.0.14 - but yes, I'll update soon.

If there are .bad files you should be able to verify the checksum
I'll verify, thanks.
 
Check (below the .chunks directory, in the subdirectory whose name matches the beginning of the chunk name) whether those files are present. They might also have gotten a ".bad" suffix.
It happened to me that some chunks were renamed to ....bad (in my case due to RAM errors, which was only revealed later), and manually renaming them back to their original names (removing the .bad suffix) fixed the problem.
If they really are corrupted, the next verification will mark them as corrupt again.
Ok, that sounds like something that could work. Thanks, I'll try this!
 
@vtr
One more important remark: don't run garbage collection before you have managed to rename the "bad" chunks back to their original names (if you want to preserve those chunks).
Because files with a ".bad" suffix are NOT referenced by the index files, GC will delete them.

Unintended deletion can also occur when chunks were absent during the first phase of GC but present again during the second phase, because GC only updates their access time (atime) during the first phase.

See https://pbs.proxmox.com/docs/maintenance.html#gc-background

I'm mentioning this because you wrote in the first post:
"during GC when some space was freed, [you were] moving the chunks back."
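A quick way to see whether GC's first phase has already "touched" a given chunk is to look at the file's atime with stat. A sketch on a temp file (substitute a real chunk path on the datastore):

```shell
set -eu
chunk=$(mktemp)                  # stand-in for a real chunk file
stat -c 'atime: %x' "$chunk"     # human-readable access time
stat -c '%X' "$chunk"            # same, as a Unix timestamp
```

Note that whether ordinary reads update atime at all depends on the filesystem's mount options (relatime vs. strict atime), so treat this only as a rough check.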
 
Don't run garbage collection before you have managed to rename "bad" chunks to their original names (if you want to preserve these chunks).
Haven't I done this already, though? I ran GC a few hours after the first run which, as documented, gave back 24 warnings instead of the 96 from the first run. Also, while the first GC was running, new backups were made...


Unintended deleting might also occur when the chunks were not present during the first phase of GC and were present during the second phase.
Isn't this what is recommended in https://forum.proxmox.com/threads/disk-full-unable-to-run-garbage-collection.81800/, though?


But in either case, running find . -name *.bad within the .chunks folder came back empty :(
 
Haven't I done this though already?
I don't know what _exactly_ you have done (and the stages/phases matter) :cool: .

I ran GC a few hours after the first run which, as documented, gave back 24 errors instead of 96 in the first run. Also, while the first GC was running, new backups were made...
New backups made while GC is running are not endangered (this is also described in the doc I linked to).


"This" - I'm not sure what you mean. But I can't see it at the above link. If you do, please quote it here :).

But in either case, running find . -name *.bad within the .chunks folder came back empty :(

Note that using metacharacters (e.g. *) without quoting them in commands like this can give wrong results, because the shell expands them before the command runs.
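A toy demonstration of the difference, in a scratch directory with made-up file names:

```shell
set -eu
cd "$(mktemp -d)"
mkdir sub
touch here.bad sub/deep.bad

find . -name "*.bad" | wc -l    # quoted: find gets the pattern, matches 2 files
find . -name *.bad   | wc -l    # unquoted: the shell expands *.bad to "here.bad"
                                # first, so find matches only 1 file
```

(When no file in the current directory matches, most shells pass an unquoted pattern through unchanged, which is why the unquoted run happened to behave the same in your case.)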
 
Note that using metacharacters (e.g. *) without quotation in similar commands may cause wrong results because shell expansion happens.

You're right, thanks. I ran the command again using find . -name "*.bad" - same result.

"This" - I'm not sure what you mean. But I can't see it at the above link. If you do, please quote it here :).

The most liked comment at the end stated:

On resolution, like recommended I moved some chunks off, ran gc with warnings, moved the chunks back, re-ran gc cleanly, started validation.

I was under the impression that this is just what I did. I moved some directories within .chunks to my root disk and then, as soon as GC started and had freed sufficient space, moved them back using the mv command.

New backups while GC are not endangered (this is also described in the doc I linked to).
Thanks for confirming. I'll have to read the docs more carefully.
I don't know what _exactly_ you have done (and stages - phases matter) :cool: .
I'll try to explain. Once I noticed that my datastore was full, I moved some directories in .chunks to my root disk, as mentioned above, and started GC. Then, during GC, I moved all those chunks back and waited for GC to finish. This took about 24 hours because of the HDDs and produced 96 warnings. Just a couple of hours later, I ran GC again; it was much quicker this time but still produced 24 warnings. Then, based on your recommendation, I ran verify, and after that the checks we discussed so far. That's pretty much it.

BTW, thanks for sticking with me, I appreciate it.
 
In the verification log, I picked a random line that read:

2026-04-23T18:39:28+02:00: chunk fa6b4b4d9503177907b622b6a877a5fadba2214bf9f1c7b7f6d20ea3be926517 was marked as corrupt

So I looked for a chunk with that name. It exists within the fa6b directory, and there are no files there ending in .bad (or any other suffix, for that matter).

Code:
/mnt/datastore/bpool/.chunks/fa6b# ls -l
total 14266
-rw-r--r-- 1 backup backup 3701546 Apr 23 22:06 fa6b4b4d9503177907b622b6a877a5fadba2214bf9f1c7b7f6d20ea3be926517
-rw-r--r-- 1 backup backup 2579054 Apr 22 23:06 fa6b506992c4b62823e8c93e6aa2b1eeea9395fc488c58d90b2df7050dca9009
-rw-r--r-- 1 backup backup 4194348 Apr 22 22:31 fa6bb11633ee9bd4b54954aac3db06f299f5546edf5786be1aae28aab7ef1d99
-rw-r--r-- 1 backup backup 4090155 Apr 22 23:39 fa6bd00d1aa752790504d58cc4b964bbed6203d3db418b9c88d7d0e09509a838