Huge differences in reported usage on two PBS instances (and very large .chunks directory)

Razva

Hello,

Three months ago I moved everything to a new PBS server and set up an S3 provider as the datastore. I see no errors being generated and everything seems fine, so I'm thinking of decommissioning the old server. On the new server, the .chunks directory size is 1TB. Both PBS and the S3 provider report 1TB of usage.

What worries me is that the old server has no less than 6TB in its .chunks directory. No backups have been executed in the last 3 months. I've run garbage collection and prune tasks, but the size of the .chunks directory remains the same.

Retention policies are identical for both PBS instances: 7x Daily, 4x Weekly. There were no huge changes inside the VMs; practically the same VMs have been running for 2 years.

Could you please let me know what could be the root cause of this inconsistency?

Thank you!
 
I'm thinking to decommission the old server. The .chunks directory size is 1TB. Both PBS and the S3 provider report 1TB of usage.

What worries me is that the old server has no less than 6TB in the .chunks directory.
Hi, @Razva .
I don't understand: 1 TB or 6 TB?
And what commands, exactly, show these sizes?
 
Hey. I've updated the thread so it's clearer.

The new server reports 1TB while the old server reports 6TB.

I'm using `df` to check physical usage, and `Datastore -> Summary` in the PBS UI.
 
df reports filesystem usage. Depending on the situation, comparing one server's filesystem usage with the other's may or may not be appropriate.

You could get a better view of the situation with du -hcs . in the directories of the datastores (there are more files/directories than just .chunks). And note that the results may still not be identical, due to other circumstances such as the block sizes of the particular filesystems.
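For instance, measuring the whole datastore and the chunk store separately makes index/metadata overhead visible too. A minimal sketch; the /mnt/datastore path is an assumption, so substitute your actual datastore directory:

```shell
#!/bin/sh
# Minimal sketch: compare total datastore usage with the chunk store alone.
# DATASTORE is a hypothetical path -- point it at your real datastore mount.
DATASTORE="${DATASTORE:-/mnt/datastore}"
du -hcs "$DATASTORE"          # everything: chunks + index files + metadata
du -hcs "$DATASTORE/.chunks"  # deduplicated chunk store only
```

Running this on both servers gives numbers that are comparable like-for-like, which `df` (whole filesystem, including anything else stored on it) does not.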
 
Sorry, my bad, I was referring to `du -csh`, not `df`.

New PBS:
Bash:
# du -csh .chunks/
1017G   .chunks/
1017G   total

Old PBS:
Bash:
# du -csh .chunks/
6.5T    .chunks/
6.5T    total

This is not a ±10 GB difference. We're talking about a full-blown 5.5TB difference; the old server is storing more than 6x as much.
 
Here's something preliminary:

Old PBS:
Code:
Per-guest summary (sorted by size):
  /vm/120  :   1.5TiB
  /vm/102  : 376.8GiB
  /vm/101  : 273.9GiB
  /vm/104  : 232.9GiB
  /vm/119  : 149.7GiB
  /vm/114  : 134.1GiB
  /vm/117  : 121.0GiB
  /vm/115  :  86.7GiB
  /vm/107  :  36.5GiB
  /vm/201  :  27.6GiB
  /vm/108  :  14.8GiB
  /vm/113  :  13.1GiB
  /vm/803  :  12.6GiB
  /vm/810  :  11.8GiB
  /vm/109  :  10.4GiB
  /vm/116  :   9.7GiB
  /vm/112  :   9.0GiB
  /vm/110  :   8.2GiB
  /vm/812  :   8.1GiB
  /vm/106  :   7.8GiB
  /vm/100  :   6.9GiB
  /vm/103  :   4.0GiB
  /vm/105  :   3.6GiB
  /vm/124  :   2.8GiB
  /vm/118  :   2.6GiB
  /vm/111  :   2.6GiB
  /vm/121  :   2.6GiB
  /vm/125  :   2.5GiB

TOTAL: 3.5 TB, which is roughly 50% of the 6.5 TB reported by `du` or PBS UI.

New PBS:
Code:
Per-guest summary (sorted by size):
  /vm/102 : 332.2GiB
  /vm/104 : 289.4GiB
  /vm/119 : 104.6GiB
  /vm/115 :  80.6GiB
  /vm/107 :  59.8GiB
  /vm/201 :  26.9GiB
  /vm/108 :  20.0GiB
  /vm/109 :  12.9GiB
  /vm/110 :  10.9GiB
  /vm/103 :   7.4GiB
  /vm/100 :   5.7GiB
  /vm/124 :   3.6GiB
  /vm/121 :   3.4GiB
  /vm/118 :   3.1GiB
  /vm/105 :   2.0GiB

TOTAL: ~1.1 TB, which is aligned with the space reported by PBS and by the S3 provider.

Here are the differences:
  • /vm/120 : 1.5TiB
  • /vm/101 : 273.9GiB
  • /vm/114 : 134.1GiB
  • /vm/117 : 121.0GiB
  • /vm/113 : 13.1GiB
  • /vm/803 : 12.6GiB
  • /vm/810 : 11.8GiB
  • /vm/116 : 9.7GiB
  • /vm/112 : 9.0GiB
  • /vm/812 : 8.1GiB
  • /vm/106 : 7.8GiB
  • /vm/111 : 2.6GiB
  • /vm/125 : 2.5GiB
  • /vm/9001 : 616.5MiB
  • /vm/9002 : 369.1MiB
  • /ct/106 : 239.2MiB
TOTAL: ~2.2 TB

So yes, we could consider that 2.2 TB (VMs missing from the new server) + 1.1 TB (VMs present on both sides) = 3.5 TB, give or take. We can state that the reports are consistent on both sides.

But where is the rest of the storage, between 3.5 TB and 6.5 TB? This is not a "small difference"; we're talking about nearly double the stored amount. And thinking long-term, if this is some sort of bug or issue, it could lead to massive cost increases, practically doubling the price.

Any hints on what I should do next? Thank you!
 
Ideally you prune before GC :)
As far as I can tell, prune takes precedence when both are scheduled at the same time, but I'm not 100% sure.
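Rather than relying on precedence, one way to guarantee that ordering is to stagger the schedules so prune always finishes before GC starts. A hedged sketch only: the job name, times, paths, and datastore name `store1` are assumptions, and the exact config keys should be verified against your PBS version's documentation (these files are normally managed via the UI or `proxmox-backup-manager`, not edited by hand):

```text
# /etc/proxmox-backup/prune.cfg -- prune job runs first
prune: daily-prune
        store store1
        schedule 02:00
        keep-daily 7
        keep-weekly 4

# /etc/proxmox-backup/datastore.cfg -- GC starts an hour later
datastore: store1
        path /mnt/datastore
        gc-schedule 03:00
```

With this spacing, any snapshots removed by prune are already unreferenced when GC scans the chunk store, so their chunks enter the removal pipeline one day earlier.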
 
After re-running prune + garbage collect, I'm now seeing this:

[screenshot: datastore usage summary after re-running prune + garbage collection]

This was totally different yesterday when posting the OP.

I will wait for 48 hours and report back.
 
Yup, I'm aware of that safety delay. But I'm wondering why it happened in the first place. Both garbage collection and pruning were executed daily, and no data was being sent to or received from PMX, so the server has basically been doing nothing for the last 3 months.
 
Nothing in the task log on the old PBS? It should show something, e.g.

2026-03-23T19:00:53-05:00: Removed garbage: 92.989 GiB
2026-03-23T19:00:53-05:00: Removed chunks: 63599
2026-03-23T19:00:53-05:00: Pending removals: 90.049 GiB (in 63374 chunks)
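Those "pending removals" are the safety cutoff at work: GC only deletes chunks whose atime is older than roughly 24 hours and 5 minutes, so freshly unreferenced chunks survive until the next run. A rough way to preview how many chunk files sit past that cutoff (a sketch; the .chunks path is an assumption, and filesystems mounted with noatime will skew the result):

```shell
#!/bin/sh
# Sketch: count chunk files whose atime is past the ~24h05m GC cutoff,
# i.e. candidates for deletion on the next garbage-collection run.
# CHUNKS is a hypothetical path -- use your datastore's .chunks directory.
CHUNKS="${CHUNKS:-/mnt/datastore/.chunks}"
# 1445 minutes = 24 hours 5 minutes
find "$CHUNKS" -type f -amin +1445 | wc -l
```

Note this only counts candidates by age; whether a chunk is actually removed still depends on it being unreferenced by every remaining index, which only GC itself determines.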