Prune without defined retention: Concerning logs but are backups safe?

Feb 7, 2025
Hello,

I clicked "Run now" under Prune Jobs without having a retention policy defined. The job correctly reported "No prune selection", but then logged a "remove" line for every snapshot on the server. It finished in 0.1s, much faster than I would expect if it were actually pruning anything, but it's still alarming!

I have verified that the index files still exist (and took a backup of them), and the snapshot counts in the datastore content view appear unaffected, but I'm concerned that something undesirable might happen on the next GC run.

Hoping this is just a bug with the reporting. Any info would be appreciated. Thank you!

Code:
2025-02-07T12:11:10-07:00: prune job 'default-myorg-xyz'
2025-02-07T12:11:10-07:00: Starting datastore prune on datastore 'myorg', root namespace, down to full depth
2025-02-07T12:11:10-07:00: No prune selection - keeping all files.
2025-02-07T12:11:10-07:00: Pruning group :"vm/107"
2025-02-07T12:11:10-07:00: remove vm/107/2024-08-07T03:52:34Z
2025-02-07T12:11:10-07:00: remove vm/107/2024-08-07T07:43:55Z
2025-02-07T12:11:10-07:00: remove vm/107/2024-08-08T07:58:49Z
2025-02-07T12:11:10-07:00: remove vm/107/2024-08-09T07:58:23Z
2025-02-07T12:11:10-07:00: remove vm/107/2024-08-10T07:59:09Z
2025-02-07T12:11:10-07:00: remove vm/107/2024-08-11T07:59:01Z
2025-02-07T12:11:10-07:00: remove vm/107/2024-08-12T07:56:35Z
2025-02-07T12:11:10-07:00: remove vm/107/2024-08-13T07:57:39Z
2025-02-07T12:11:10-07:00: remove vm/107/2024-08-14T07:59:06Z...
 
Looking into the code a bit, I'm wondering if this is what we need:

Diff:
diff --git a/src/server/prune_job.rs b/src/server/prune_job.rs
index 2de34973..e8d05130 100644
--- a/src/server/prune_job.rs
+++ b/src/server/prune_job.rs
@@ -41,6 +41,7 @@ pub fn prune_datastore(

     if keep_all {
         task_log!(worker, "No prune selection - keeping all files.");
+        return Ok(());
     } else {
         let rendered_options = cli_prune_options_string(&prune_options);
         task_log!(worker, "retention options: {rendered_options}");
 
I've confirmed that running GC after this prune job did not delete the snapshots, despite the misleading log messages. I still think this should be fixed, because the current logic is fragile: keep_all is set based on prune_options.keeps_something() and is later used in the deletion check (!keep_all && !mark.keep()). Because the prune loop still runs when keep_all is true, anyone who changes how keep_all is determined or used could introduce unintended data deletion.

The fix appears straightforward: return right after the "keeping all files" message, as shown in the diff above.
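
To make the concern concrete, here is a minimal, self-contained sketch of the behaviour as I understand it. It is not the actual PBS source: Mark, prune_group and the early_return flag are hypothetical names for illustration, and I'm assuming that with no retention options every snapshot ends up marked "remove", which is what the log above suggests.

```rust
// Simplified, hypothetical model of the prune loop; NOT the real PBS code.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Mark {
    Keep,
    Remove,
}

impl Mark {
    fn keep(self) -> bool {
        self == Mark::Keep
    }
}

/// Returns (log lines, names of snapshots that would really be deleted).
/// `early_return` models the proposed fix (return right after the message).
fn prune_group(
    snapshots: &[(&str, Mark)],
    keep_all: bool,
    early_return: bool,
) -> (Vec<String>, Vec<String>) {
    let mut log = Vec::new();
    let mut deleted = Vec::new();

    if keep_all {
        log.push("No prune selection - keeping all files.".to_string());
        if early_return {
            return (log, deleted);
        }
    }

    for &(name, mark) in snapshots {
        // The log line is derived from the mark alone (assumption) ...
        let verb = if mark.keep() { "keep" } else { "remove" };
        log.push(format!("{verb} {name}"));

        // ... while the deletion itself is gated on keep_all as well.
        if !keep_all && !mark.keep() {
            deleted.push(name.to_string());
        }
    }

    (log, deleted)
}

fn main() {
    let snaps = [
        ("vm/107/2024-08-07T03:52:34Z", Mark::Remove),
        ("vm/107/2024-08-07T07:43:55Z", Mark::Remove),
    ];

    // Current behaviour: "remove" lines are logged, nothing is deleted.
    let (log, deleted) = prune_group(&snaps, true, false);
    assert_eq!(log.len(), 3);
    assert!(deleted.is_empty());

    // With the early return: only the "keeping all files" line is logged.
    let (log, deleted) = prune_group(&snaps, true, true);
    assert_eq!(log.len(), 1);
    assert!(deleted.is_empty());
}
```

In this model the only thing protecting the snapshots is the !keep_all term in the deletion check, which is why returning early seems like the safer structure to me.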
 