A guard rail suggestion to avoid losing everything like I just did.

anthony1956

New Member
Jan 13, 2026
Dear Proxmox Support / Development Team,

I am writing with a design suggestion arising from a recent real-world incident, offered in the spirit of constructive feedback rather than criticism.

During a restore via snapshot rollback, the system correctly warned that the action would overwrite the current state and that there would be no way back. Under fatigue and operational pressure (at 4am, after working all night), I proceeded, and only later realised that I had restored from a much older backup than intended, with no snapshot having been taken beforehand. I lost everyone's email from the server and had no other backup, as this was the backup, and that system was faulty (my fault).

What struck me afterwards is that the warning was *informational* but not *protective*. At the exact moment when human judgement is most fallible (stress, tiredness, urgency), the system relies entirely on the operator having already taken the correct precaution.

I would like to suggest a very minimal and human-factors-oriented enhancement:

**A short-lived “undo horizon” snapshot, created automatically immediately before any destructive restore or overwrite operation.** Or indeed just before rollbacks, perhaps only when there are no recent snapshots.

Key characteristics of this idea:

- Snapshot creation is **automatic**, not dependent on operator action
- The snapshot exists **in spite of the operator**, not because of them
- It is not necessarily prominent in the usual snapshot list (implementation detail)
- **No automatic deletion**: removing it should be an explicit, conscious user action taken later, when the situation has stabilised
- Disk usage can be surfaced calmly after the fact (“Undo snapshot exists, consumes X GB”)
- A global setting could allow this feature to be disabled for environments with exceptional privacy or sensitivity requirements, but enabled by default
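
As a rough illustration of the behaviour described in the list above, here is a minimal Python sketch of a wrapper that takes an "undo" snapshot before rolling back. It relies on the existing `qm snapshot` and `qm rollback` commands; the function names, the `undo_` naming scheme and the injectable `run` parameter are assumptions made for this example, not anything Proxmox provides. It also assumes a storage backend that can roll back past a newer snapshot; some backends may only allow rolling back to the most recent one, in which case the fresh undo snapshot would itself get in the way.

```python
import subprocess
from datetime import datetime, timezone
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def undo_snapshot_name(now: datetime) -> str:
    # Hypothetical naming scheme: starts with a letter and uses only
    # letters, digits and underscores.
    return now.strftime("undo_%Y%m%d_%H%M%S")


def guarded_rollback(
    vmid: int,
    target: str,
    run: Runner = run_checked,
    now: Optional[datetime] = None,
) -> str:
    """Snapshot the current state, then roll back to `target`.

    Returns the name of the undo snapshot. Nothing here deletes it;
    cleanup is left as a deliberate later step.
    """
    now = now or datetime.now(timezone.utc)
    undo = undo_snapshot_name(now)

    # Step 1: preserve the current state. If this fails, the exception
    # propagates and the destructive step below never runs.
    run([
        "qm", "snapshot", str(vmid), undo,
        "--description", f"Automatic undo point before rollback to {target}",
    ])

    # Step 2: only now perform the rollback.
    run(["qm", "rollback", str(vmid), target])
    return undo
```

Calling `guarded_rollback(100, "before_upgrade")` on a node would run both commands in order for VM 100. The key design choice is the ordering: the rollback is unreachable unless the undo snapshot was created successfully.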

The intent is not to add warnings, buttons, or friction, but to ensure that at the moment of maximum risk (I was "out of it" with stress and fatigue at 4am), the system quietly guarantees reversibility. Decision-making about retention is deferred to a time when the operator is no longer under pressure.

In hindsight, such a mechanism would almost certainly have prevented a serious but avoidable loss with no path to recovery. More broadly, it seems aligned with the philosophy of designing systems that assume capable but tired humans.

I hope this suggestion is useful. Thank you for the work you do — Proxmox has enabled me to run robust infrastructure for many months, and this idea comes from that same place of respect.

Kind regards,

Anthony - approaching 70 years of age; tech involved for 50 years.
 
It would be easy to say "just do the right thing", but having been there at 4am... yeah. I agree that something like what you just described would be cool.

Just a hint for the future: you can restore a backup to a NEW VM instead of over the original VM. Check that it is OK, and ONLY AFTER SUCCESSFUL validation delete the old one.

You'd need enough space in your datastore for this to be possible though...
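
A minimal sketch of that workflow in Python, assuming the existing `qmrestore <archive> <vmid>` command and its `--storage` option. The helper names, the archive path and the storage name `local-lvm` are made up for illustration, and the list of used VMIDs is passed in by hand rather than queried from the cluster.

```python
from typing import Iterable, List, Optional


def next_free_vmid(used: Iterable[int], start: int = 100) -> int:
    """Return the lowest VMID >= start that is not already in use."""
    taken = set(used)
    vmid = start
    while vmid in taken:
        vmid += 1
    return vmid


def restore_to_new_vm_cmd(
    archive: str,
    used_vmids: Iterable[int],
    storage: Optional[str] = None,
) -> List[str]:
    """Build a qmrestore command that targets a fresh VMID.

    The original VM is never the target, so it stays untouched until
    the restored copy has been checked and someone deletes it on purpose.
    """
    cmd = ["qmrestore", archive, str(next_free_vmid(used_vmids))]
    if storage:
        cmd += ["--storage", storage]
    return cmd


# Hypothetical archive path and storage name, for illustration only.
archive = "/var/lib/vz/dump/vzdump-qemu-100-2026_01_12-02_00_00.vma.zst"
print(" ".join(restore_to_new_vm_cmd(archive, [100, 101, 103], "local-lvm")))
```

The sketch only builds the command; running it, validating the restored VM and removing the old one remain separate, manual steps.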
 
If possible (e.g. enough storage available) you should always restore to a new VMID, or create a backup of the broken system before overwriting it.

You can never be sure that a backup is valid and restorable until you have successfully restored AND tested it.

Unfortunately, especially for large systems and backups, this is not always possible because of storage space or recovery time constraints.

Having had non-restorable backups myself (ntbackup and others just said something like "This Backup is not valid"), I sincerely sympathise.
 
Feature requests or bug reports should be filed on bugzilla.proxmox.com, since there is no guarantee the developers read everything in this community forum.
 
Hi @anthony1956 , welcome to the forum.

While I sympathize with your situation and your request (we have all been there), what you are proposing is essentially a "knee-jerk" reaction and has very little chance of being implemented. It may make sense in the context of your recent experience, but there are several reasons why it should not become a platform feature:

- It would change long-standing behavior that users have relied on for years.
- It would break or complicate existing automation in production environments.
- It would create obstacles for many supported storage backends that cannot delete snapshots out of order (not ours).
- It would impose a new operational burden on administrators, who would need to track such special snapshots manually.
- It will likely be disabled by most operators, while its implementation will siphon valuable development resources.

It is also difficult to prioritize your scenario, rolling back to the “wrong” snapshot, over other forms of operator error, such as:

- deleting the wrong snapshot during testing
- deleting the wrong VM
- deleting the wrong disk
- rebooting an incorrect node or VM
- running a destructive script against the wrong environment (I am guilty of that one)

As noted, I understand the frustration - this is a painful mistake many administrators have made. But it is not realistic to introduce permanent technical safeguards to prevent errors by a user who already has full administrative access to the system. Operational discipline, peer review, and change control remain the best defenses.

Cheers


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox