Hi all,
I need some guidance on restoring VM data after an HA/replication-related issue in a 2-node Proxmox cluster.
Environment
- Proxmox VE cluster: 2 nodes
- HA enabled
- Storage used for VMs: (ZFS / LVM-thin)
- Replication configured between nodes
- Affected VM: VMID 2210
What happened
We experienced an issue where one node went down unexpectedly. Before the failure:
- HA was configured for the VM
- Replication jobs were active between nodes
- The VM was running on Node A
- Replicated copy existed on Node B
During/after the incident:
- The node hosting the active VM failed
- HA attempted to recover the VM
- Now I’m in a situation where:
  - The VM disk on one side seems out of sync / inconsistent / missing recent data
  - I suspect the replicated volume may have a more recent or at least usable state
Status:
- HA service for the VM is currently disabled to avoid further changes.
- I have not restarted replication jobs yet.
What I need:
I want to understand the safest way to restore the VM using the replicated data without corrupting anything.
Specifically:
- How can I verify which replica snapshot is the latest consistent one?
- Is it safe to:
  - Detach the current disk
  - Promote the replicated volume on the secondary node
  - Attach it back to the VM config?
- With Proxmox replication, is there a recommended way to:
  - Break the replication relationship
  - Make the replica the new primary disk
- Are there logs that clearly show the last successful replication snapshot?
  - /var/log/pve/tasks/
  - journalctl
  - other?
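To make the first question concrete, here is a rough sketch of how I would compare replica snapshots on each node. It assumes the storage is ZFS and that replication snapshots are named `<dataset>@__replicate_<vmid>-<jobnum>_<epoch>__`, which is my understanding of the naming scheme and not something I have confirmed. The function names and the dataset path in the usage note are only illustrative. The script only reads the snapshot list and changes nothing.

```python
import re
import subprocess

# Assumed naming scheme (not confirmed):
#   <dataset>@__replicate_<vmid>-<jobnum>_<epoch>__
SNAP_RE = re.compile(
    r"^(?P<dataset>[^@]+)@__replicate_(?P<vmid>\d+)-(?P<job>\d+)_(?P<epoch>\d+)__$"
)


def latest_replication_snapshots(zfs_output, vmid):
    """Return {dataset: (snapshot_name, epoch)} for the newest
    replication snapshot of each dataset belonging to `vmid`."""
    latest = {}
    for line in zfs_output.splitlines():
        # With `zfs list -H`, fields are tab-separated; the name comes first.
        name = line.split("\t")[0].strip()
        match = SNAP_RE.match(name)
        if not match or int(match.group("vmid")) != vmid:
            continue  # not a replication snapshot of this VM
        dataset = match.group("dataset")
        epoch = int(match.group("epoch"))
        if dataset not in latest or epoch > latest[dataset][1]:
            latest[dataset] = (name, epoch)
    return latest


def read_snapshot_names():
    """Read-only query of all snapshot names on the local node."""
    result = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Running `latest_replication_snapshots(read_snapshot_names(), 2210)` on both nodes and comparing the epoch values should show which side holds the newer replication point for something like `rpool/data/vm-2210-disk-0`. Is that a valid way to judge which copy is the latest consistent one?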
Goal
Bring the VM up using the most recent consistent disk state from replication, and then re-establish replication cleanly afterward.
I want to avoid:
- Starting the VM on a partially synced disk
- Accidentally overwriting the good replica with an older state
Any step-by-step guidance for this recovery scenario would be greatly appreciated.
Thanks!