Restore VM data after HA / replication sync issue – node failure scenario

RLMPve
New Member
Jan 30, 2026
Hi all,



I need some guidance on restoring VM data after an HA/replication related issue in a 2-node Proxmox cluster.



Environment


  • Proxmox VE cluster: 2 nodes
  • HA enabled
  • Storage used for VMs: ZFS / LVM-thin
  • Replication configured between nodes
  • Affected VM: VMID 2210

What happened

We experienced an issue where one node went down unexpectedly. Before the failure:



  • HA was configured for the VM
  • Replication jobs were active between nodes
  • The VM was running on Node A
  • Replicated copy existed on Node B

During/after the incident:

  • The node hosting the active VM failed
  • HA attempted to recover the VM
  • Now I’m in a situation where:
    • The VM disk on one side seems out of sync / inconsistent / missing recent data
    • I suspect the replicated volume may have a more recent or at least usable state


Status:
  • HA service for the VM is currently disabled to avoid further changes.
  • I have not restarted replication jobs yet.

What I need:


I want to understand the safest way to restore the VM using the replicated data without corrupting anything.


Specifically:


  1. How can I verify which replica snapshot is the latest consistent one? (I’ve sketched what I think I should check right after this list.)
  2. Is it safe to:
    • Detach the current disk
    • Promote the replicated volume on the secondary node
    • Attach it back to the VM config?
  3. With Proxmox replication, is there a recommended way to:
    • Break the replication relationship
    • Make the replica the new primary disk
  4. Are there logs that clearly show the last successful replication snapshot?
    • /var/log/pve/tasks/
    • journalctl
    • other?

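For reference, this is roughly what I think I should be checking; the dataset names are guesses based on the default naming, so please correct me if this is the wrong approach:

    # on Node B: list replication snapshots for VMID 2210
    # (Proxmox replication snapshots are named __replicate_<job>_<timestamp>__)
    zfs list -t snapshot -o name,creation | grep vm-2210

    # current state of the replication jobs on this node
    pvesr status

    # logs of the replication runner
    journalctl -u pvesr.service --since "2 days ago"

    # per-task logs (replication tasks show up here as well)
    ls -lt /var/log/pve/tasks/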

Goal


Bring the VM up using the most recent consistent disk state from replication, and then re-establish replication cleanly afterward.

I want to avoid:

  • Starting the VM on a partially synced disk
  • Accidentally overwriting the good replica with an older state



Any step-by-step guidance for this recovery scenario would be greatly appreciated.

Thanks!
 
In a two-node cluster, once one node fails, the remaining node no longer has quorum, so no fail-over from one host to the other will occur. HA might have tried to start the replica, but it will not have succeeded in doing so because of this, unless you configured external vote support [0].
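A quick way to confirm that on the surviving node (the fields below are from a typical pvecm status run; exact values will differ):

    # on the surviving node
    pvecm status
    # look for something like:
    #   Quorate:          No
    #   Expected votes:   2
    #   Total votes:      1
    #   Quorum:           2 Activity blocked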

Another option is to lower the expected quorum votes to 1 [1]. This can lead to data corruption if done improperly, especially on shared storage, so you should make sure that the failed node is powered off and cannot come back up on the network. It should be a last-resort step to bring essential VMs up or restore quorum.

So in theory, host A should have the most recent data, but it might be corrupted due to the host failure, whereas host B has the last replica, which will most likely be consistent.
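You can check how old the last replica on host B is by looking at the creation time of its replication snapshot, roughly like this (the dataset name is just an example for a default local-zfs setup):

    # on host B: the __replicate_* snapshot marks the last completed sync
    zfs list -t snapshot -o name,creation -s creation rpool/data/vm-2210-disk-0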

If the replica on B got started, the replication job will have switched automatically and now runs from B -> A. The VM running on B will stay there, barring any affinity rules [2].

If your VM state on node A is inconsistent or bad, you can force a migration to node B; this worked on my test setup (a command sketch follows the list below):

* Make sure node A is powered off; pull the network cable to be sure.
* Lower the expected quorum votes on node B to 1: pvecm expected 1
* Wait until the replica has started on node B; the replication job on node B will have switched to target node A and will probably fail since A is still down
* Start node A and check the cluster status with pvecm status; the expected votes should be back at 2
* You might have to configure replication again, though on my test setup it worked without manual intervention
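
Put together as commands, the sequence looked roughly like this on my test setup; treat it as a sketch rather than a recipe and verify each step against your own cluster state:

    # 1. node A powered off and disconnected from the network (verify physically)

    # 2. on node B: allow the single remaining node to act on its own
    pvecm expected 1

    # 3. watch HA recover the VM onto node B
    ha-manager status
    qm status 2210

    # 4. check the replication job (it now points at node A and will fail while A is down)
    pvesr status

    # 5. once node A is back up: confirm quorum is restored (expected votes back at 2)
    pvecm status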

[0]: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
[1]: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_write_configuration_when_not_quorate
[2]: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#ha_manager_node_affinity_rules