HA VM stuck on migration back to initial node.

Hi Forum.

Today, when the OVH network was hit by a major incident, the cluster HA went crazy moving VMs around.
Given the severity of the situation, it has coped quite well so far. However, one VM is still left in no-man's land in HA:

It was moved to an alternate node, but since the network was re-established it has been stuck in an endless loop of failed/aborted migrations.
I have tried to handle that manually via HA and the GUI, to no avail, and I don't know what to do next.

The failed migrations are logged as follows:

Code:
task started by HA resource agent
2021-10-13 14:24:44 use dedicated network address for sending migration traffic (10.100.10.51)
2021-10-13 14:24:44 starting migration of VM 30100 to node 'lnd202011a' (10.100.10.51)
2021-10-13 14:24:45 found local, replicated disk 'local-zfs:vm-30100-disk-0' (in current VM config)
2021-10-13 14:24:45 replicating disk images
2021-10-13 14:24:45 start replication job
2021-10-13 14:24:45 guest => VM 30100, running => 0
2021-10-13 14:24:45 volumes => local-zfs:vm-30100-disk-0
2021-10-13 14:24:46 create snapshot '__replicate_30100-0_1634127885__' on local-zfs:vm-30100-disk-0
2021-10-13 14:24:46 using insecure transmission, rate limit: 50 MByte/s
2021-10-13 14:24:46 full sync 'local-zfs:vm-30100-disk-0' (__replicate_30100-0_1634127885__)
2021-10-13 14:24:46 using a bandwidth limit of 50000000 bps for transferring 'local-zfs:vm-30100-disk-0'
volume 'rpool/vm-30100-disk-0' already exists
2021-10-13 14:24:47 file /etc/pve/storage.cfg line 12 (section 'local') - unable to parse value of 'prune-backups': invalid format - format error
2021-10-13 14:24:47 keep-all: property is not defined in schema and the schema does not allow additional properties
2021-10-13 14:24:47 full send of rpool/vm-30100-disk-0@__replicate_30100-0_1604404823__ estimated size is 20.1G
2021-10-13 14:24:47 send from @__replicate_30100-0_1604404823__ to rpool/vm-30100-disk-0@__replicate_30100-0_1634127885__ estimated size is 106M
2021-10-13 14:24:47 total estimated size is 20.2G
2021-10-13 14:24:47 TIME        SENT   SNAPSHOT rpool/vm-30100-disk-0@__replicate_30100-0_1604404823__
2021-10-13 14:24:47 command 'zfs send -Rpv -- rpool/vm-30100-disk-0@__replicate_30100-0_1634127885__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2021-10-13 14:24:47 delete previous replication snapshot '__replicate_30100-0_1634127885__' on local-zfs:vm-30100-disk-0
2021-10-13 14:24:47 end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-30100-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_30100-0_1634127885__ | /usr/bin/cstream -t 50000000' failed: exit code 141
2021-10-13 14:24:47 ERROR: Failed to sync data - command 'set -o pipefail && pvesm export local-zfs:vm-30100-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_30100-0_1634127885__ | /usr/bin/cstream -t 50000000' failed: exit code 141
2021-10-13 14:24:47 aborting phase 1 - cleanup resources
2021-10-13 14:24:47 ERROR: migration aborted (duration 00:00:03): Failed to sync data - command 'set -o pipefail && pvesm export local-zfs:vm-30100-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_30100-0_1634127885__ | /usr/bin/cstream -t 50000000' failed: exit code 141
TASK ERROR: migration aborted
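
A note on reading the log above: "got signal 13" and "exit code 141" look like the same event seen twice. On Linux, signal 13 is SIGPIPE, and shells report a process killed by a signal as 128 plus the signal number. That would fit the sending side being cut off when the receiver closed the stream, possibly because of the "volume 'rpool/vm-30100-disk-0' already exists" line, but that last part is an interpretation and not something the log confirms. A minimal sketch of the arithmetic (`decode_status` is a made-up helper name, not a Proxmox tool):

```shell
# Decode a shell exit status into the signal that ended the process.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix
        echo "status $status = 128 + signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "status $status = normal exit"
    fi
}

decode_status 141
```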

The VMs use local (ZFS) storage with replication. So far it has worked well; however, I don't know the right way to recover from this situation, or what causes it.

Best regards.
 
I bypassed the issue by removing the HA config and the replication task, then starting the VM, and finally performing a manual migration.
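
For anyone landing here with the same problem, the steps above can be sketched as CLI commands. This is a dry run that only prints the commands, not a record of what was actually typed. The replication job ID `30100-0` is inferred from the snapshot names in the log, `recovery_plan` is a made-up helper name, and the exact options are an assumption (an online migration with local disks may also need `--with-local-disks`).

```shell
# Dry run: prints the recovery commands instead of executing them.
# Arguments: VM ID, replication job ID, target node (all taken from the log).
recovery_plan() {
    vmid=$1
    jobid=$2
    target=$3
    echo "ha-manager remove vm:$vmid"          # take the VM out of HA management
    echo "pvesr delete $jobid"                 # mark the replication job for removal
    echo "qm start $vmid"                      # start the VM on the node it is on now
    echo "qm migrate $vmid $target --online"   # migrate it back by hand
}

recovery_plan 30100 30100-0 lnd202011a
```

Review the printed commands and run them one at a time on the node that currently holds the VM, checking the task log after each step.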

Now I could re-create the replication job and the HA config I removed, but I'm not confident about it: if it failed once, it may fail again under the same conditions.
 
