Unable to migrate HA VMs when host interface down

steve_york

Apr 19, 2022
Hi,
I have a 4-node cluster with HA-enabled VMs on each cluster host. Each host has a primary and secondary physical network interface as you'd expect, with the cluster on the second interface.

One of the hosts has experienced a physical issue with its primary network interface - it is no longer reachable (shows "no carrier") - however all of the VMs running on this host are still functioning fine and the cluster is not showing any issues.

I attempted to migrate the VMs from this host using:
ha-manager migrate vm:xxx targethost

The job appeared as expected but terminated shortly afterwards with "Error: migration aborted"; I can't find anything relevant in the logs (including /etc/pve/.clusterlog).

Is there any way to migrate these over to another host or should I just bring the second interface down and let HA do its job?

Many thanks,
Steve
 
You can try to migrate via the CLI and use the parameter --migration_network <cidr>, which should allow the VMs to be migrated over the cluster interconnect. For more information about it, please read the manpage of qm.
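As a rough sketch of the suggestion above: the helper below only prints a qm migrate command line with the migration network pinned, so it can be reviewed before anything is run. build_migrate_cmd is a hypothetical helper name, and the VM ID, target node and CIDR are placeholders, not values from a real cluster.

```shell
#!/usr/bin/env bash
# Sketch: print (do not execute) a qm migrate command that pins the
# migration network. build_migrate_cmd is a hypothetical helper, not
# something shipped with Proxmox VE.
build_migrate_cmd() {
    local vmid="$1" target="$2" cidr="$3"
    printf 'qm migrate %s %s --online --migration_network %s\n' \
        "$vmid" "$target" "$cidr"
}

# Placeholder values; the printed command has to be run on the node
# that currently hosts the VM.
build_migrate_cmd 101 TGT 10.0.1.0/24
```

Proxmox VE also documents a cluster-wide default in /etc/pve/datacenter.cfg (a line of the form "migration: secure,network=<cidr>"). Whether that setting is the one that matters for migrations started by the HA stack is an assumption worth verifying on a test cluster.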
 
Thanks for your reply. My initial try was using the CLI and ha-manager.

Based on your recommendation, I found that if you have a shell on the host where the VM is currently running, and both that host and the target host have working primary interfaces, it works fine (it doesn't work if you try it from a cluster host that isn't hosting that VM).

I tested this with a sacrificial VM and it worked between cluster hosts where everything is fine (no network interface issues). I used (host/IP etc. redacted):

# qm migrate 101 TGT --migration_network 10.0.1.0/24 --online

When I try this from a shell (with IPMI access) on the host that has a failed primary interface (but is still serving/running the VM), the migration fails - although a more useful error is logged (redacted host / IP etc);

task started by HA resource agent
2022-04-19 21:54:25 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=SVR' root@1.2.3.4 /bin/true
2022-04-19 21:54:25 ssh: connect to host 1.2.3.4 port 22: Connection timed out
2022-04-19 21:54:25 ERROR: migration aborted (duration 00:02:11): Can't connect to destination address using public key
TASK ERROR: migration aborted

I tried the same on another host but it refuses to find the VM unless the VM is present on the same host. Looking at the error, it makes me think it's not respecting the migration_network or is attempting to do something else; am I missing something?
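One way to test the suspicion above is to check whether the address the migration task tried to reach over SSH actually lies inside the network passed to --migration_network. The following is a minimal bash sketch for IPv4 only; ip_to_int and ip_in_cidr are hypothetical helper names, and the addresses are the redacted placeholders from this thread, so the result here is illustrative only.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Assumes plain decimal octets without leading zeros.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IPv4 address $1 lies inside CIDR $2, otherwise 1.
ip_in_cidr() {
    local ip="$1" net="${2%/*}" bits="${2#*/}"
    local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# Placeholder values: destination address from the task log and the
# CIDR given to --migration_network.
if ip_in_cidr 1.2.3.4 10.0.1.0/24; then
    echo "destination is inside the migration network"
else
    echo "destination is outside the migration network"
fi
```

If the real destination address turns out to be outside the requested network, that would be consistent with the option not being applied to this task, although it does not by itself show why.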

Many thanks again,
Steve
 
I tried the same on another host but it refuses to find the VM unless the VM is present on the same host.
That's normal: the qm and pct programs only operate on guests that reside on the local host.

task started by HA resource agent
2022-04-19 21:54:25 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=SVR' root@1.2.3.4 /bin/true
2022-04-19 21:54:25 ssh: connect to host 1.2.3.4 port 22: Connection timed out
2022-04-19 21:54:25 ERROR: migration aborted (duration 00:02:11): Can't connect to destination address using public key
TASK ERROR: migration aborted
Have you restricted SSH to listen only on your main IP? Please check that (e.g. on the target with netstat -tnlp | grep 22).
 
Thanks again!

SSH was not restricted to the primary interface/IP; it was listening on all interfaces:

# ss -plant | egrep :22
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1351,fd=3))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=1351,fd=4))
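For anyone repeating this check on several nodes, the same test can be scripted. This is a small sketch that assumes the usual ss column layout (state, queues, local address:port, peer address:port); listen_addrs and sshd_listens_everywhere are hypothetical helper names, and the sample input is the output shown above.

```shell
#!/usr/bin/env bash
# Read ss-style lines on stdin and print the local addresses that are
# in LISTEN state on port $1 (field 4 is "local address:port").
listen_addrs() {
    local port="$1"
    awk -v p="$port" '$1 == "LISTEN" && $4 ~ (":" p "$") {
        sub(":" p "$", "", $4); print $4
    }'
}

# Succeed if stdin shows a wildcard listener (IPv4 or IPv6) on port 22.
sshd_listens_everywhere() {
    listen_addrs 22 | grep -qxE '0\.0\.0\.0|\[::\]|\*'
}

# Sample input taken from this thread.
sample='LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1351,fd=3))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=1351,fd=4))'

printf '%s\n' "$sample" | listen_addrs 22
if printf '%s\n' "$sample" | sshd_listens_everywhere; then
    echo "sshd is listening on all interfaces"
fi
```

On a live node the input would come from something like "ss -tnl | listen_addrs 22" instead of the sample text.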

Sadly I can't continue working on this; a "helpful" DC tech noticed a switch port fault, killed the host's power, replaced a cable, and brought the host back up - by which time the VMs had moved to other cluster hosts. Fun times and a free DR test.

We're going to try and replicate this scenario in our lab using the same config; in the meantime thanks for your responses. If we find anything useful I'll update this thread.

Thanks,
Steve
 