Automatic Migration of a failed VM

proxmox_larry

Member
Nov 7, 2019
Hey guys,
I'm currently using a HA cluster with Ceph as shared storage.
Is it possible to trigger an automatic migration of a VM if it loses its NIC connection on one node?
Thanks!
 
At the moment this is not easily possible. What's the scenario where this happens while the host on which the VM is currently located still has network? Otherwise, if the host also loses network, it will already have been fenced and the VM recovered.
 
The problem is that when you unplug the Ethernet cable from the port that the VM is using, nothing happens!
The VM loses its Ethernet connection, but the host is still running in the cluster.

As you said, is there still a chance to fix this? How?
I think that QEMU should recognize the missing Ethernet connection and migrate the VM to a node that has an active link...
 
That's a problematic grey zone. What if the link is up, but the VLAN is down? What if the VLAN is up, but some firewall blocks all connections?

Ask the QEMU project about this. This is not a Proxmox problem.

Or use your monitoring and the Proxmox API to migrate such a VM.
 
It's the same as when an HDD fails. That's why disk RAID, network LACP, etc. exist: redundancy.
 
I have to agree with @czechsys: this is what monitoring and our REST API are for. There are way too many failure conditions, some of which are only detectable from outside the PVE node. Set up your monitoring to detect that the service in the VM has failed, and if the node it is running on is not fenced, attempt a migration/reboot/hard-reset/.. of the guest (or a load-balancer fail-over to another instance, or ...). If the node has already been fenced, our HA stack should start the VM on another node anyway, and your monitoring will see the service coming up again (hopefully, depending on what caused the failure ;))
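To make the decision flow described above a bit more concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the function names `service_reachable` and `choose_action` and the returned action strings are made up for this example, and a plain TCP connect stands in for whatever check your monitoring actually performs.

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the guest's service succeeds.

    Hypothetical stand-in for a real monitoring check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_action(service_ok: bool, node_in_cluster: bool) -> str:
    """Pick a reaction following the reasoning in the post above.

    service_ok      -- result of the external service check
    node_in_cluster -- True if the node is still a healthy cluster member
                       (i.e. it has not been fenced)
    """
    if service_ok:
        return "none"          # nothing to do
    if node_in_cluster:
        return "migrate"       # node is alive, so HA will not react on its own
    return "wait_for_ha"       # node was fenced, the HA stack recovers the VM
```

"migrate" is just one possible reaction here; a reboot, hard reset, or load-balancer fail-over would slot into the same branch.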
 
Thank you for the detailed answer!
Since I'm not a pro at using the API, what would be the best way to get familiar with it and use it for migration in case of a NIC failure?
 
https://pve.proxmox.com/pve-docs/api-viewer/index.html to get an overview of how the API is structured, which parameters/return values there are, etc. (also shipped with your copy of PVE, see the "Documentation" button in the GUI)
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_user_management might also be interesting to create a user with limited permissions
https://pve.proxmox.com/wiki/Proxmox_VE_API for developer-oriented information regarding the API, and various clients/bindings.
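As a starting point for experimenting, the following Python sketch builds (but does not send) a migration request for the `/nodes/{node}/qemu/{vmid}/migrate` endpoint listed in the API viewer. It assumes API-token authentication, which needs a PVE version that supports API tokens, and the default API port 8006. The helper name `build_migrate_request` and all host, node, VM, and token values are placeholders invented for this example.

```python
import urllib.parse
import urllib.request


def build_migrate_request(host, node, vmid, target,
                          token_id, token_secret, online=True):
    """Build a POST request asking PVE to migrate a VM to `target`.

    host         -- address of any cluster node (placeholder)
    node         -- name of the node currently running the VM
    vmid         -- numeric ID of the VM
    target       -- name of the node to migrate to
    token_id     -- API token ID in the form 'user@realm!tokenname'
    token_secret -- secret value of that token
    online       -- request a live migration if True
    """
    url = f"https://{host}:8006/api2/json/nodes/{node}/qemu/{vmid}/migrate"
    body = urllib.parse.urlencode(
        {"target": target, "online": int(online)}
    ).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    # API tokens are passed in the Authorization header.
    req.add_header("Authorization", f"PVEAPIToken={token_id}={token_secret}")
    return req


# Example with made-up values; nothing is sent until urlopen() is called.
req = build_migrate_request("pve1.example.com", "pve1", 100, "pve2",
                            "monitor@pve!migrate", "secret-uuid")
```

Sending it would be `urllib.request.urlopen(req)`. Note that a default installation uses a self-signed certificate, so the client has to trust the cluster CA first. Giving the token's user only the permissions it needs ties in with the user management chapter linked above.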
 
