Enable HA on all VM's

repa · Mar 5, 2020

Hi,

we're migrating >100 VMs from local storage to a Ceph cluster.

Is there a command to enable HA on all VMs at once?

We have no more local storage.

t.lamprecht · Mar 5, 2020

Hi,

there isn't a built-in command for mass-adding all VMs, but it's pretty easy to script. For the VMs on the current node, for example:

qm list 2>/dev/null | awk '/\d+/ {print "vm:", $1, "\n"}' >> /etc/pve/ha/resources.cfg

For all VMs and CTs in the whole cluster:

Bash:

GUESTS=$(pvesh get /cluster/resources --type vm --output-format yaml | awk '/vmid:/ {print $2}')
for id in $GUESTS; do ha-manager add "$id"; done

It seems that we may want to adapt ha-manager add to accept lists of VMIDs in the future. It's not often required, but when it is, it's really nice to have.
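
Before running the cluster-wide loop against >100 guests, it can help to preview what it would do. The following is a minimal sketch, not an official tool: the function names extract_vmids and print_ha_add_commands are made up for illustration, and it assumes the YAML output contains one "vmid: <number>" entry per guest, as the awk filter above already relies on.

```shell
#!/bin/bash
# Hypothetical helper: read the YAML printed by
#   pvesh get /cluster/resources --type vm --output-format yaml
# on stdin and print one guest ID per line.
extract_vmids() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "vmid:") print $(i + 1) }'
}

# Hypothetical helper: turn guest IDs on stdin into ha-manager commands.
# The commands are only printed, never executed, so they can be reviewed.
print_ha_add_commands() {
    while read -r id; do
        case "$id" in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
        esac
        printf 'ha-manager add %s\n' "$id"
    done
}

# Example (on a cluster node):
#   pvesh get /cluster/resources --type vm --output-format yaml \
#       | extract_vmids | print_ha_add_commands
```

Once the printed list looks right, the same pipeline can be piped into sh to actually run the commands.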

repa · Mar 6, 2020

thanks a lot, we'll try that!

repa · Mar 14, 2020

Hi Guys,

how can I define that a newly created VM is HA-enabled?

t.lamprecht · Mar 14, 2020

Hi, currently that's not possible yet. I remember some discussion about adding it, but there wasn't too much pressure on it; not sure though.

IMO, it could make sense to have as a convenience, especially for API users, but it's easy to work around by just adding the guest to HA after creation with a second API call.

Note that GUI users can add a VM/CT to HA not only from the Cluster → HA panel, but also inside the VM/Guest panels under "More" (top right corner, to the right of start/shutdown/..).
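
On the command line, the "second call" workaround could look roughly like the sketch below. The wrapper name create_vm_with_ha and the RUN variable are hypothetical; only qm create and ha-manager add are real commands, and the HA step runs only if the create step succeeded.

```shell
#!/bin/bash
# Hypothetical wrapper: create a VM, then register it as an HA resource.
# RUN is an illustrative hook: leave it unset to execute the commands,
# or set RUN=echo to only print them.
create_vm_with_ha() {
    vmid=$1
    shift
    # Remaining arguments are passed through to "qm create" unchanged.
    ${RUN:-} qm create "$vmid" "$@" && ${RUN:-} ha-manager add "vm:$vmid"
}

# Example (print only):
#   RUN=echo create_vm_with_ha 100 --name test --memory 2048
```

API users would do the same thing with two requests: one to create the guest and one to add it as an HA resource.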

repa · Mar 14, 2020

Thanks Thomas,

we have a 6-node Ceph cluster. None of the VMs have HA enabled at the moment. What happens if a host fails and the VMs have no HA enabled?

Can we start them manually on another host or are they blocked on the failed host?

t.lamprecht · Mar 14, 2020

repa said:
What happens if a host fails and the VMs have no HA enabled?

The VM stays where it was: no HA, no automatic recovery.

repa said:
Can we start them manually on another host or are they blocked on the failed host?

If the VM itself is on shared storage and doesn't use any local resources of that node you can recover it by moving the configuration file in /etc/pve/nodes/<deadnode>/qemu-server/<VMID>.conf to another node (same sub directory).

Only do this after you have verified the state of the seemingly failed node. If only a single network or only the cluster stack went down, the VM may still be running, and if you "steal" (move) that VM and start it on another node you may corrupt the underlying disk images.

For this reason we do not allow a simple move of a VM/CT if its hosting node is unresponsive.
The HA manager ensures that the failed node isn't running any VMs/CTs anymore through self-fencing and by acquiring the failed node's HA locks; only then does it recover an HA service.

If your failed node is offline for sure, you can move and then start the VM or CT just fine.
If it died during some locked operation on the VM, you need to unlock it manually first (qm unlock <vmid>).

Hope that makes things a bit clearer.
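
The manual recovery described above can be summarized as the sketch below. It only prints the commands for review and must not be used unless the failed node is confirmed to be offline. The function name and its three arguments are hypothetical; the paths follow the /etc/pve/nodes/<node>/qemu-server/<VMID>.conf layout mentioned above.

```shell
#!/bin/bash
# Hypothetical helper: print the manual recovery steps for one VM.
# Arguments: <deadnode> <targetnode> <vmid>
# Nothing is executed; review the output and run it by hand on the
# target node, and only if the dead node is confirmed to be offline.
print_recovery_commands() {
    dead=$1
    target=$2
    vmid=$3
    base=/etc/pve/nodes
    # 1. Move the VM configuration to the surviving node.
    printf 'mv %s/%s/qemu-server/%s.conf %s/%s/qemu-server/%s.conf\n' \
        "$base" "$dead" "$vmid" "$base" "$target" "$vmid"
    # 2. Only needed if the node died during a locked operation.
    printf 'qm unlock %s\n' "$vmid"
    # 3. Start the VM on the target node.
    printf 'qm start %s\n' "$vmid"
}

# Example:
#   print_recovery_commands node3 node1 100
```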

rekordskratch · Apr 5, 2022

t.lamprecht said:
It seems that we may want to adapt ha-manager add to accept lists of VMIDs in the future. It's not often required, but when it is, it's really nice to have.

If you've come across this thread from a Google search like I did, please add your voice to the enhancement request over at:
https://bugzilla.proxmox.com/show_bug.cgi?id=1083

proxale · Jul 8, 2022

t.lamprecht said:
Hi,

there isn't a built-in command for mass-adding all VMs, but it's pretty easy to script. For the VMs on the current node, for example:

qm list 2>/dev/null | awk '/\d+/ {print "vm:", $1, "\n"}' >> /etc/pve/ha/resources.cfg

The awk syntax doesn't appear to work properly in current Proxmox; for some reason it gives me just one particular VMID out of the entire list.
Adjusting it to match 0-9 appears to work properly:
qm list 2>/dev/null | awk '/[0-9]+/ {print "vm:", $1, "\n"}' >> /etc/pve/ha/resources.cfg
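
A slightly stricter variant, as a sketch: POSIX awk regular expressions have no \d shorthand, which is the likely reason the original pattern misbehaves. The version below tests only the first column of qm list against ^[0-9]+$, so the header line and VM names containing digits cannot match. The function name is hypothetical, and nothing here checks whether a VM is already listed in resources.cfg.

```shell
#!/bin/bash
# Hypothetical helper: read "qm list" output on stdin and print one
# "vm: <id>" entry (followed by a blank line) per VM.
# Only lines whose first column is purely numeric are used, which
# skips the header line.
qm_list_to_ha_entries() {
    awk '$1 ~ /^[0-9]+$/ { print "vm: " $1 "\n" }'
}

# Example (review the output before appending it anywhere):
#   qm list 2>/dev/null | qm_list_to_ha_entries
```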

Rainerle · Jul 10, 2022

I wrote a script for that to run regularly on each host node...

Code:

root@proxmox07:~# cat /usr/local/bin/pve-ha-enable.sh
#!/bin/bash
# Add every VM on this node to HA. The HA groups "<node>" and
# "<node>-do_not_migrate" must already exist.
# Running VMs (the third column of "qm list" is the status)
VMIDS_R=$(qm list | awk '$3 == "running" {print $1}')
# Stopped VMs
VMIDS_S=$(qm list | awk '$3 == "stopped" {print $1}')
HOSTNAME=$(hostname)
for VMID in ${VMIDS_R}
do
        # "nvme-zfs-mirror" is the name of the local storage on these nodes
        if qm config "${VMID}" | grep -q "nvme-zfs-mirror"
        then
                # Do not migrate VMs on local storage
                GROUP="${HOSTNAME}-do_not_migrate"
        else
                # Migrate VMs on shared storage
                GROUP="${HOSTNAME}"
        fi
        CMD="ha-manager add vm:${VMID} --group ${GROUP}"
        echo "${CMD}"
        ${CMD}
done
# Stopped VMs are added with the requested state "stopped"
for VMID in ${VMIDS_S}
do
        CMD="ha-manager add vm:${VMID} --group ${HOSTNAME} --state stopped"
        echo "${CMD}"
        ${CMD}
done

root@proxmox07:~#

You need to create the HA groups first, though:
"<node>": a group containing only that node
"<node>-do_not_migrate": a group containing only that node, with "restricted" ticked

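The two groups per node can also be created from the shell instead of the GUI. This is a sketch under the assumption that ha-manager groupadd with --nodes and --restricted is available on the installed version; the function name is hypothetical and it only prints the commands.

```shell
#!/bin/bash
# Hypothetical helper: print the commands that create the two HA groups
# the script above expects for one node. Nothing is executed.
# Argument: <node>
print_group_commands() {
    node=$1
    # Group containing only this node
    printf 'ha-manager groupadd %s --nodes %s\n' "$node" "$node"
    # Restricted group: its services may only run on this node
    printf 'ha-manager groupadd %s-do_not_migrate --nodes %s --restricted 1\n' \
        "$node" "$node"
}

# Example:
#   print_group_commands "$(hostname)"
```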