Enable HA on all VMs

repa

Hi,

we're migrating more than 100 VMs from local storage to a Ceph cluster.

Is there a command to enable HA on all VMs at once?

We have no more local storage.
 
Hi,

there isn't a built-in command for adding all VMs at once, but it's pretty easy to script. For the VMs on the current node, for example:

qm list 2>/dev/null | awk '/\d+/ {print "vm:", $1, "\n"}' >> /etc/pve/ha/resources.cfg

For all VMs and CTs across the whole cluster:
Bash:
GUESTS=$(pvesh get /cluster/resources --type vm --output-format yaml | awk '/vmid:/ {print $2}')
for id in $GUESTS; do ha-manager add "$id"; done


It seems that we may want to adapt ha-manager add to accept lists of VMIDs in the future. It's not often required, but when it is, it's really nice to have.
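
Until ha-manager add accepts lists, a small wrapper can stand in for it. The following is only a sketch: ha_add_many and the DRY_RUN variable are made-up names for illustration, not part of Proxmox VE. It reads guest IDs from stdin, ignores anything that is not a plain number, and either prints or runs one ha-manager add per ID.

```shell
#!/bin/bash
# Sketch only: ha_add_many and DRY_RUN are hypothetical names,
# not something shipped with Proxmox VE.

# Read guest IDs (one per line) from stdin and add each one to HA.
# With DRY_RUN=1 the commands are only printed, not executed.
ha_add_many() {
    local id
    while read -r id; do
        # Skip blank lines and anything that is not a plain numeric ID
        case "$id" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ha-manager add $id"
        else
            # </dev/null keeps the command from consuming the ID list
            ha-manager add "$id" </dev/null || echo "failed to add $id" >&2
        fi
    done
}

# Possible usage, reusing the pipeline from the post above:
#   pvesh get /cluster/resources --type vm --output-format yaml \
#       | awk '/vmid:/ {print $2}' | DRY_RUN=1 ha_add_many
```

Running it once with DRY_RUN=1 and checking the printed list before the real run is probably worthwhile on a cluster with more than 100 guests.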
 
Hi, currently that's not yet possible. I remember some discussion about adding it, but there wasn't too much pressure for it; I'm not sure, though.

IMO it could make sense as a convenience, especially for API users, but it's easy to work around by adding the guest to HA after creation with a second API call.

Note that GUI users can add a VM/CT to HA not only from the Cluster → HA panel but also from inside the VM/CT panel, under "More" (top right corner, to the right of start/shutdown/...).
 
Thanks Thomas,

we have a 6-node Ceph cluster. None of the VMs have HA enabled at the moment. What happens if a host fails and the VMs have no HA enabled?

Can we start them manually on another host or are they blocked on the failed host?
 
What happens if a host fails and the VMs have no HA enabled?

The VM stays where it was: no HA, no automatic recovery.

Can we start them manually on another host or are they blocked on the failed host?

If the VM itself is on shared storage and doesn't use any local resources of that node, you can recover it by moving its configuration file, /etc/pve/nodes/<deadnode>/qemu-server/<VMID>.conf, to another node (same subdirectory).

Only do this after you have verified the state of the seemingly failed node. If only a single network or only the cluster stack went down, the VM may still be running, and if you "steal" (move) the VM and start it on another node, you may corrupt the underlying disk images.

For this reason we do not allow a simple move of a VM/CT if its hosting node is unresponsive.
The HA manager ensures that the failed node isn't running any VMs/CTs through self-fencing and by acquiring the failed node's HA locks; only then does it recover an HA service.

If your failed node is definitely offline, you can move and then start the VM or CT just fine.
If it died during some locked operation on the VM, you need to unlock it manually first (qm unlock <vmid>).
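
The manual recovery described above could be sketched roughly as follows. This is an illustration under assumptions, not an official procedure: move_vm_config and PVE_NODES_DIR are hypothetical names, and the function deliberately does nothing beyond moving the config file. It cannot verify that the source node is really dead; that check remains yours.

```shell
#!/bin/bash
# Sketch only: move_vm_config and PVE_NODES_DIR are hypothetical names.
# Run this ONLY after confirming the source node is really powered off.

# Base directory of the per-node configs (overridable for dry tests)
PVE_NODES_DIR="${PVE_NODES_DIR:-/etc/pve/nodes}"

# Usage: move_vm_config <deadnode> <targetnode> <vmid>
move_vm_config() {
    local deadnode="$1"
    local target="$2"
    local vmid="$3"
    local src="$PVE_NODES_DIR/$deadnode/qemu-server/$vmid.conf"
    local dst_dir="$PVE_NODES_DIR/$target/qemu-server"

    if [ ! -f "$src" ]; then
        echo "no such config: $src" >&2
        return 1
    fi
    if [ ! -d "$dst_dir" ]; then
        echo "no such target directory: $dst_dir" >&2
        return 1
    fi

    # Same subdirectory on the target node, as described in the post
    mv -- "$src" "$dst_dir/" || return 1

    echo "VM $vmid now belongs to $target."
    echo "On $target: 'qm unlock $vmid' if it was locked, then 'qm start $vmid'."
}
```

Unlocking and starting are left as explicit manual steps on purpose, so that nothing gets started before someone has looked at the result.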

Hope that makes things a bit clearer.
 
Hi,

there isn't a built-in command for adding all VMs at once, but it's pretty easy to script. For the VMs on the current node, for example:

qm list 2>/dev/null | awk '/\d+/ {print "vm:", $1, "\n"}' >> /etc/pve/ha/resources.cfg

The awk pattern doesn't appear to work properly in current Proxmox; for some reason it gives me just one particular VMID out of the entire list.
Adjusting it to match 0-9 appears to work properly:
qm list 2>/dev/null | awk '/[0-9]+/ {print "vm:", $1, "\n"}' >> /etc/pve/ha/resources.cfg
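
A likely explanation: awk uses POSIX extended regular expressions, in which \d is not a digit class (that is Perl-style syntax). Implementations commonly treat it as a plain letter d, so the original pattern selects lines by whether they contain a "d" rather than by VMID. A slightly stricter variant matches on the first column only. This is a sketch; vmids_to_ha_stanzas is a made-up helper name.

```shell
#!/bin/bash
# Sketch only: vmids_to_ha_stanzas is a hypothetical helper name.

# Read `qm list` output on stdin and print one resources.cfg stanza
# ("vm: <id>" followed by a blank line) per VM.
vmids_to_ha_stanzas() {
    # Keep only rows whose first column is purely numeric; this also
    # skips the header line, whose first column is "VMID".
    awk '$1 ~ /^[0-9]+$/ { printf "vm: %s\n\n", $1 }'
}

# Possible usage (review the output before appending it to the config):
#   qm list 2>/dev/null | vmids_to_ha_stanzas
```

Anchoring the match to the first column avoids picking up digits that appear elsewhere on a line, such as in a VM name or the memory column.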
 
I wrote a script for that, to be run regularly on each host node:
Code:
root@proxmox07:~# cat /usr/local/bin/pve-ha-enable.sh
#!/bin/bash
# Add every VM on this node to HA, choosing the HA group by storage type.

# IDs of running and stopped VMs (column 3 of `qm list` is the status)
VMIDS_R=$(qm list | awk '$3 == "running" {print $1}')
VMIDS_S=$(qm list | awk '$3 == "stopped" {print $1}')
HOSTNAME=$(hostname)

for VMID in ${VMIDS_R}
do
        # A config that mentions "nvme-zfs-mirror" has a disk on that storage
        if qm config "${VMID}" | grep -q "nvme-zfs-mirror"
        then
                # Do not migrate VMs on local storage
                GROUP="${HOSTNAME}-do_not_migrate"
        else
                # Migrate VMs on shared storage
                GROUP="${HOSTNAME}"
        fi
        # Build the command as an array so it can be printed and run safely
        CMD=(ha-manager add "vm:${VMID}" --group "${GROUP}")
        echo "${CMD[*]}"
        "${CMD[@]}"
done

for VMID in ${VMIDS_S}
do
        # Stopped VMs are added with the requested state "stopped"
        CMD=(ha-manager add "vm:${VMID}" --group "${HOSTNAME}" --state stopped)
        echo "${CMD[*]}"
        "${CMD[@]}"
done

root@proxmox07:~#

You need to create the HA groups first, though:
"<node>": contains only that node
"<node>-do_not_migrate": contains only that node, with "restricted" ticked
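
The two groups can also be created from the command line instead of the GUI. The sketch below only prints the commands so they can be reviewed first; ha_group_cmds is a made-up helper name, and it assumes a Proxmox VE release where ha-manager groupadd with --nodes and --restricted is available.

```shell
#!/bin/bash
# Sketch only: ha_group_cmds is a hypothetical helper name.
# Assumes a release that still provides `ha-manager groupadd`.

# Print the groupadd commands for the two groups the script expects.
# Usage: ha_group_cmds <node>
ha_group_cmds() {
    local node="$1"
    # Group containing only this node
    echo "ha-manager groupadd $node --nodes $node"
    # Same node, but restricted, for VMs on local storage
    echo "ha-manager groupadd ${node}-do_not_migrate --nodes $node --restricted 1"
}

# Possible usage on each node:
#   ha_group_cmds "$(hostname)"        # review the commands
#   ha_group_cmds "$(hostname)" | sh   # then run them
```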