I have an issue with a 3-node cluster where one hypervisor is substantially different from the others (a completely different CPU generation). Eventually I will replace them all, but that takes time. Until then, I have the CPU type set to the lowest compatible model, and for the most part it works. What I find, though, is that if I do maintenance and migrate a VM, then migrate it back, at some point in the next 48 hours the VM pegs its CPU at ~15-25% and then hangs.
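For context, this is the kind of setting I mean (a sketch only; "kvm64" is a placeholder, not necessarily the model I picked, and <vmid> is whatever guest you're changing):

# pin the VM's CPU type to a baseline model that every node in the cluster supports
qm set <vmid> --cpu kvm64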

I can detect this with "qm guest cmd ### info" and issue a "qm reset ###", but I want this automated. I find that if I run the following as a cron job every 5 minutes, it essentially finds and fixes the problem automatically:
*/5 * * * * for vm in $(/usr/sbin/qm list | awk '{print $1}' | grep -E '^[0-9]+$'); do if /usr/sbin/qm guest cmd "$vm" info 2>&1 | grep -q "not running"; then /usr/sbin/qm reset "$vm"; fi; done
In practice this works, and if I were doing maintenance where I knew a guest would be offline, it would be easy enough to disable the cron job. My question is: is there a better way to do this? Is there some big, obvious reason I'm not thinking of that makes this a stupid idea?
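For what it's worth, here is the same logic written out as a standalone script (a sketch only; the path, the comments, and the filter on the STATUS column are my additions, the detection grep is identical to the one-liner above):

#!/bin/bash
# e.g. saved as /usr/local/bin/hung-vm-watchdog.sh and called from cron every 5 minutes
# Only look at VMs that qm reports as "running", so intentionally stopped guests are never touched.
for vm in $(/usr/sbin/qm list | awk 'NR>1 && $3 == "running" {print $1}'); do
    # Same check as the one-liner: on a hung VM the guest agent stops answering,
    # and "qm guest cmd" reports that it is not running.
    if /usr/sbin/qm guest cmd "$vm" info 2>&1 | grep -q "not running"; then
        /usr/sbin/qm reset "$vm"
    fi
done

Filtering on the status column also means the watchdog never tries to reset a guest that is shut down on purpose, which makes maintenance windows less of a concern.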