Migrate VMs on kernel panic?

BobMccapherey

Member
Apr 25, 2020
I am currently experiencing kernel panics at random times on Intel GVT vGPU VMs. Even though I get these kernel panics, the non-vGPU VMs running on the same host seem to be unaffected. I've set the following in my /etc/sysctl.conf:

kernel.panic = 120
kernel.hung_task_panic = 1

Is there a way to force fencing such that the non-vGPU VMs get migrated before an automatic reboot of the host from a panic?
 
Hi,

Is there a way to force fencing such that the non-vGPU VMs get migrated before an automatic reboot of the host from a panic?

Not really, at least not in a straightforward way. You could use a script that continuously reads the kernel messages (dmesg or /var/log/messages) and runs pvenode migrateall when it detects a hung-task message.

Something basic like the following could work.

Bash:
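#!/bin/bash

# Follow the log from its current end (-n0), keep following across log
# rotation (-F), and react to the kernel's hung-task warning.
tail -Fn0 /var/log/messages | grep -iP --line-buffered 'task .+ blocked for more than \d+ seconds' | \
while read -r line; do
    echo "detected hung task at $(date), migrating"
    pvenode migrateall NODE # replace NODE with the target node, or do something more elaborate
    exit
done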

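You'd need to test and adapt this for yourself. Keep in mind that with kernel.hung_task_panic = 1 the kernel panics as soon as it detects the hung task, so the script only gets a chance to act if that setting is 0.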
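
As a variation on the script above, the detection could be split into small functions so it can be tried against sample lines before being trusted on a real host. This is only a rough sketch: the function names is_hung_task_line and watch_for_hung_task are made up for illustration, NODE is still a placeholder for your target node, and the match is case-sensitive (unlike grep -i above), which assumes the kernel's usual lowercase wording.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the line looks like the kernel's
# hung-task warning ("task ... blocked for more than N seconds").
is_hung_task_line() {
    local re='task .+ blocked for more than [0-9]+ seconds'
    [[ $1 =~ $re ]]
}

# Hypothetical helper: reads lines from stdin and runs the command given
# as arguments once, on the first hung-task line. Returns 1 if the input
# ends without a match.
watch_for_hung_task() {
    local line
    while IFS= read -r line; do
        if is_hung_task_line "$line"; then
            echo "detected hung task at $(date), migrating"
            "$@"
            return 0
        fi
    done
    return 1
}

# Example wiring (not executed here); NODE is a placeholder:
#   dmesg --follow | watch_for_hung_task pvenode migrateall NODE
```

Feeding it from dmesg --follow instead of a log file avoids depending on which file your syslog daemon writes to. You can dry-run it with something harmless in place of the migration, for example: echo "INFO: task qemu:42 blocked for more than 120 seconds." | watch_for_hung_task echo would-migrate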