Migrate VMs on kernel panic?

BobMccapherey

Apr 25, 2020
I am currently seeing kernel panics at random times on Intel GVT vGPU VMs. Even when these panics occur, the non-vGPU VMs running on the same host seem to be unaffected. I've set the following in my /etc/sysctl.conf:

kernel.panic = 120
kernel.hung_task_panic = 1

Is there a way to force fencing such that the non-vGPU VMs get migrated before an automatic reboot of the host from a panic?
 
Hi,

Is there a way to force fencing such that the non-vGPU VMs get migrated before an automatic reboot of the host from a panic?

Not really, at least not in a straightforward way. You could use a script that continuously reads the kernel messages (dmesg or /var/log/messages) and runs pvenode migrateall when it detects a hung-task message.

Something basic like the following could work.

Bash:
#!/bin/bash

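# -n0: start at the end of the log; -F: keep following across log rotation
tail -Fn0 /var/log/messages | grep -iP --line-buffered 'task .+ blocked for more than \d+ seconds' | \
while read -r line ; do
    echo "detected hung task at $(date), migrating"
    # NODE is a placeholder for the target node name; or do something more elaborate
    pvenode migrateall NODE
    # stop after the first match so the migration is only triggered once
    exit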
done

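You'd need to test and adapt this for yourself. Also keep in mind that, as far as I can tell, with kernel.hung_task_panic = 1 the kernel panics as soon as the hung task is detected, so a userspace script gets no chance to react. For this approach you would probably need to set it to 0, so that only the warning is logged.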
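
If you want something slightly more elaborate, here is a rough sketch of the same idea with the matching split into small functions, so you can try it against sample log lines before trusting it on a live node. It assumes the host logs kernel messages to the systemd journal and reads them with journalctl -k -f -n 0 instead of /var/log/messages. The function names is_hung_task_line and handle_stream are made up for this example, and the target node is taken from the first argument. Treat it as a starting point, not a tested solution.

```bash
#!/bin/bash
# Sketch only: watch kernel messages for hung-task warnings and trigger
# a one-time migration. Function names here are hypothetical helpers.

# Return 0 if the given line looks like a kernel hung-task warning.
is_hung_task_line() {
    printf '%s\n' "$1" | grep -Eiq 'task .+ blocked for more than [0-9]+ seconds'
}

# Read log lines from stdin. On the first hung-task warning, run the
# command passed as arguments and stop reading.
handle_stream() {
    while IFS= read -r line; do
        if is_hung_task_line "$line"; then
            echo "detected hung task at $(date), migrating"
            "$@"
            return 0
        fi
    done
    return 1
}

# Only start watching when a target node is given as the first argument.
if [ "$#" -ge 1 ]; then
    # Assumption: kernel messages are available through the systemd journal.
    # -k: kernel messages only, -f: follow, -n 0: skip already logged lines
    journalctl -k -f -n 0 | handle_stream pvenode migrateall "$1"
fi
```

You can dry-run the matching without touching any VMs by feeding it a fake line and a harmless command, for example: printf '%s\n' 'INFO: task qemu:42 blocked for more than 120 seconds.' | handle_stream echo would-migrate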
 
