Heavy use of passthrough GPU kills all VMs

pedromcaraujo

New Member
May 23, 2024
Hi everyone,

I've been having a problem with a specific VM that has a Tesla P40 passed through to it.

This VM has only two things: Ollama and the Tesla P40. Nothing else is running.

What happens is that after using Ollama for 2 or 3 prompts, all my VMs and LXCs reboot. I have no idea why. You can see here when this happens:
Screenshot from 2024-06-29 21-07-38.png

Here is the config of the VM I mentioned:
Screenshot from 2024-06-29 21-08-24.png

Does anyone have a similar problem or have any idea why this might be happening?
 

Attachments

  • Screenshot from 2024-06-29 21-07-23.png
If the driver or device inside the VM crashes, it can easily take down the Proxmox host, because with passthrough the device is connected to the actual PCIe bus. I have no idea why it crashes inside the VM (no experience with NVIDIA), sorry. Have you checked the logs inside the VM? Remember that a VM with passthrough requires all of its memory to be pinned into actual host RAM, so leave enough for the rest and for Proxmox itself.
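A rough sketch of how the logs could be checked, not a definitive recipe. The search patterns are an assumption about what is worth looking for (NVIDIA driver errors usually appear in the kernel log as `NVRM: Xid` lines, while `vfio` and `pcieport` lines are more likely on the host), and `filter_gpu_errors` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel-log lines that mention the
# NVIDIA driver (NVRM / Xid) or the passthrough path (vfio, pcieport).
filter_gpu_errors() {
    grep -i -E 'nvrm|xid|vfio|pcieport'
}

# Inside the VM: kernel messages from the current boot.
#   journalctl -k --no-pager | filter_gpu_errors
#
# On the Proxmox host after it came back up: kernel messages from the
# previous boot (this only works if the journal is persistent).
#   journalctl -k -b -1 --no-pager | filter_gpu_errors
```

If nothing shows up in the previous boot, that in itself suggests the reset was abrupt (power or hardware) rather than a logged driver error.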
 
I have a Tesla K80 that begins to throttle at about 93 °C. At that point the driver crashes and takes the VM down. You can watch the temperature using:
Code:
watch nvidia-smi
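Since the output of `watch` is gone once everything reboots, it may help to log the temperature to a file instead. This is a minimal sketch, assuming `nvidia-smi` is installed inside the VM; the 85 °C threshold, the `gpu-temp.log` file name, and the function names `check_temp` and `log_gpu_temp` are placeholders of mine, not anything from the driver.

```shell
#!/bin/sh
# Assumed warning threshold in degrees Celsius; adjust for your card.
THRESHOLD="${THRESHOLD:-85}"

# Reads "timestamp, temperature" CSV lines on stdin and prints a
# warning for every sample at or above THRESHOLD.
check_temp() {
    while IFS=, read -r ts temp; do
        temp=$(printf '%s' "$temp" | tr -d ' ')
        # Skip empty or non-numeric readings such as "[N/A]".
        case "$temp" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$temp" -ge "$THRESHOLD" ]; then
            echo "$ts WARNING: GPU at $temp C (limit $THRESHOLD)"
        fi
    done
}

# Samples the GPU every 5 seconds, appends each sample to a log file
# (default: gpu-temp.log) and passes it on to check_temp.
log_gpu_temp() {
    nvidia-smi --query-gpu=timestamp,temperature.gpu \
               --format=csv,noheader,nounits -l 5 \
        | tee -a "${1:-gpu-temp.log}" \
        | check_temp
}
```

Run `log_gpu_temp` in a second terminal before sending prompts to Ollama. After the next reboot, the last lines of the log should show how hot the card was shortly before everything went down.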
 