Heavy use of passthrough GPU kills all VMs

pedromcaraujo

New Member
May 23, 2024
Hi everyone,

I've been having a problem with a specific VM that has a Tesla P40 passed through to it.

This VM runs only two things: Ollama and the Tesla P40. Nothing else is running in it.

After I use Ollama for 2 or 3 prompts, all my VMs and LXC containers reboot, and I have no idea why. This screenshot shows when it happens:
Screenshot from 2024-06-29 21-07-38.png

Here is the config of the VM I mentioned:
Screenshot from 2024-06-29 21-08-24.png

Does anyone have a similar problem or have any idea why this might be happening?
 

Attachments

  • Screenshot from 2024-06-29 21-07-23.png
If the driver or device inside the VM crashes, it can easily take down the Proxmox host, because with passthrough the device is connected to the actual PCIe bus. Sorry, I have no idea why it crashes inside the VM (I have no experience with NVIDIA). Have you checked the logs inside the VM? Also remember that a VM with passthrough requires all of its memory to be pinned in actual host RAM, so leave enough for Proxmox and the other guests.
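As a starting point for the log check, here is a minimal sketch of a filter that narrows kernel messages down to lines likely to relate to the GPU or to passthrough. The function name `filter_gpu_errors` and the keyword list are my own assumptions, not an official list. NVIDIA driver errors usually appear with an `NVRM` prefix and an `Xid` code, and the other keywords are a rough guess at what is relevant.

```shell
#!/bin/sh
# filter_gpu_errors (hypothetical helper): reads kernel log lines on stdin and
# keeps only those mentioning the NVIDIA driver, Xid errors, VFIO or PCIe ports.
# The keyword list is an assumption; widen or narrow it as needed.
filter_gpu_errors() {
    # "|| true" keeps the exit status at 0 when nothing matches
    grep -i -E 'NVRM|Xid|vfio|pcieport' || true
}
```

Inside the VM you could run `dmesg | filter_gpu_errors`. On the host, after the reboot, `journalctl -k -b -1 | filter_gpu_errors` shows the kernel messages of the previous boot, assuming the journal is stored persistently.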
 
I have a Tesla K80 which begins to throttle at about 93 °C. At that point the drivers crash and take the VM out. You can watch the temperature using
Code:
watch nvidia-smi
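Since `watch` only shows the current value, the last reading is lost when everything reboots. A rough sketch that writes timestamped readings to a file instead could look like this. The function names, the default 5 second interval and the 90 °C warning limit are illustrative assumptions, not documented limits for the P40 or K80.

```shell
#!/bin/sh
# classify_temp TEMP [LIMIT]: prints WARN if TEMP >= LIMIT, OK otherwise,
# and UNKNOWN if TEMP is not a plain number (e.g. "[N/A]").
# The default LIMIT of 90 is an assumed example value.
classify_temp() {
    temp=$1
    limit=${2:-90}
    case "$temp" in
        ''|*[!0-9]*) echo "UNKNOWN"; return ;;
    esac
    if [ "$temp" -ge "$limit" ]; then
        echo "WARN"
    else
        echo "OK"
    fi
}

# log_gpu_temp LOGFILE [INTERVAL]: appends one line per GPU every INTERVAL
# seconds (default 5) until interrupted with Ctrl+C.
log_gpu_temp() {
    logfile=$1
    interval=${2:-5}
    while :; do
        nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits |
        while read -r temp; do
            printf '%s %s C %s\n' "$(date '+%F %T')" "$temp" "$(classify_temp "$temp")" >> "$logfile"
        done
        sync    # flush to disk so the last lines are more likely to survive a crash
        sleep "$interval"
    done
}
```

For example, `log_gpu_temp /root/gpu-temp.log 2` inside the VM, then look at the end of the file after the next reboot to see whether the temperature was climbing just before the crash.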
 