Intel Arc A770 LLM inferencing in a power-efficient way

slave2anubis

Member
Feb 29, 2020
Hello everyone,
I found a cheap Intel Arc A770 16GB on the used market and bought it for some LLM tinkering, since, at least for me, it's the best bang for the buck for this application.
I installed it in my Proxmox 8.2.4 home server and did the usual GPU passthrough routine, as I have done many times in the past, and everything seemed fine. Well, this would not be a forum post if it had gone as smoothly as expected.

First problem: I added the GPU's hardware IDs to the vfio-pci module so that the i915 driver won't pick up the card at boot, which worked as expected, but there is a catch. After a server reboot (with the GPU installed) the machine draws around 75-85W at idle once all VMs/CTs have booted, which is normal and expected; the Intel Arc itself sits at around 5W or less at idle.
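For reference, this is roughly how the card is bound to vfio-pci on my host. The 8086:56a0 / 8086:4f90 IDs are what lspci -nn reports here for the A770 and its audio function; check your own output, they may differ:

    # /etc/modprobe.d/vfio.conf
    # bind both functions of the Arc card to vfio-pci instead of i915 / snd_hda_intel
    options vfio-pci ids=8086:56a0,8086:4f90
    # make sure vfio-pci wins the race against i915 at boot
    softdep i915 pre: vfio-pci

followed by "update-initramfs -u -k all" and a host reboot.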
As soon as I start a VM that has the Intel Arc assigned, the power consumption jumps to 115-125W (the GPU fans even start spinning), but get this: the VM has no DE, it's a basic Ubuntu 24.04 server with no workload whatsoever. I also tried a Windows 10 VM with the Arc assigned and it's the same situation, an extra 30-40W of power draw at idle. In Windows, the Intel Arc control panel reports 30-40W of power usage. The weird part is that nothing is plugged into the HDMI/DP ports; this is pure idle power draw.
I did some research on the interwebs and found some things that might fix this behaviour, like enabling ASPM or doing a PCI reset, but so far nothing has helped.
The crazy part is that even if I shut down the VM that has the GPU assigned, the extra power draw remains no matter what I do; only a host reboot brings the card back into a low-power state.
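For completeness, these are the kinds of things I have tried so far (the 0000:03:00.0 address is just an example, your slot will differ):

    # force ASPM via the kernel command line on the host (GRUB_CMDLINE_LINUX_DEFAULT)
    pcie_aspm=force pcie_aspm.policy=powersupersave

    # try a function-level reset of the card after the VM has been shut down
    echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset

Neither of these brought the idle draw back down for me.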

Second problem: Given that the first problem may not be solvable without Intel putting out a patch or something, would it be possible to run the GPU directly on the host (with the i915 driver) and run the LLM stuff in a CT?
On the same machine I already use the iGPU (12th-gen Intel CPU) for HW encoding/decoding in a Jellyfin CT; does anyone here know if I can do the same but for the LLM inferencing?
In the Ubuntu VM described above I installed the proprietary Intel drivers (https://dgpu-docs.intel.com/driver/client/overview.html) from their PPA, but it turns out they only support Ubuntu as the base OS. Would the i915 driver shipped with Proxmox (maybe together with the non-free firmware packages) work for this application?
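If the host route works, I assume the CT side would look much like what I already do for the Jellyfin container, just pointing at the Arc's render node instead of the iGPU's. Something along these lines, where the card1/renderD129 numbers and the group IDs are guesses (check ls -l /dev/dri and getent group render video on the host):

    # /etc/pve/lxc/<CTID>.conf  (PVE 8.x device passthrough entries)
    # renderD129 is assumed to be the Arc's render node, gid 104 = render, gid 44 = video
    dev0: /dev/dri/renderD129,gid=104
    dev1: /dev/dri/card1,gid=44

The LLM runtime inside the CT (llama.cpp with its SYCL backend, IPEX-LLM, or similar) would then talk to the card through that render node.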

Any ideas are more than welcome!
Thanks.


PS: I know that for some people 30-40W of extra power draw is nothing, but when you run the server 24/7 it adds up, not to mention the current energy prices (at least here in Eastern Europe).
 
AFAIK, 30-40W of idle power usage for a card with a 225W TDP is nothing strange.

run the GPU directly on the host (with the i915 driver) and run the LLM stuff in a CT. <-- this shouldn't be too hard, though we have always used the passthrough approach.

I would grab a Debian VM, shovel the Intel i915 driver up its a**, and see if anything breaks. If nothing does, then proceed to install i915 on PVE.
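Something like the following is enough to tell whether i915 actually grabbed the card and its firmware loaded (the PCI address is just an example):

    # which kernel driver is bound to the Arc
    lspci -nnk -s 03:00.0
    # any i915 / GuC firmware complaints?
    dmesg | grep -iE 'i915|guc'
    # the render node a CT would later get bind-mounted
    ls -l /dev/dri/

If that all looks sane in the Debian VM, the same procedure should hold on PVE itself.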
 
