Intel Arc A770 LLM inferencing in a power efficient way

slave2anubis

Member
Feb 29, 2020
8
1
23
36
Hello everyone,
I found a cheap Intel Arc A770 16GB on the used market and bought it for some LLM tinkering, since, at least for me, it's about the best bang for the buck for this application.
I installed it in my Proxmox 8.2.4 home server and did the usual GPU passthrough routine, as I have many times in the past; all good and dandy. Well, this would not be a forum post if it had gone smoothly as expected.

First problem: I added the GPU's hardware IDs to the VFIO driver module so that the i915 driver won't pick up the card at boot, which went as expected, but there is a catch! After a restart (with the GPU installed) the server draws around 75-85W at idle once all VMs/CTs have booted, which is normal and expected; the Intel Arc itself idles at around 5W or less.
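For anyone wanting to reproduce the setup: the binding I mean is the usual modprobe.d approach. A minimal sketch, assuming the commonly reported A770 PCI IDs (8086:56a0 for the GPU, 8086:4f90 for its audio function) — verify yours with lspci before copying anything:

```shell
# /etc/modprobe.d/vfio.conf -- claim the Arc with vfio-pci before i915 loads.
# The IDs below are assumptions (typical A770 values); check yours with:
#   lspci -nn | grep -i -e vga -e audio
options vfio-pci ids=8086:56a0,8086:4f90
softdep i915 pre: vfio-pci

# Then rebuild the initramfs so it takes effect at boot:
#   update-initramfs -u -k all
```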
When I start a VM with the Intel Arc assigned, the power consumption jumps to 115-125W instantly (even the GPU fans start spinning), but get this: the VM has no DE, it's a basic Ubuntu 24.04 server with no workload whatsoever. I also tried a Windows 10 VM with the Intel Arc assigned, and it's the same situation: 30-40W of extra power draw at idle. In Windows, the Intel Arc driver control software reports 30-40W of power usage. The weird part is that nothing is plugged into the HDMI/DP ports; it's pure idle power draw.
I did some research on the interwebs and found some things that might fix this behaviour, like enabling ASPM or doing a PCI reset, but so far nothing has helped.
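In case it helps someone check the same thing, this is roughly how I look at the ASPM state; the PCI address 03:00.0 is just an example, substitute your own from lspci:

```shell
# Find the card's address first:
lspci -nn | grep -i vga

# Show whether ASPM is enabled/disabled on that device (example address):
sudo lspci -vvv -s 03:00.0 | grep -i aspm

# Show the kernel's current ASPM policy:
cat /sys/module/pcie_aspm/parameters/policy

# Forcing ASPM kernel-wide means adding this to the kernel command line
# (GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then update-grub):
#   pcie_aspm=force
```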
The crazy part is that even if I shut down the VM that has the GPU assigned to it, the power draw remains; no matter what I do, only a host reboot brings the card back to a low power consumption state.

Second problem: Given that the first problem maybe can't be solved without Intel putting out a patch or something, would it be possible to run the GPU directly on the host (with the i915 driver) and run the LLM stuff in a CT?
On the same machine I also use the iGPU (12th-gen Intel CPU) for HW encoding/decoding in a Jellyfin CT; does anyone here know if I can do the same, but for the LLM inferencing?
In the Ubuntu VM described above I installed some proprietary Intel drivers (https://dgpu-docs.intel.com/driver/client/overview.html) from their PPA, but it turns out they only support Ubuntu as a base OS. Would the i915 driver present in Proxmox (maybe with the non-free packages) work for this application?
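For the CT route, the device sharing itself would look the same as for Jellyfin: pass the host's render node into the container. A sketch, assuming an example CT ID of 101 and that the host's render group has GID 104 (check with `getent group render`):

```shell
# /etc/pve/lxc/101.conf -- 101 and gid=104 are example values.
# Pass the host's DRM render node into the container:
dev0: /dev/dri/renderD128,gid=104

# On older Proxmox versions without the dev0 syntax, the classic variant:
#   lxc.cgroup2.devices.allow: c 226:128 rwm
#   lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
```

The open question in my case is whether the host-side i915 driver plus a compute runtime inside the CT is enough for the LLM tooling, not the device sharing itself.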

Any ideas are more than welcome!
Thanks.


PS. I know that for some people 30-40W of extra power draw is nothing, but when you run the server 24/7 it adds up, not to mention the current energy prices (at least here in Eastern Europe).
 
AFAIK, 30-40W of idle power usage for a card with a 225W TDP is nothing strange.

run the GPU directly on the host (with the i915 driver) and run the LLM stuff in a CT. <-- this shouldn't be too hard, though we have always used the passthrough approach.

I would grab a Debian VM, shovel the Intel i915 driver up its a**, and see if anything breaks. If not, then proceed to install i915 on PVE.
 
I'm super interested in how this goes, as I'm contemplating a purchase of this card for the same reason. Please keep us updated!