Hello everyone,
I found a cheap Intel Arc A770 16GB on the used market and bought it thinking I would do some LLM tinkering, since, at least for me, it's about the best bang for the buck for this application.
I installed it in my Proxmox 8.2.4 home server and did the usual GPU passthrough routine, as I have many times in the past. All good and dandy. Well, this would not be a forum post if everything had gone as smoothly as expected.
First problem: I added the GPU's hardware IDs to the vfio-pci driver module so that the i915 driver won't pick up the card at boot, which worked as expected, but there is a catch! After a reboot (with the GPU installed) the server draws around 75-85W at idle once all VMs/CTs have booted, which is normal and expected; the Intel Arc itself contributes around 5W or less at idle.
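For reference, this is roughly how I bound the card to vfio-pci (the two IDs below are the GPU and its audio function as they show up on my system; verify yours with lspci -nn before copying):

# /etc/modprobe.d/vfio.conf
options vfio-pci ids=8086:56a0,8086:4f90
softdep i915 pre: vfio-pci

# then rebuild the initramfs and reboot
update-initramfs -u -k all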
When I start a VM with the Intel Arc GPU assigned, the power consumption jumps to 115-125W instantly (the GPU fans even start spinning). And get this: the VM has no DE, it's a basic Ubuntu 24.04 Server install with no workload whatsoever. I also tried a Windows 10 VM with the Intel Arc assigned, same situation, 30-40W of extra power draw at idle. In Windows, the Intel Arc control panel reports 30-40W of power usage. The weird part is that nothing is plugged into the HDMI/DP ports; it's pure idle power draw.
I did some research on the interwebs and found a few things that might fix this behaviour, like enabling ASPM or doing a PCI reset, but so far nothing has helped.
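In case it helps anyone reproduce, these are the kinds of things I tried (the PCI address 0000:03:00.0 is just an example, substitute your card's address from lspci):

# check and force the ASPM policy
cat /sys/module/pcie_aspm/parameters/policy
echo powersave > /sys/module/pcie_aspm/parameters/policy
# also tried pcie_aspm=force pcie_aspm.policy=powersave on the kernel command line

# remove the device and rescan the bus, hoping it comes back in a low power state
echo 1 > /sys/bus/pci/devices/0000:03:00.0/remove
echo 1 > /sys/bus/pci/rescan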
The crazy part is that even if I shut down the VM that has the GPU assigned to it, the extra power draw remains. No matter what I do, only a host reboot brings the card back into a low power state.
Second problem: given that the first problem may not be solvable without Intel putting out a patch or something, would it be possible to run the GPU directly on the host (with the i915 driver) and do the LLM stuff in a CT?
On the same machine I already use the iGPU (12th gen Intel CPU) for HW encoding/decoding in a Jellyfin CT, so I wonder if anyone here knows whether I can do the same, but for the LLM inference stuff?
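For comparison, the Jellyfin CT gets the iGPU with nothing more than a device passthrough entry in /etc/pve/lxc/<vmid>.conf. My assumption is that if the host's i915 driver picks up the Arc as well, it would appear as a second /dev/dri/cardX + renderDXXX pair that could be handed to an LLM CT the same way (the gid has to match the render group inside the container, check with getent group render):

# Proxmox 8.x device passthrough entry (renderD128 is the iGPU on my box)
dev0: /dev/dri/renderD128,gid=104

# or the older cgroup + bind mount style
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir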
In the Ubuntu VM described above I installed the proprietary Intel drivers (https://dgpu-docs.intel.com/driver/client/overview.html) from their PPA, but it turns out they only support Ubuntu as the base OS. Would the i915 driver shipped with Proxmox (maybe together with the non-free firmware packages) work for this application?
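If running it on the host is viable, I assume the sanity check on the Proxmox host (with the card no longer bound to vfio-pci) would look something like this; Proxmox 8.2 ships a 6.8 kernel, which as far as I know supports the DG2/Alchemist cards without any force_probe hackery, but please correct me if I'm wrong:

# non-free firmware for the Arc plus the OpenCL runtime from plain Debian
apt install firmware-misc-nonfree intel-opencl-icd clinfo

dmesg | grep -i i915     # should show the A770 being initialised
ls /dev/dri/             # should list an extra cardX/renderDXXX pair
clinfo | grep -i 'device name'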
Any ideas are more than welcome!
Thanks.
PS: I know that for some people 30-40W of extra power draw is nothing, but when you run the server 24/7 it adds up, not to mention the current energy prices (at least here in Eastern Europe).
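(Quick back-of-the-envelope math with made-up but realistic rates: an extra 35W around the clock is about 0.035 kW x 24 h x 365 ≈ 307 kWh per year, so roughly 60-90 EUR annually at 0.20-0.30 EUR/kWh.)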