Is it possible to emulate a GPU for LLMs?

Red Squirrel

Renowned Member
May 31, 2014
Is there a way, perhaps with a plugin or something, to emulate a GPU and assign it, say, an SSD as VRAM? Obviously it would be very slow, but it might be a way to run even very big LLMs locally without the expense of a real GPU.

Is there actually a way to do this?
 
Just an idea:

https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-v3-0324-locally

Code:
Obtain the latest llama.cpp on GitHub here. You can follow the build instructions below as well. 
Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference

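In practice, that CPU-only build might look something like this (a rough sketch based on llama.cpp's generic CMake instructions, not the exact commands from the Unsloth guide):

Code:
# Clone llama.cpp and build it with CUDA disabled, so inference runs entirely on the CPU.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=OFF
cmake --build build --config Release -j
# The resulting binaries (llama-cli, llama-server, etc.) end up in build/bin/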
and this part of the tutorial
Code:
A GPU is not necessary. You can run the model without a GPU, but try not to unless you're using Apple's unified memory. 
Try to have at least 180GB of combined VRAM + RAM to get ~2 tokens/s; otherwise the model will be too slow to run.

Although the minimum requirement is just a CPU with 60GB of RAM, performance will be very slow. 
Expect less than 1.5 tokens per second on minimal hardware, but that doesn't mean you can't experiment! Using a GPU will make your inference faster.

from here
https://unsloth.ai/blog/deepseek-v3-0324
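To the original question about using an SSD in place of VRAM: as I understand it, you don't really need to emulate a GPU at all. llama.cpp memory-maps the GGUF file by default, so any weights that don't fit in RAM get paged in from disk as needed; it is painfully slow, but it does run. A minimal CPU-only invocation might look like this (the model path, quant name, and thread count are placeholders, adjust them for whatever GGUF you actually download):

Code:
# Run fully on CPU; -ngl 0 offloads zero layers to a GPU.
# The model file is mmap'd, so parts that don't fit in RAM are read from the SSD on demand.
./build/bin/llama-cli \
    -m /path/to/DeepSeek-V3-0324-UD-Q2_K_XL.gguf \
    -ngl 0 \
    --threads 16 \
    --ctx-size 4096 \
    --prompt "Hello"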