Docker support in Proxmox

Can you elaborate? I thought QEMU and therefore PVE is able to use NUMA.
This is exactly what we want:
Code:
If you enable this feature, your system will try to arrange the resources such that a VM does have all its vCPUs on the same physical socket

Socket = NUMA node

The issue is that this doesn't work at all. Not even a bit.
And it's absolutely critical on single-socket Milan/Genoa/Bergamo, because they use 4 CCD chiplets on a single CPU.
You have to split those 4 chiplets into 4 NUMA nodes, because the communication between chiplets is slow as hell.
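If you want to see how the chiplets are actually exposed to the host, something like this shows the NUMA layout (assuming the BIOS is set to NPS4 or "L3 cache as NUMA domain"; the exact option names vary by board vendor):

Code:
# how many NUMA nodes the host exposes, and which CPUs and memory belong to each
numactl --hardware

# per-CPU view: logical CPU, its NUMA node, socket and core
lscpu --extended=CPU,NODE,SOCKET,CORE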

So you assume, according to the wiki, that all vCPUs of one VM should run on the same chiplet (or NUMA node).

But that's not the case: on the Proxmox host itself, vCPUs are just threads.
And those threads are not only spread randomly across all cores (ignoring NUMA completely); it's even worse, because each thread additionally gets rotated to other physical cores every 1-2 seconds.
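You can watch this happening yourself on the host. A rough sketch (VMID 100 and the pidfile path are just examples, adjust to your setup):

Code:
# PID of the QEMU process for VM 100
PID=$(cat /run/qemu-server/100.pid)

# show each vCPU thread (usually named "CPU n/KVM") and the physical CPU
# it currently runs on (PSR column), refreshed every second
watch -n1 "ps -T -o tid,psr,comm -p $PID | grep KVM"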

I didn't test on real dual-socket Intel systems, but I'd bet that the situation there is the same.
I don't have Intel systems where every single bit of performance matters.
But I do have 2 maxed-out Genoa servers where every bit of performance matters.

You can fix this yourself with CPU pinning, configured manually for each VM.
But with a lot of VMs, and especially if you move them between hosts, CPU pinning is simply impossible to maintain.
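For a single VM it looks roughly like this with taskset (purely an example; VMID 100 and the core range 0-7 are placeholders for one CCD, and if I remember right, newer PVE versions also have an affinity setting per VM that does essentially the same):

Code:
# pin every thread of VM 100 to cores 0-7 (one CCD / NUMA node)
PID=$(cat /run/qemu-server/100.pid)
for TID in $(ls /proc/$PID/task); do
    taskset -cp 0-7 "$TID"
done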

The performance impact on some tasks is between 40 and 300%, measurable even with simple tools like iperf.

Cheers
 
Maybe we should split this off into a new thread, this is much more interesting than another homelabber chiming in to ask for a new Docker GUI.

So you assume, according to the wiki, that all vCPUs of one VM should run on the same chiplet (or NUMA node).
No, I assume that they run on the NUMA node whose memory they use, to reduce inter-NUMA-node communication.

The issue is that this doesn't work at all. Not even a bit.
Where is your proof? You just provide anecdotal evidence, which is totally useless to interpret.

Trying to understand what you mean, I inspected my dual-socket Intel machines, and numastat shows this:

Code:
$ numastat
                           node0           node1
numa_hit            180134743078    129405155945
numa_miss             2028661746       461859704
numa_foreign           461859704      2028661746

Which shows that node0 has a miss ratio of 1.1% and node1 0.35%, both of which are far from "not even a bit". This is the worst numastat I found; others have even lower misses.
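(The ratio here is simply numa_miss / (numa_hit + numa_miss) per node; if you want to compute it directly from the numastat output, a one-liner like this should do it:)

Code:
numastat | awk '/numa_hit/  {h0=$2; h1=$3}
                /numa_miss/ {m0=$2; m1=$3}
                END {printf "node0: %.2f%%  node1: %.2f%%\n", 100*m0/(h0+m0), 100*m1/(h1+m1)}'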

I don't have any AMD machine at hand right now, so how does it look on your machine? Have you configured NUMA for EACH VM?
 
Which shows that node0 has a miss ratio of 1.1% and node1 0.35%, both of which are far from "not even a bit".
In the context of NUMA (Non-Uniform Memory Access) configurations, it's crucial to understand that the significance extends beyond just memory; the L3 cache plays a pivotal role. On Genoa platforms, L3 caches are distributed across NUMA nodes, with each cache supporting eight threads. This distribution is similar to how memory is handled, but with a key distinction:
  • Both L3 cache and memory are managed similarly by the Linux operating system. However, L3 cache operates significantly faster than memory DIMMs.
An important point to note is that misses in the L3 cache are not recorded by tools like numastat.

Data sharing between threads in a multithreaded application typically involves the L3 cache before accessing the memory:
  • If the data is relatively small and can be completely contained within the L3 cache, it remains there, allowing other threads on the same CPU immediate access.
  • However, if the data needs to be accessed by a thread on a different chiplet that does not share the same L3 cache, the data must traverse through the memory system.
This behavior underscores the performance implications in NUMA systems where data locality can significantly impact application performance.
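If you want to see which cores actually share an L3, the kernel exposes that directly (index3 is normally the L3 on these systems; check the level file next to it if unsure):

Code:
# logical CPUs that share an L3 cache with CPU 0
cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list

# table view: the CACHE column groups the cores per shared cache / CCD
lscpu --extended=CPU,NODE,CORE,CACHE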

And yes, this was so long ago that my previous posts about Intel systems are wrong.
I just remembered that on Intel systems you aren't affected as much, if at all, due to the monolithic design of the CPU.
There is indeed a difference: I did tests in May on my Intel dual-socket systems, and the impact of CPU pinning was almost none, while on Genoa I'm getting around twice the performance in some applications.

TBH, those tests were so long ago that I barely remember the details. But I can say for sure that this will get solved as more and more people/companies switch to chiplet-design CPUs.

BTW, I even tested Ryzen CPUs (they actually have 2 chiplets) and, for whatever reason, they show no performance benefit from CPU pinning either (like the Intel servers). The Ryzen CPUs have no NUMA, but with CPU pinning you don't need NUMA to test the impact.
Only the Genoa servers show a huge impact. They are not slow without pinning, still a lot faster than Intel, but exactly on par with Ryzen:
A VM with 4 cores on a Genoa 9374F (without pinning) vs. a Ryzen 5800X is almost equal performance-wise.
With pinning, the Genoa is almost twice as fast.

https://forum.proxmox.com/threads/iperf3-speed-same-node-vs-2-nodes-found-a-bug.146805/
Check my latest posts in that thread.
There are your proofs and whatever you want.
 
There are your proofs and whatever you want.
Thank you very much for the detailed explanation. I wasn't aware of the cache situation, which is completely plausible.

I just read up on the topic, but I have no AMD system accessible, so do you have time to check this out, or have you already checked it? It's fairly old, yet AFAIK not automatically set on PVE. The actual commit has moved and is now available here.
 
I conducted a series of experiments with different NUMA layouts and benchmarked the memory, and sadly you're right that QEMU (in its current configuration) is not able to allocate memory or CPU threads on their respective NUMA nodes, which leads to the problems you described. I really wonder why that is.
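For anyone who wants to reproduce a simplified version of this on the host: compare bandwidth with memory local to the benchmarked node against memory forced onto the other node (mbw and the 512 MiB size are just examples, any memory benchmark works):

Code:
# local: run on node 0, allocate on node 0
numactl --cpunodebind=0 --membind=0 mbw 512

# remote: run on node 0, force the allocation onto node 1
numactl --cpunodebind=0 --membind=1 mbw 512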
 
You can fix this yourself with CPU pinning, configured manually for each VM.
But with a lot of VMs, and especially if you move them between hosts, CPU pinning is simply impossible to maintain.
You will not solve the memory NUMA allocation with that, only the cache allocation. I just tested it with the mbw benchmark, and on the hypervisor the QEMU process got memory from both nodes (I have two). CPU pinning will give better performance, yet as you already stated, not that much on Intel. How much depends on how the memory ends up distributed over the NUMA nodes; pinning to all the wrong CPUs makes the problem significantly worse, up to 2.5x slower. That is worse than the default behaviour of cycling the threads around, so it may just be an extreme corner case.
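You can check the per-node distribution of a running VM's memory directly on the host (VMID 100 is just an example):

Code:
# per-NUMA-node memory breakdown of the QEMU process for VM 100
numastat -p $(cat /run/qemu-server/100.pid)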

I also noticed that on the host, anonymous hugepages are allocated for the VM.
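(You can see those per process as well; a quick check, again with VMID 100 as a placeholder:)

Code:
# transparent (anonymous) hugepages currently backing the VM process
grep AnonHugePages /proc/$(cat /run/qemu-server/100.pid)/smaps_rollup

# system-wide THP policy
cat /sys/kernel/mm/transparent_hugepage/enabled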
 
You will not solve the memory NUMA allocation
That is already available in the configuration file, yet not via the GUI and not automatically. I played around with it in this thread. It seems to work, and I am really interested in seeing whether it would be a solution for you and whether it is faster (and easier to set up than just running taskset).
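Roughly, what I mean looks like this in /etc/pve/qemu-server/<vmid>.conf (the core count, memory size and host node are only example values, syntax as documented for numa[n] in qm.conf; the same can be set with qm set <vmid> --numa0 ...):

Code:
cores: 8
memory: 8192
numa: 1
numa0: cpus=0-7,hostnodes=0,memory=8192,policy=bind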
 
Thanks for the interesting discussion on the NUMA issue. I was completely unaware of that, and just learned quite a lot.

It would be great if this entire discussion could be broken out of this thread and into its own. It's not really a Docker issue at all. I wonder if a mod could do that?

Getting this back on topic ...

Yes, I'm only a home server user, so I'm blissfully unaware of how Proxmox VE is used in commercial production infrastructure, but so far I've not seen a clear articulation of why Docker needs to be in Proxmox itself, rather than being managed through at least one VM or LXC container. Creating Docker Swarm or Kubernetes clusters in VM/LXC environments seems to be well-documented in the homelab space (e.g., it's all over my YouTube feeds, complicating otherwise simple projects ;) ), which makes me think it must be even more standardized in commercial production environments, where it needs to be fast, reproducible, and reliable.

Docker networking, Docker storage, and even its nomenclature for managing containers (start/stop vs. up/down) are nothing like VM and LXC management, which mostly share the same storage, networking, and general management (start, stop, restart, etc.) paradigm. So the existing shared LXC/VM UI principles couldn't just be adapted without serious redesign.

You'd need a completely separate UI for managing Docker's various components, and then if you do that, why aren't you implementing containerd more broadly? What about podman?

Well-developed, robust, and powerful GUI management tools for Docker/containerd/Kubernetes/Podman already exist. LXCs now support Docker well, so that's an option if your hardware and use case need it.

And then there's the exponential growth in support requests that Docker-in-the-GUI would generate from people trying to figure out how to make Docker work, who would then come to Proxmox for support because, hey, Docker is in there.

Also: Proxmox would be responsible for monitoring Docker's release schedule and doing regression and other testing to push updates to the Docker that ships with Proxmox. Odds are that the Docker shipped with Proxmox would never be the latest one, so people would try to install the latest one anyway, break it, and come here for help.

I'd much rather see Proxmox's dev team be allowed to focus on refining what's already there and implementing new features and continuing to surface existing features that only exist in config files into the GUI.
 