Very slow ZFS performance in LXC

Ulrar

Hi,

I have a Proxmox host with a KVM machine running Docker, with its disk on ZFS (raw), and it works great.
I've been trying to run Frigate, and unfortunately passing a Coral device through to a VM doesn't seem to work very well, so I've created a privileged LXC container on the same storage (though it seems to use a subvolume instead), with a mount for the USB device I need, and installed Docker in there. It also has nesting enabled.

It works mostly fine, except the disk performance is terrible: even a simple docker pull freezes a lot and takes forever to complete. Interestingly, it starts off great but begins freezing after 10 or 15 seconds, so I'm wondering if there's some caching going on at the start.
I can see the dockerd process stuck on I/O in htop (D status).

Here's an fio run on the LXC (subvolume):

Code:
Run status group 0 (all jobs):
   READ: bw=877KiB/s (898kB/s), 209KiB/s-238KiB/s (214kB/s-244kB/s), io=56.8MiB (59.5MB), run=66251-66252msec
  WRITE: bw=607KiB/s (621kB/s), 143KiB/s-162KiB/s (147kB/s-166kB/s), io=39.2MiB (41.2MB), run=66251-66252msec


Now here's the exact same fio test within the VM, which is on the same storage (raw):
Code:
   READ: bw=24.8MiB/s (26.0MB/s), 6339KiB/s-6354KiB/s (6492kB/s-6507kB/s), io=2455MiB (2574MB), run=98978-98979msec
  WRITE: bw=16.6MiB/s (17.4MB/s), 4240KiB/s-4255KiB/s (4341kB/s-4357kB/s), io=1641MiB (1721MB), run=98978-98979msec
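(A mixed random read/write fio job along these lines produces run-status output in this format; the parameters here are illustrative, not necessarily the exact ones used above:)

Code:
# illustrative only -- the actual fio parameters weren't posted
fio --name=mixed --directory=/srv/test --rw=randrw --rwmixread=60 \
    --bs=4k --size=1G --numjobs=4 --ioengine=libaio --iodepth=16 \
    --group_reporting --runtime=60 --time_based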

As you can see, the difference is huge, even though it's the same NVMe disk in both cases.
I've tried setting sync=disabled on the subvolume, and I've also tried setting checksum=off on it, but in both cases there was no change.
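For reference, that's done per dataset along these lines (the subvolume name here is a placeholder):

Code:
# subvolume name is a placeholder
zfs set sync=disabled rpool/data/subvol-100-disk-0
zfs set checksum=off rpool/data/subvol-100-disk-0
zfs get sync,checksum rpool/data/subvol-100-disk-0   # verify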

Any idea what else could be causing this bottleneck?
The server isn't doing anything else, there's free RAM both inside and outside the LXC, and the CPU rarely goes above 2%.
The KVM has `cache=none` set, in case that matters.

Thanks
 
Hi,

Sorry to bump this, but it's still a problem.

I've tried disabling prefetch and increasing the ARC to 8 GB, with no difference at all.
I really can't figure out why that NVMe is so slow through ZFS, while raw-disk VMs on it work fine.
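(Those knobs are ZFS kernel module parameters; roughly what I mean, values illustrative:)

Code:
# runtime tuning; values are illustrative
echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max   # 8 GiB
# to persist across reboots, in /etc/modprobe.d/zfs.conf:
#   options zfs zfs_prefetch_disable=1 zfs_arc_max=8589934592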

The process that keeps getting stuck in D state on the host side is txg_sync.
Interestingly, the host itself seems okay: I can wget a big file on Proxmox itself without any slowdown or freezes, but the exact same wget in the LXC (so on the subvolume) starts freezing up after a few seconds. That's true even if I cd into the subvolume from the host, so it seems like the issue is the subvolume, not the LXC.
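(A couple of ways to watch what txg_sync is doing while the stall happens, as a sketch; the pool name is assumed to be rpool:)

Code:
# per-vdev latency, refreshed every second (pool name assumed)
zpool iostat -vly rpool 1
# transaction group history
cat /proc/spl/kstat/zfs/rpool/txgs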
 
Was there any outcome to this issue?
I am facing the same situation. There is an LXC with a fresh installation of Debian 11.3 running Docker. The host file system is ZFS.
For the first few seconds after the docker-compose up -d command everything looks fine, but then it gets extremely slow. The download of even smaller Docker layers often freezes for minutes before continuing slowly. There is no high memory consumption and the CPUs are running at just 2 to 4 percent, so I am wondering what is going on on the system.
I am aware that the vfs storage driver Docker falls back to does not have the best performance, but the difference to other storage drivers should not be 1:1000.
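(You can confirm which storage driver Docker picked like this:)

Code:
docker info --format '{{.Driver}}'
# or, with the backing filesystem line where applicable:
docker info | grep -iA1 'storage driver'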
Is there somebody who knows anything about this topic, or who knows any deeper analysis tools or methods to find out what is going on?
 
What disks are you using? Terrible performance is usually caused by cheap consumer SSDs or cheap SMR HDDs. With ZFS you should use enterprise-grade SSDs with power-loss protection, or CMR HDDs.
 
I am using two 500 GB SSDs (non enterprise grade) in a mirrored setup. For sure enterprise SSDs would be better, but I don't think that the SSDs are really the root cause in this case.
 
If they use QLC NAND, that would be my first bet. With that, it wouldn't be unusual to see HDD-like performance.
 
But it is not even HDD performance. After some more checks, I have the impression that it is not Docker- or LXC-specific, because VMs also get extremely slow during I/O operations. The health status of the zpool looks fine, however, so I will continue looking at the ZFS side.
 
I experienced the same problem...

Docker on top of a directory on ZFS, which does not use ZFS as its backing store (docker info does not show ZFS support), is VERY SLOW. It is the worst possible setup you can have, and it is slow by design. This is not new; it has always been the case. Just run Docker on the PVE host (which is the same from a security standpoint in this setup) or in a VM. LX(C) containers are NOT meant to run Docker. It works from a technical standpoint, but it is neither fast nor more secure.

This is my personal impression and is NOT directed at specific individuals, but is only a general observation:
That's the point most users who are "pro LXC Docker" don't get, and I'm so sick of this topic in this forum. Just because something does not throw an error, or one docker run command works, does not imply that it's a good idea to do, or in this case a fast setup.
 
I am experiencing this too. In my case it's quite modern HPE servers stuffed exclusively with data-center grade SSDs. I spent a whole week trying to find a storage configuration with at least acceptable performance, but no luck. I came to the conclusion that QEMU is bad in terms of disk performance. Why? Because it shows bad performance with RAM drives too. With a RAM drive there is no controller, and what drive could be faster, after all? Here are the results I got when randomly writing 4k blocks to the same RAM disk on the host and in a virtual machine running on that host:
Host: 866.4 MiB/s at 221.8k IOPS
VM: 13.56 MiB/s at 3572.8 IOPS
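(A typical way to run such a test against a RAM-backed block device, as a sketch; this is illustrative, not necessarily the exact invocation used:)

Code:
# illustrative only -- not necessarily the exact test that was run
modprobe brd rd_nr=1 rd_size=4194304    # 4 GiB RAM-backed /dev/ram0
fio --name=ramtest --filename=/dev/ram0 --rw=randwrite --bs=4k \
    --ioengine=libaio --iodepth=32 --direct=1 --runtime=30 --time_based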
 
That sounds *very* wrong. Please share how you measured that, and the VM config (`qm config VMID`).
 
A big step forward for me was the trim I ran yesterday.
I compared the pveperf results before and after the trim.
The FSYNCS/SECOND value went from about 400 to about 1500, and overall performance is back at a useful level, even without additional ZFS performance tuning.
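(Roughly what that looks like; the pool name is assumed:)

Code:
# pool name assumed
zpool trim rpool
zpool status -t rpool    # shows trim progress per vdev
pveperf /rpool           # compare FSYNCS/SECOND before and after
# optionally keep it trimmed automatically:
zpool set autotrim=on rpool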
 
I think I might have figured this out recently. The overlay2 storage driver technically doesn't support ZFS as a backing filesystem: https://docs.docker.com/storage/storagedriver/select-storage-driver/. I also ran into other issues, like certain directories that couldn't be removed (similar to https://github.com/moby/moby/issues/15314). I don't know why Docker doesn't complain explicitly when overlay2 is used on an unsupported underlying filesystem.

To fix this, I first tried switching to the ZFS storage driver, but that didn't seem to work for me inside LXC. In the end, I created an XFS-formatted zvol on my ZFS datastore and gave its mount point to the LXC running Docker. I didn't actually measure the performance before and after, but it definitely feels a lot faster.
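(Roughly what that setup looks like; all names, sizes, and the CT ID here are placeholders:)

Code:
# names, sizes and CT ID are placeholders
zfs create -V 64G rpool/data/docker-xfs
mkfs.xfs /dev/zvol/rpool/data/docker-xfs
mkdir -p /mnt/docker-xfs
mount /dev/zvol/rpool/data/docker-xfs /mnt/docker-xfs
# hand it to the container as /var/lib/docker:
pct set 100 -mp0 /mnt/docker-xfs,mp=/var/lib/docker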
 
Yes, it's best to use Docker on ZFS inside a VM or directly on your PVE host, as I already ranted about in #8. There is also the ZFS volume driver, which I can highly recommend.
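(On a host where /var/lib/docker sits on its own ZFS dataset, selecting the zfs storage driver is just a daemon.json entry; the dataset name is assumed:)

Code:
# dataset name assumed; /var/lib/docker must be a ZFS dataset
zfs create -o mountpoint=/var/lib/docker rpool/data/docker
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "zfs"
}
EOF
systemctl restart docker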

Hopefully, Docker in LXC on ZFS will get proper support soon. This is in the works (on both the ZFS and the LXC side of things), but with no timeline and low priority.
 
