Help with very erratic small file sharing performance on 10Gbit dedicated network

TomasB

I have been running Proxmox on a hyper-converged server for 5 years and we have been very happy with it. We use 95% LXC containers with a mixed storage setup of NVMe SSDs and SAS drives on ZFS RAID. As long as everything lived on a single server, we were very happy with the performance.

The issues started when we outgrew a single server and tried to add a second one with shared storage. I run a dedicated 10Gbit network between the servers and am trying to set up shared storage that all containers can access simultaneously. I have read everything I could find in this forum and in the storage documentation, but we have ended up with very poor performance for small files. Our needs are very simple from an authentication perspective and we would like a lightweight solution.

I ended up mounting NFS version 3 exports from each Proxmox host on the other and passing the mounts into the LXC containers. When copying a directory of small text files (very common for our workload), a simple write of 35,000 files totalling 75 MB takes over 15 minutes! Locally on each server the same copy is near instantaneous, regardless of whether it hits the NVMe SSDs or the SAS ZFS RAID, and whether it runs inside an LXC container or directly on the Proxmox host.
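For reference, this is roughly the kind of copy I am timing. A minimal sketch; the target path and payload split are examples, not our real data:

Code:
#!/usr/bin/env python3
# Rough benchmark sketch: write many small files into a target directory
# and report the total time and the average per-file latency.
# The default path below is only an example, not our real mount point.
import os
import sys
import time

target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/pve/shared-nfs/bench"  # example path
num_files = 35000
payload = b"x" * (75 * 1024 * 1024 // num_files)  # ~75 MB spread over 35k files

os.makedirs(target, exist_ok=True)
start = time.time()
for i in range(num_files):
    with open(os.path.join(target, f"file_{i:05d}.txt"), "wb") as f:
        f.write(payload)
elapsed = time.time() - start
print(f"{num_files} files in {elapsed:.1f} s "
      f"({elapsed / num_files * 1000:.2f} ms per file)")

Pointing it once at local storage and once at the NFS mount is enough to compare the two cases.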

Our workload mixes these high file-count directories with large multi-gigabyte image files, and they need to be stored together; this is why we invested in the 10Gbit backbone. The large file transfers are fine, almost saturating the network from the NVMe SSD RAID, but it is the small files we are struggling with.

I have tested different NFS protocol versions (3, 4 and 4.2) with the same results. I have also tried different rsize/wsize values, all the way from 4096 up to the maximum, but see no improvement, and I have tried the network with and without jumbo frames. All I need is solid performance between 2-3 Proxmox nodes; I need no interoperability with other clients.
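To double-check that the rsize/wsize and version I request are what the client actually negotiates (the server can silently cap them), the effective options can be read back from /proc/mounts. A small sketch, assuming the NFS mounts are visible on the client:

Code:
#!/usr/bin/env python3
# Print the effective protocol version, transport and rsize/wsize of every
# NFS mount as negotiated by the client, taken from /proc/mounts.
with open("/proc/mounts") as mounts:
    for line in mounts:
        device, mountpoint, fstype, options, *_ = line.split()
        if fstype.startswith("nfs"):
            wanted = [o for o in options.split(",")
                      if o.startswith(("vers=", "proto=", "rsize=", "wsize="))]
            print(f"{mountpoint} ({fstype}): {', '.join(wanted)}")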

Am I completely off trying to achieve this with NFS? Should I use, for example, ZFS over iSCSI instead, or something else entirely? All help is highly appreciated.

All nodes are running Proxmox 6.3-3 for your reference.
 
When copying a directory of small text files (very common for our workload), a simple write of 35,000 files totalling 75 MB takes over 15 minutes! Locally on each server the same copy is near instantaneous, regardless of whether it hits the NVMe SSDs or the SAS ZFS RAID
What exactly does this mean?
Do you access the local server via NFS, or write directly to the storage?

If you mean the second, you are comparing apples with oranges.
File access is traditionally costly for small files because of locking operations, especially over the network.
Also, don't forget that when you work locally on the filesystem there are not only fewer layers to cross; you also likely benefit from filesystem caches and so on.
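As a rough back-of-the-envelope sketch (the latency and round-trip count below are assumptions, not measurements from your setup), the per-file round trips alone already dwarf the time needed for the payload:

Code:
# Back-of-the-envelope: per-file protocol round trips dominate small-file writes.
# The round-trip count and latency are illustrative assumptions, not measurements.
num_files = 35_000
total_bytes = 75 * 1024 * 1024      # ~75 MB of payload
rtt_s = 0.0002                      # assumed ~0.2 ms round trip on a 10Gbit LAN
round_trips_per_file = 4            # assumed: create, write, commit, attribute refresh
link_bytes_per_s = 10e9 / 8         # 10 Gbit/s expressed in bytes per second

payload_time = total_bytes / link_bytes_per_s
metadata_time = num_files * round_trips_per_file * rtt_s
print(f"raw payload transfer: {payload_time:.2f} s")
print(f"per-file round trips: {metadata_time:.0f} s")
# Even with these optimistic numbers the round trips dominate; if the server
# additionally honours sync/commit semantics, every file also waits for stable
# storage, which pushes the per-file cost up further.

So the time is not spent moving bytes, it is spent waiting at least once per file.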

Could you explain what 10Gbit network gear you use, and which switches?

This page seems to be pretty detailed:
https://tldp.org/HOWTO/NFS-HOWTO/performance.html
 
Thank you for the comment, highly appreciated. As a data scientist I am absolutely no network protocol expert, but I am trying my best to learn. The comparison between a local copy and a network copy is of course not an apples-to-apples comparison. What I was trying to clarify is that our local storage performs very well, so there should be no question about ZFS and its intrinsic limitations; that performance is adequate for us.

I understand that there is no free lunch when it comes to shuffling data over the network compared to working locally. What I am trying to understand is whether I have configured something wrong, or whether this is the expected, fastest possible performance over a dedicated 10Gbit link. I can handle a drop in write performance of up to an order of magnitude, but this is multiple orders of magnitude slower than local, and that makes our applications unusable. We are seeing transfer speeds as low as 40 kbit/s for small-file transfers.

Regarding our network infrastructure: I have tried several different configurations and see no significant difference. In the simplest scenario I used two X520-DA2 cards with a DAC cable between them. I have also exchanged the DAC for single-mode fiber (the fiber of choice at our university) and tried with a switch in between; neither made any difference. The brand and model of the switch are unknown to me, as it is run by the university, but it is enterprise-grade equipment.
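To rule out basic problems on the link itself I sanity-check speed, MTU and state on both ends; a small sketch reading the usual sysfs entries (the interface name is just an example, ours differs):

Code:
#!/usr/bin/env python3
# Report link speed (Mbit/s), MTU and operational state of a network
# interface via sysfs. The default interface name is only an example.
import sys

iface = sys.argv[1] if len(sys.argv) > 1 else "enp3s0f0"  # example name for an X520 port
base = f"/sys/class/net/{iface}"
for attr in ("speed", "mtu", "operstate"):
    try:
        with open(f"{base}/{attr}") as f:
            print(f"{iface} {attr}: {f.read().strip()}")
    except OSError as err:
        print(f"{iface} {attr}: not readable ({err})")

The speed and operstate values confirm the negotiated link rate, and mtu shows whether jumbo frames are actually active on that port.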

Do I have any alternative to "file-access" transfers in our scenario, where multiple containers need to access the same data at the same time?
I understand that block-level access can be faster, but I do not know how I would implement it here. Any suggestions are highly appreciated. Good instructions exist on how to serve VM block devices over the network, but not on how to solve this with containers.
 
The X520s are aged but decent cards, once they run current firmware and drivers.
Direct connections are legitimate but can have their downsides as well; it all depends on the setup.
There are many reasons why small files can trigger bad performance. Jumbo frames can help but do not always increase performance as expected.
A lot happens in the IP stack that can affect performance.
Can you explain in detail what your network setup looks like?