I have been running Proxmox on a hyper-converged server for 5 years and we have been very happy. Our workload is roughly 95% LXC containers, with a mixed storage setup of NVMe SSDs and SAS drives on ZFS RAID. As long as everything lived on a single server, performance was excellent.
The issues started when we outgrew that single server and tried to add a second one with shared storage. I run a dedicated 10Gbit network between the servers and am trying to set up shared storage that all containers can access simultaneously. I have tried to read everything in this forum and in the storage documentation, but we have ended up with very bad performance for small files. Our needs are very simple from an authentication perspective, and we would like a lightweight solution.
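For context, the storage traffic runs over a direct link between the nodes, configured roughly like this (the interface name and addresses below are placeholders, not our actual values):

    # /etc/network/interfaces on node1 -- dedicated 10Gbit storage link (illustrative)
    auto ens1f0
    iface ens1f0 inet static
        address 10.10.10.1
        netmask 255.255.255.0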
I ended up exporting NFS version 3 shares from each Proxmox host, mounting them on the other host, and bind-mounting them into the LXC containers. When copying a directory of small text files (very common for our workload), a simple write of 35,000 files totalling 75 MB takes over 15 minutes! Locally on each server this is near instantaneous, regardless of whether it hits the NVMe SSDs or the SAS ZFS RAID, and the same whether done inside an LXC container or directly on the Proxmox host.
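The setup looks roughly like this (export path, addresses, and container ID are placeholders):

    # /etc/exports on node1 (illustrative)
    /tank/shared 10.10.10.0/24(rw,async,no_subtree_check,no_root_squash)

    # mount on node2
    mount -t nfs -o vers=3,hard,noatime 10.10.10.1:/tank/shared /mnt/shared

    # bind-mount into an LXC container (CT 101)
    pct set 101 -mp0 /mnt/shared,mp=/mnt/shared

    # the small-file test that takes 15+ minutes over NFS
    time cp -r ./small-files /mnt/shared/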
Our workload is a mix of these high file-count directories and large multi-gigabyte image files, and they need to be stored together. This is why we invested in the 10Gbit backbone. Large file transfers are fine, nearly saturating the network from the NVMe SSD RAID; it is the small files we are struggling with.
I have tested NFS protocol versions 3, 4, and 4.2 with the same results. I have also tried rsize/wsize values all the way from 4096 up to the maximum, but see no improvement. I have also tried the network with and without jumbo frames. All I need is solid performance between 2-3 Proxmox nodes; I need no interoperability with other clients.
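These are the kinds of variations I tried (server address and NIC name are placeholders):

    # protocol version and transfer size variations (illustrative)
    mount -t nfs -o vers=3,rsize=1048576,wsize=1048576 10.10.10.1:/tank/shared /mnt/shared
    mount -t nfs -o vers=4.2,rsize=1048576,wsize=1048576 10.10.10.1:/tank/shared /mnt/shared

    # jumbo frames on the dedicated storage NIC, set on both nodes
    ip link set dev ens1f0 mtu 9000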
Am I completely off base trying to achieve this with NFS? Should I use ZFS over iSCSI instead, for example, or something else entirely? All help is highly appreciated.
For reference, all nodes are running Proxmox VE 6.3-3.