Central storage for mixed workload VMs: NFS?

stefan00

Member
Mar 5, 2021
We currently plan on implementing a new small-scale cluster for mixed VM workloads. The goal is to be able to add compute nodes as needed while maintaining a central ("shared") storage for all VM images.

That being said, we want to build some sort of small VDI infrastructure with the ability to run Windows and macOS clients.

Since some OSes require good disk performance, the question comes down to the central storage part. Most resources found today are about Ceph. However, building a high-performance Ceph cluster seems out of scope for us, since too many nodes are required.

We now have the idea of running all VM disks on a central NFS server. Unfortunately, there seems to be very little information available on what kind of performance we can expect.

Setup example:

- 3+ compute nodes (PVE)
- 1 storage node (NFS server)

Given for storage:

- storage itself is fast enough
- interconnect between compute nodes and storage is decently fast (40/100gbit ethernet)

Given that VMs:

- need good disk speed / IOPs
- able to serve for example 20+ VM disk images on the storage node
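For context, mounting such an NFS export as shared PVE storage is a short entry in /etc/pve/storage.cfg. A hedged sketch, where the storage ID, server address, and export path are placeholders:

```
nfs: vmstore
        server 10.0.0.10
        export /tank/vmstore
        path /mnt/pve/vmstore
        content images
        options vers=4.2
```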


Question:

Does anyone have experience serving / mounting high-performance VM images from NFS shares?

Thank you for helping :)

---

Please note that my question is NOT about
- HA
- Failover
- SPOF
- redundancy
- backend storage design itself
 
We now have the idea of running all VM disks on a central NFS server. Unfortunately, there seems to be very little information available on what kind of performance we can expect.
Most performance metrics are reported by vendors in synthetic environments. Exceptional performance is usually needed in critical workloads, where sharing benchmark results is not encouraged.
- storage itself is fast enough
That is a very subjective statement. To be frank, it's not really meaningful.
- interconnect between compute nodes and storage is decently fast (40/100gbit ethernet)
Sure, there is 200Gbit and 400Gbit now, so calling a 40/100Gbit interconnect merely "decent" is being a bit modest. That said, bandwidth does not guarantee latency, especially when disk storage is involved. You will likely be limited by the storage's response time first, and by CPU interrupt handling on the client second.
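A quick back-of-the-envelope calculation illustrates why latency, not bandwidth, is the limit for small-block I/O. The 200 µs round-trip figure below is an assumed, illustrative value for a sync write over NFS, not a measured one:

```python
# Wire time for one 4 KiB block at 100 Gbit/s vs. a typical
# NFS round-trip: serialization time is negligible.
block_bits = 4096 * 8
link_bps = 100e9                        # 100 Gbit/s link
wire_us = block_bits / link_bps * 1e6   # microseconds on the wire
rtt_us = 200.0                          # assumed NFS sync-write round-trip

print(f"wire time: {wire_us:.2f} us, round-trip: {rtt_us:.0f} us")
# At queue depth 1, a sync-write-bound VM cannot exceed ~1e6/rtt_us IOPS.
print(f"QD1 IOPS ceiling: {1e6 / rtt_us:.0f}")
```

In other words, the 4 KiB block spends well under a microsecond on the wire; the other ~99% of the time is round-trip latency, which a faster link does not fix.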

- need good disk speed / IOPs
- able to serve for example 20+ VM disk images on the storage node
Again, "good IOPS" is subjective. For VDI you really need sub-millisecond latency to avoid user complaints. 20 VMs is on the very small side; we have customers running thousands of VDI clients, granted that is on iSCSI.
Are there experiences with serving / mounting high performant VM images on NFS shares?
Given a good network and the small scale described above, you should be OK with almost any modern NFS solution backed by NVMe disks. That is, until you start taking snapshots during business hours.

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
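One way to sanity-check whether a given NFS mount is in the sub-millisecond range is to measure per-write fsync latency directly, since that approximates what a VM's sync writes will see. A minimal sketch (the path is a placeholder; point it at a file on the NFS mount, e.g. under /mnt/pve/vmstore):

```python
import os
import statistics
import time

def sync_write_latency(path, block=4096, iters=200):
    """Measure write+fsync latency in ms for small blocks.
    A rough stand-in for the sync-write latency a VM disk
    image stored on this mount would experience."""
    buf = os.urandom(block)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    lat_ms = []
    try:
        for _ in range(iters):
            t0 = time.perf_counter()
            os.write(fd, buf)
            os.fsync(fd)  # force the write to stable storage
            lat_ms.append((time.perf_counter() - t0) * 1000)
    finally:
        os.close(fd)
        os.unlink(path)
    return statistics.median(lat_ms), max(lat_ms)

median_ms, worst_ms = sync_write_latency("/tmp/nfs-lat-test.bin")
print(f"median {median_ms:.3f} ms, worst {worst_ms:.3f} ms")
```

For serious benchmarking a tool like fio is more appropriate, but this gives a first-order answer to "is this mount VDI-grade?" without installing anything.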
 
Given a good network and the small scale described above, you should be OK with almost any modern NFS solution backed by NVMe disks.

Hi bbgeek,

Thank you so much for your fast reply.

I know that the specs and requirements I posted are mostly subjective. I tried to ask as compactly as possible, always meaning "in the context of the scenario described here".

However, your answer is just what I was asking for :)



PS, OT:
That is until you start doing snapshots during business hours
I assume you are talking about snapshots taken and managed by the compute nodes, not snapshots taken on the storage host itself?
 
I assume you are talking about snapshots taken and managed by the compute nodes, not snapshots taken on the storage host itself?
I mean snapshots initiated by PVE against the qcow2 disks that you will have to use on the NFS side.
If you would like to use storage-backed snapshots (i.e. the entire pool rather than per VM), you could store the disk images as raw. Of course, you then lose any per-VM flexibility, and you would need to drive those snapshots manually.
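To make the two snapshot styles concrete, a hedged sketch (the VM ID, ZFS dataset, and snapshot names are placeholders):

```shell
# Per-VM snapshot driven by PVE: works because the disk is qcow2 on NFS.
qm snapshot 101 pre-update

# Storage-side snapshot on the ZFS-backed NFS server: covers the whole
# dataset at once, raw images included, but PVE knows nothing about it.
zfs snapshot tank/vmstore@nightly
```

The trade-off is exactly as described above: the first gives per-VM granularity from the PVE UI, the second is invisible to PVE and must be scheduled and rolled back on the storage host itself.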


 
I mean snapshots initiated by PVE …

I assumed you meant that.

And no, there is no need for snapshot / restore functionality on the compute nodes themselves. This can be done on the storage node, which will be ZFS-based anyway (quite a strong bias here ;))
 
