Central storage for mixed workload VMs: NFS?

stefan00

Member
Mar 5, 2021
We currently plan on implementing a new small-scale cluster for mixed VM workloads. The goal is to be able to add compute nodes as needed while maintaining a central ("shared") storage for all VM images.

That being said, we want to build some sort of small VDI infrastructure with the ability to run Windows and macOS clients.

Since some OSes require good disk performance, the question comes down to the central storage part. Most resources found today are about Ceph. However, building a high-performance Ceph cluster seems out of scope for us, since too many nodes are required.

We now have the idea of running all VM disks on a central NFS server. Unfortunately, there seems to be very little information available on what kind of performance we can expect.

Setup example:

- 3+ compute nodes (PVE)
- 1 storage node (NFS server)

Given for storage:

- storage itself is fast enough
- interconnect between compute nodes and storage is decently fast (40/100gbit ethernet)

Given that VMs:

- need good disk speed / IOPs
- able to serve for example 20+ VM disk images on the storage node
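For context, mounting such an NFS export as shared PVE storage is a short entry in /etc/pve/storage.cfg. A hedged sketch, where the storage ID, server address, and export path are placeholders:

```
nfs: vmstore
        server 10.0.0.10
        export /tank/vmstore
        path /mnt/pve/vmstore
        content images
        options vers=4.2
```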


Question:

Does anyone have experience serving / mounting high-performance VM images from NFS shares?

Thank you for helping :)

---

Please note that my question is NOT about
- HA
- Failover
- SPOF
- redundancy
- backend storage design itself
 
We now have the idea of running all VM disks on a central NFS server. Unfortunately, there seems to be very little information available on what kind of performance we can expect.
Most performance metrics are reported by vendors in synthetic environments. Exceptional performance is usually needed in critical workloads, where sharing benchmark results is not encouraged.
- storage itself is fast enough
That is a very subjective statement. To be frank, it's not really meaningful.
- interconnect between compute nodes and storage is decently fast (40/100gbit ethernet)
Sure, there is 200Gbit and 400Gbit now, so calling a 40/100Gbit interconnect merely "decent" is being a bit modest. That said, bandwidth does not guarantee latency, especially when disk storage is involved. You will likely be limited by the storage's response time first, and by CPU interrupt handling on the client second.
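A quick back-of-the-envelope calculation illustrates why latency, not bandwidth, is the limit for small-block I/O. The 200 µs round-trip figure below is an assumed, illustrative value for a sync write over NFS, not a measured one:

```python
# Wire time for one 4 KiB block at 100 Gbit/s vs. a typical
# NFS round-trip: serialization time is negligible.
block_bits = 4096 * 8
link_bps = 100e9                        # 100 Gbit/s link
wire_us = block_bits / link_bps * 1e6   # microseconds on the wire
rtt_us = 200.0                          # assumed NFS sync-write round-trip

print(f"wire time: {wire_us:.2f} us, round-trip: {rtt_us:.0f} us")
# At queue depth 1, a sync-write-bound VM cannot exceed ~1e6/rtt_us IOPS.
print(f"QD1 IOPS ceiling: {1e6 / rtt_us:.0f}")
```

In other words, the 4 KiB block spends well under a microsecond on the wire; the other ~99% of the time is round-trip latency, which a faster link does not fix.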

- need good disk speed / IOPs
- able to serve for example 20+ VM disk images on the storage node
Again, "good IOPS" is subjective. For VDI you really need sub-millisecond latency to avoid user complaints. 20 VMs is on the very small side; we have customers running thousands of VDI clients, granted that is on iSCSI.
Are there experiences with serving / mounting high performant VM images on NFS shares?
Given a good network and the small scale described above, you should be OK with almost any modern NFS solution backed by NVMe disks. That is, until you start taking snapshots during business hours.

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
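One way to sanity-check whether a given NFS mount is in the sub-millisecond range is to measure per-write fsync latency directly, since that approximates what a VM's sync writes will see. A minimal sketch (the path is a placeholder; point it at a file on the NFS mount, e.g. under /mnt/pve/vmstore):

```python
import os
import statistics
import time

def sync_write_latency(path, block=4096, iters=200):
    """Measure write+fsync latency in ms for small blocks.
    A rough stand-in for the sync-write latency a VM disk
    image stored on this mount would experience."""
    buf = os.urandom(block)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    lat_ms = []
    try:
        for _ in range(iters):
            t0 = time.perf_counter()
            os.write(fd, buf)
            os.fsync(fd)  # force the write to stable storage
            lat_ms.append((time.perf_counter() - t0) * 1000)
    finally:
        os.close(fd)
        os.unlink(path)
    return statistics.median(lat_ms), max(lat_ms)

median_ms, worst_ms = sync_write_latency("/tmp/nfs-lat-test.bin")
print(f"median {median_ms:.3f} ms, worst {worst_ms:.3f} ms")
```

For serious benchmarking a tool like fio is more appropriate, but this gives a first-order answer to "is this mount VDI-grade?" without installing anything.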
 
Given a good network and the small scale described above, you should be OK with almost any modern NFS solution backed by NVMe disks.

Hi bbgeek,

Thank you so much for your fast reply.

I know that the specs and requirements I posted are mostly subjective. I tried to ask as compactly as possible, always meaning "in the context of the scenario described here".

However, your answer is just what I was asking for :)



PS, OT:
That is until you start doing snapshots during business hours
I assume you are talking about snapshots taken and managed by the compute nodes, not snapshots taken on the storage host itself?
 
I assume you are talking about snapshots taken and managed by the compute nodes, not snapshots taken on the storage host itself?
I mean snapshots initiated by PVE against the qcow2 disks that you will have to use on the NFS side.
If you would like to use storage-backed snapshots (i.e. the entire pool rather than per VM), you could store the disk images as raw. Of course, you then lose any per-VM flexibility, and you would need to drive those snapshots manually.
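To make the two snapshot styles concrete, a hedged sketch (the VM ID, ZFS dataset, and snapshot names are placeholders):

```shell
# Per-VM snapshot driven by PVE: works because the disk is qcow2 on NFS.
qm snapshot 101 pre-update

# Storage-side snapshot on the ZFS-backed NFS server: covers the whole
# dataset at once, raw images included, but PVE knows nothing about it.
zfs snapshot tank/vmstore@nightly
```

The trade-off is exactly as described above: the first gives per-VM granularity from the PVE UI, the second is invisible to PVE and must be scheduled and rolled back on the storage host itself.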


 
I mean snapshots initiated by PVE …

I assumed you meant that.

And no, there is no need for snapshot / restore functionality on the compute nodes themselves. This can be done on the storage node, which will be ZFS-based anyway (quite a strong bias here ;))
 
