Building a new VM farm ... need some advice.

asincero

New Member
Mar 3, 2023
I set up and administer a 5-node Proxmox cluster for my group at work. I’m not an enterprise IT pro; I’m a software developer who happens to be the “IT guy” for my group. We’re about to move labs, and my boss suggested we upgrade the VM farm while we’re at it. The timing is perfect: I’ve been wanting to take what I learned from setting up that first VM farm and build a better one without disrupting our current workflow.

One of the things I don’t like about our current farm is how I set up the cluster storage to facilitate fast live migration of virtual machines. The VMs are stored on an NFS server that is connected to the cluster over a 10GigE network. This configuration made live migration work great. However, the disk I/O performance of VMs living on an NFS fileserver left a lot to be desired, especially for Windows VMs, which were sometimes borderline unusable.

Then I saw a Craft Computing video on YouTube about Proxmox’s replication features. As a test, I installed a single 8TB SSD in each of the Proxmox nodes, set up a new cluster storage using those SSDs, created a Windows VM on that storage, and then set up replication for that VM across all of the nodes of the cluster. I was quite pleased with the results. Live migration was still speedy, and I guess the disk I/O performance of the VM was as close to bare metal as I can get. The only downside is that replication would sometimes mysteriously fail, and I haven’t figured out the reason yet. I am using an older version of Proxmox (version 7.1), so it might be a bug that has since been fixed.

Even though setting up VM replication is a bit more complicated than simply having the VMs live on a central fileserver, the performance increase makes it worth it. So I was dead set on having just local SSD storage for the next VM farm I get to build. But then I learned about 25GigE networking and the hardware for setting that up isn’t really all that much more expensive than 10GigE.

Now I’m wondering … would a 25GigE network give me the performance I want? Would it be similar enough to local SSD performance that it would make the complications introduced by the replication no longer worth it? What other storage options do I have?
 
Now I’m wondering … would a 25GigE network give me the performance I want?
From a price/performance/availability perspective, 25 is the new 10. For a greenfield business deployment, nothing less than 25 should be installed in 2023. Having redundant switch infrastructure with 40+ uplinks and a few spare 40+ ports is always a plus.
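
To put the 10 vs 25 comparison in rough numbers, here is a minimal Python sketch of the nominal bandwidth ceiling of each link. It only converts line rate to MB/s and estimates bulk transfer time; the `efficiency` parameter and the 100 GB example size are illustrative assumptions, and it says nothing about latency, which usually matters more for VM disk I/O.

```python
def line_rate_mb_per_s(gbit_per_s: float) -> float:
    """Convert a nominal link speed in Gbit/s to decimal MB/s, ignoring protocol overhead."""
    return gbit_per_s * 1000 / 8


def transfer_seconds(size_gb: float, gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Seconds needed to move size_gb (decimal GB) over the link.

    efficiency is an assumed fraction (0..1] of line rate actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_gb * 1000 / (line_rate_mb_per_s(gbit_per_s) * efficiency)


if __name__ == "__main__":
    example_size_gb = 100  # hypothetical VM disk size, for illustration only
    for speed in (10, 25):
        rate = line_rate_mb_per_s(speed)
        secs = transfer_seconds(example_size_gb, speed)
        print(f"{speed} GbE: {rate:.0f} MB/s ceiling, {example_size_gb} GB in about {secs:.0f} s")
```

These are upper bounds only; real NFS or iSCSI throughput will be lower and depends on the server, the switches, and tuning.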

Would it be similar enough to local SSD performance
An NFS server will never give you performance similar enough to local SSD. That said, it’s impossible to say what caused your performance issues without proper benchmarking (network, switches, NFS, the NFS server’s local storage, etc.). Figuring it out requires time and resources. Additionally, not everything is optimal out of the box; there is always some tuning available.
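
As a very rough first step toward that kind of benchmarking, the Python sketch below times small synchronous writes in one or more directories, for example a local SSD path and an NFS mount passed on the command line. It is an assumption-laden illustration, not a substitute for a dedicated tool such as fio: the 4 KiB block size, the write count, and the function names are all arbitrary choices made for this example.

```python
import os
import statistics
import sys
import tempfile
import time


def sync_write_latencies(directory, count=200, block_size=4096):
    """Write `count` blocks to a temp file in `directory`, calling fsync after
    each write, and return the per-write latencies in milliseconds."""
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage before timing stops
            latencies.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Return median, approximate 99th percentile, and max latency in ms."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Usage: python sync_latency.py /path/on/local/ssd /path/on/nfs/mount
    for directory in sys.argv[1:]:
        if not os.path.isdir(directory):
            continue
        print(directory, summarize(sync_write_latencies(directory)))
```

Running it inside a VM whose disk lives on each storage type gives a comparable picture of what the guest sees, though it still exercises only one I/O pattern.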

the complications introduced by the replication
Keep in mind that built-in PVE ZFS replication is async, meaning you are always behind. The range could be from a few IOs to multiple minutes. Can your business applications tolerate 10-15min loss of production data? What if you grow and the rate of change exceeds your ability to replicate in reasonable time? These are just a few things to consider when building a business setup.
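
A back-of-the-envelope way to reason about those two questions is sketched below in Python. It is a simplified model, not how PVE measures anything: it assumes a constant change rate, that every changed megabyte must be sent (overwrites of the same blocks would shrink the delta), and a fixed effective throughput. All names and example numbers are hypothetical.

```python
from typing import NamedTuple


class ReplicationEstimate(NamedTuple):
    delta_mb: float           # data changed during one replication interval
    transfer_s: float         # time to ship that delta to the target node
    worst_case_loss_s: float  # age of the replica just before the next sync completes
    keeps_up: bool            # True if a sync finishes before the next one is due


def estimate_replication(change_rate_mb_s: float,
                         interval_s: float,
                         throughput_mb_s: float) -> ReplicationEstimate:
    """Estimate one async replication cycle under the assumptions stated above."""
    if throughput_mb_s <= 0 or interval_s <= 0 or change_rate_mb_s < 0:
        raise ValueError("rates and interval must be positive")
    delta_mb = change_rate_mb_s * interval_s
    transfer_s = delta_mb / throughput_mb_s
    # If the source dies just before a sync completes, the replica still holds
    # the snapshot from the previous cycle: one interval plus one transfer old.
    worst_case_loss_s = interval_s + transfer_s
    return ReplicationEstimate(delta_mb, transfer_s, worst_case_loss_s,
                               transfer_s < interval_s)


if __name__ == "__main__":
    # Hypothetical example: 10 MB/s of changes, 15-minute schedule, 1250 MB/s link
    print(estimate_replication(change_rate_mb_s=10, interval_s=900, throughput_mb_s=1250))
```

Once `keeps_up` turns false in this model, each cycle starts later than scheduled and the replica falls steadily further behind.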

Besides NFS as central storage, iSCSI and NVMe/TCP are also options.

If there is a budget behind your project - you can reach out to one of the Proxmox partners for help.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Keep in mind that built-in PVE ZFS replication is async, meaning you are always behind. The range could be from a few IOs to multiple minutes. Can your business applications tolerate 10-15min loss of production data? What if you grow and the rate of change exceeds your ability to replicate in reasonable time? These are just a few things to consider when building a business setup.
Or the complicated setup required for each VM, compared with any network storage system (distributed or dedicated shared storage), which needs no per-VM configuration at all, only the initial setup. For me that would be the killer argument against replication. HA-ZFS with ZFS-over-iSCSI is still my favorite (I love ZFS).
 
