Storage for production cluster

JonesHerbet

Dec 8, 2025
Hello everyone, I am reaching out to you because we are trying to migrate from VMware VSAN to Proxmox.

First, let me give you a quick overview of our current situation. We have a cluster of three nodes (VxRail) with 10 HDDs and two SSDs per node (vSAN with cache tiering) and a 10G network.

I have done several PoCs with Ceph, with 4 OSDs + WAL/DB on SSD, but the performance is very disappointing, probably due to a hardware bottleneck.

For comparison, with 4 OSDs I can barely reach 600 IOPS, while vSAN with 10 disks per host exceeds 15k (measured in a VM). Yes, that is 6 fewer disks per host, but the gap is far too large for the disk count alone to explain it. In addition, we are concerned that in the long term our hardware will become too weak for Ceph. The tests were performed on the same servers we currently use in production with VMware (same CPU, same disks, etc.).

Why test with 4 OSDs? Simply because that's what we have available :D

Inside VMs it's even worse: with diskspd, I can't exceed ~100 Mbps. I've tried all the VM settings I found on the forum and on Reddit, but nothing works. I improved Ceph's performance a little by setting this option: ceph config set global bluestore_min_alloc_size_hdd 65536, but nothing exceptional.
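For reference, here is roughly how I applied and verified that option, plus the kind of cluster-level test I ran alongside diskspd (the pool name below is just an example, not our real one):
Code:
    # set and verify the allocation size (only applies to OSDs created after the change)
    ceph config set global bluestore_min_alloc_size_hdd 65536
    ceph config get osd bluestore_min_alloc_size_hdd

    # rough 4K write benchmark against a (pre-created) test pool
    rados bench -p testpool 60 write -b 4096 -t 16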

I know that Ceph scales with 3+ hosts, but we don't plan on adding hosts, so I'm not sure if it's the right choice (I'm starting to think it isn't, given my results; maybe I'm doing something wrong, or maybe I'm simply aiming too high for our infrastructure).

I would like your opinion on what to do, knowing that if we are leaving VMware, it's because we can't afford to spend tens or hundreds of thousands of euros.

Would a SAN with multipath work in terms of performance? If I'm not mistaken, though, I would lose snapshots... and therefore also the ability to use PBS to back up our VMs (doesn't PBS rely on snapshots?), or maybe that doesn't depend on the storage, in which case it's fine.

Are there any other viable storage solutions for a production cluster?
We came across StarWind's VSAN, but we don't like the idea of having a third-party dependency.

This is our last major obstacle before acquiring our licenses and starting the migration: storage.
 
Hi @JonesHerbet, welcome to the forum.

Since you're looking to refresh your infrastructure, it's worth taking a step back and considering a more comprehensive update. Replacing HDDs with NVMe is a good start. You should also review your network capabilities: you mentioned a 10 Gbit network, and moving to 25 Gbit or higher would provide significant benefits.

You also haven’t provided details about the age of your VMware/VSAN compute hardware. VSAN integrates into ESXi very differently than Ceph integrates with PVE. Ceph can be more resource-intensive, and older hardware may simply not be suitable for it.

There are Ceph experts on this forum who may be willing to donate some time to help assess your infrastructure. Another option is to consult a Proxmox Partner for a more formal evaluation.

SAN storage works very well with PVE. Recent developments (currently in Tech Preview) even allow SAN-backed snapshots. Whether it will meet your performance expectations is hard to say at this point: you haven't defined your performance goals, and we don't know which SAN solution you might be considering. Realistically, this is a question best answered by the SAN vendor you choose.
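For context, a shared LVM volume group on top of a multipathed LUN is the typical pattern, and it ends up looking something like this in /etc/pve/storage.cfg (the storage and VG names below are placeholders):
Code:
    lvm: san-lvm
        vgname vg_san
        content images,rootdir
        shared 1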

Several external vendors offer storage solutions compatible with PVE. If your requirement is “no third-party dependencies,” then most of these solutions would be disqualified. And, technically, an external SAN is also a third-party component. I’m not an expert on StarWind, but as far as I know, it runs as a VM inside PVE.

If your preference is a “single throat to choke,” then reaching out to a Proxmox Partner and following their recommendations closely is the right path for a fully supported environment.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I've been migrating 13G Dell VMware vSphere clusters over to Proxmox Ceph using SAS drives and 10GbE networking on isolated switches. Ceph is a scale-out solution, so more nodes = more IOPS. Not hurting for IOPS on 5-, 7-, 9-, and 11-node clusters. Just like with vSAN, use homogeneous hardware (same CPU, memory, storage, IT/HBA-mode storage controller, NIC, firmware, etc.).

I use the following optimizations learned via trial-and-error. YMMV.
Code:
    Set SAS HDD Write Cache Enable (WCE) (sdparm -s WCE=1 -S /dev/sd[x])
    Set VM Disk Cache to None if clustered, Writeback if standalone
    Set VM Disk controller to VirtIO-Single SCSI controller and enable IO Thread & Discard option
    Set VM CPU Type for Linux to 'Host'
    Set VM CPU Type for Windows to 'x86-64-v2-AES' on older CPUs/'x86-64-v3' on newer CPUs/'nested-virt' on Proxmox 9.1
    Set VM CPU NUMA
    Set VM Networking VirtIO Multiqueue to 1
    Set up the Qemu-Guest-Agent in each VM, and install the VirtIO drivers on Windows
    Set VM IO Scheduler to none/noop on Linux
    Set Ceph RBD pool to use 'krbd' option
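A few of those map to concrete commands roughly like this (VMID 100, storage name 'ceph-vm', and device paths below are placeholders; adjust for your setup):
Code:
    # enable the SAS drive write cache (verify with: sdparm --get=WCE /dev/sdb)
    sdparm -s WCE=1 -S /dev/sdb

    # VirtIO SCSI single controller, IO thread + discard, no host-side cache (clustered)
    qm set 100 --scsihw virtio-scsi-single
    qm set 100 --scsi0 ceph-vm:vm-100-disk-0,iothread=1,discard=on,cache=none

    # CPU type and NUMA
    qm set 100 --cpu host --numa 1

    # krbd is toggled on the RBD storage definition in /etc/pve/storage.cfg ('krbd 1')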
For new installs, get single-socket servers with lots of cores, faster networking, and NVMe storage.
 
Hello, both of you! Thank you for your answers, @bbgeek17 & @jdancer. I'll try to give a little more detail.
We are currently on a 10G network, and an upgrade would represent a significant cost that we cannot afford.

The equipment is about 5-6 years old. Likewise, purchasing NVMe would represent a large cost (a complete infrastructure change).

For the SAN, it will most likely be a PowerVault, but I still need to look into it.

Although deep down I would like to switch to full NVMe + a 25G network, sometimes we also need to adapt the environment to our actual needs.

@jdancer , I have already seen and applied the settings you suggested, but unfortunately my performance remains poor.

For your information, each of our hosts has the following hardware:

- CPU: 32 x Intel(R) Xeon(R) Gold 6226R
- RAM: 376 GiB
- HDDs: 10 x Toshiba MG06SCA800EY, 8 TB, 7.2K RPM, SAS 12Gbps
- SSDs: 2 x Dell 3TCV6 (MZILT1T6HBJR0D3), 1.6 TB, 2.5", SAS 12Gbps
 
You really need to confirm that write cache enable (WCE) is turned on, by checking the 'dmesg -t' output for each drive. If the write/read cache is disabled, it really kills IOPS.
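For example, something along these lines (sdX is a placeholder for each data drive):
Code:
    # the kernel's view of the cache state per drive
    dmesg -t | grep -i 'write cache'

    # query and, if needed, enable WCE with sdparm
    sdparm --get=WCE /dev/sdX
    sdparm -s WCE=1 -S /dev/sdX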

While 3 nodes is technically the bare minimum for Ceph, I don't consider it production worthy, due to the fact that if you lose 1 node, you're SOL. With 5 nodes, you can lose 2 and still have quorum.

Needless to say, more OSDs = more IOPS. The 13G Dells I migrate are 16-bay R730s and 10-bay R630s, so each server has 2 small SAS drives mirroring Proxmox with ZFS RAID-1, and the rest of the drives are OSDs.

For the storage controller, I'm using Dell HBA330. I swapped out the Dell PERCs since Ceph does NOT work with RAID controllers.

Specs on the 13G Dells are:

2 x E5-2650v4 CPUs
10K RPM SAS drives
Dell rNDC Intel X550 quad-port NIC
Dell HBA330 IT/HBA-mode storage controller mini-mono
512GB RAM
Arista 10GbE switches (Ceph & Corosync traffic)

I do put the Ceph & Corosync network traffic on those switches using active-backup bonds. Considered best practice? No. Works? Yes. And to make sure this traffic never gets routed, I use the IPv4 link-local range 169.254.1.0/24 and set the datacenter migration option to use this network with the insecure option.
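Roughly like this (the NIC names and host address are examples from my setup; adjust to yours):
Code:
    # /etc/network/interfaces
    auto bond0
    iface bond0 inet static
        address 169.254.1.11/24
        bond-slaves ens1f0 ens1f1
        bond-mode active-backup
        bond-miimon 100
    #Ceph + Corosync, link-local, never routed

    # /etc/pve/datacenter.cfg
    migration: type=insecure,network=169.254.1.0/24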
 
Ceph is resource intensive, as mentioned. You could also look at other virtualization solutions like XCP-ng with XOSTOR, or Nutanix, but in my opinion they are not as easy to implement as Proxmox.