Best storage solution for Proxmox cluster?

Gabin

New Member
May 17, 2024
France
I’m currently working for a company that runs a vCenter cluster using 1 Gb/s iSCSI storage on Synology SANs. Everything works perfectly: snapshots, thin provisioning, multipath, etc. We’re now looking to migrate to Proxmox, and the big question is: how can we achieve the same storage capabilities under Proxmox?

Current setup
5-node Proxmox cluster
Around 650 GB RAM total
24 cores per node
1 Gb/s network (yes, I know…)
Mechanical SAS/SATA drives (7.2K / 15K RPM)
No SSDs available

Based on the Proxmox storage documentation, our requirements are:
- Shared storage
- Native snapshot support
- Thin provisioning

Only two storage options seem to meet all three:
- Ceph
- ZFS over iSCSI

Ceph:

I tried setting up a small Ceph cluster with 12 OSDs (2 TB SATA 7200 RPM each). The result: very poor performance, especially on writes and IOPS. The 1 Gb/s network wasn’t even the main bottleneck; it’s clearly the mechanical disks. While Ceph does technically meet our requirements (shared, snapshots, thin provisioning), it’s simply not viable without SSDs. It feels like overkill for our hardware.
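For anyone who wants to reproduce a similar measurement, something along these lines gives a rough picture of raw write IOPS on an RBD pool (pool name, runtime, block size and file path are just examples, not my exact test):

Code:
    # Raw 4K write benchmark against an RBD pool (pool name is a placeholder)
    rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup
    rados -p testpool cleanup

    # Random-write test with fio from inside a VM or on a mapped RBD image
    # (adjust the filename to a path on the storage under test)
    fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
        --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based \
        --size=1G --filename=/root/fio-testfile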

ZFS over iSCSI:
I also tested ZFS over iSCSI, using a Debian server with targetcli (LIO) as the iSCSI provider. The performance is much better than Ceph, and close to what we had on vCenter. However, there are too many persistent issues:
- Cloning VMs fails with errors like GET_LBA_STATUS
- Some operations are very slow
These issues are well known and have remained unresolved for years:
https://bugzilla.proxmox.com/show_bug.cgi?id=4046
https://forum.proxmox.com/threads/e...est-5-ascq-invalid_field_in_cdb-0x2400.95416/
https://forum.proxmox.com/threads/lsi-sas2308-scsi-controller-unsupported-sa-0x12.78785/

It seems the root cause is with LIO, which doesn't properly handle certain SCSI commands expected by Proxmox.
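For reference, the kind of ZFS over iSCSI definition I mean in /etc/pve/storage.cfg looks roughly like this (portal address, IQN and pool name are placeholders, not my real values):

Code:
    zfs: zfs-over-iscsi
        portal 192.0.2.10
        target iqn.2003-01.org.linux-iscsi.storage.x8664:sn.example
        pool tank
        iscsiprovider LIO
        lio_tpg tpg1
        sparse 1
        content images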

My question:
I’m trying to build a shared, snapshot-compatible, thin-provisioned storage solution that works with limited hardware (mechanical drives, no SSDs, 1 Gb network).

Has anyone managed to achieve this kind of setup reliably?
Are there any viable alternatives to Ceph or ZFS over iSCSI?
Is there any way to properly reuse my Synology SANs with Proxmox? (NFS is too slow for our usage.)


I’d really appreciate any feedback!

Thanks in advance!
 
Hi @Gabin, welcome to the forum.

Yes, you can build what you are looking for and stick with a 1 Gbit network and less performant storage technologies.

Try one of the many OCFS2-related tutorials, here on the forum and elsewhere.
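For orientation, the rough shape of such a setup looks like this (device name and storage ID are placeholders; the tutorials cover the /etc/ocfs2/cluster.conf details):

Code:
    # On every node: install the tools and enable the O2CB cluster stack
    apt install ocfs2-tools
    dpkg-reconfigure ocfs2-tools        # or edit /etc/default/o2cb

    # Once, from one node: format the shared LUN with enough node slots
    mkfs.ocfs2 -L pve-shared -N 5 /dev/sdX

    # On every node: mount it (e.g. via /etc/fstab with _netdev)
    mkdir -p /mnt/pve-ocfs2
    mount /dev/sdX /mnt/pve-ocfs2

    # Once: add the mountpoint as a shared 'Directory' storage;
    # qcow2 images on it give snapshots and thin provisioning
    pvesm add dir ocfs2-shared --path /mnt/pve-ocfs2 --shared 1 --content images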

However, perhaps you should have a conversation with the management about accumulating technological debt.


Blockbridge: Ultra low latency all-NVMe shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
At work, I've been migrating VMware clusters over to Proxmox Ceph clusters.

These clusters use 10K SAS drives. One of the issues was that IOPS was really bad.

It turned out the SAS drives were attached to a BBU hardware RAID controller with the drives' own write cache disabled, because the RAID controller already has a cache of its own.

Since Ceph doesn't work well with RAID controllers, I swapped the RAID controller out for an HBA and also enabled the write cache on the SAS drives. Now I'm not lacking for IOPS on SAS drives.

At minimum, you'll want Ceph to use 10GbE networking. Higher is better, obviously.

I use the following optimizations learned through trial-and-error. YMMV.

Code:
    Set SAS HDD Write Cache Enable (WCE) (sdparm -s WCE=1 -S /dev/sd[x])
    Set VM Disk Cache to None if clustered, Writeback if standalone
    Set VM Disk controller to VirtIO-Single SCSI controller and enable IO Thread & Discard option
    Set VM CPU Type to 'Host'
    Set VM CPU NUMA on servers with 2 or more physical CPU sockets
    Set VM Networking VirtIO Multiqueue to 1
    Set VM Qemu-Guest-Agent software installed and VirtIO drivers on Windows
    Set VM IO Scheduler to none/noop on Linux
    Set Ceph RBD pool to use 'krbd' option
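For reference, most of these map to one-liners; a rough sketch for a hypothetical VM 100 and an RBD storage named 'ceph-rbd' (adjust IDs and names to your environment):

Code:
    # Hypothetical VM 100 and storage 'ceph-rbd' -- adjust to your setup
    qm set 100 --cpu host --numa 1
    qm set 100 --scsihw virtio-scsi-single
    qm set 100 --scsi0 ceph-rbd:vm-100-disk-0,iothread=1,discard=on,cache=none
    qm set 100 --agent enabled=1
    # Use the kernel RBD client for the pool
    pvesm set ceph-rbd --krbd 1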
 
Although those changes may increase performance, they have drawbacks:

Set SAS HDD Write Cache Enable (WCE) (sdparm -s WCE=1 -S /dev/sd[x])
Bad in case of a power loss, if data has only reached the cache and not yet the disk. I don't know whether spinning rust has PLP (I would assume not). Buy only SSDs with PLP.

Set VM IO Scheduler to none/noop on Linux
Fine in combination with the disk cache enabled, but generally bad on spinning disks if you use the more consistent setting of disabling the disk cache; optimal for SSDs with PLP.
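For reference, inside a Linux guest the scheduler can be checked and set roughly like this (the device name is an example; a udev rule makes it persistent across reboots):

Code:
    # Check the active scheduler for a disk
    cat /sys/block/sda/queue/scheduler
    # Switch to 'none' for the running system
    echo none > /sys/block/sda/queue/scheduler
    # Persist it, e.g. in /etc/udev/rules.d/60-ioscheduler.rules:
    # ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="none"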

Set VM CPU Type to 'Host'
Good performance, but it may cause problems with live migration between nodes with different CPU models.

Set VM CPU NUMA on servers with 2 or more physical CPU sockets
This also applies to single multi-chiplet CPUs, e.g. all Ryzen and the newest many-core Intel parts. It also needs more fine tuning.
 