Been migrating Dell VMware clusters over to Proxmox Ceph clusters.
As with VMware, Ceph really, really wants homogeneous hardware, i.e., same CPU, memory, storage, storage controller, firmware, networking.
While it's true that 3 nodes is the minimum for quorum, you really want 5 nodes, so you can lose 2 nodes and still have quorum.
As with ZFS, Ceph does NOT work with RAID controllers, so you need an IT-mode/HBA-mode storage controller. For me, it's a Dell HBA330.
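The quorum arithmetic behind that recommendation can be sketched as follows. This is a minimal illustration that assumes one vote per node and a strict-majority rule; the function names are made up for the example.

```python
def quorum_size(nodes: int) -> int:
    """Smallest strict majority of `nodes` voters (assumes one vote per node)."""
    return nodes // 2 + 1


def tolerable_failures(nodes: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return nodes - quorum_size(nodes)


for n in (3, 5, 7):
    print(f"{n} nodes: quorum={quorum_size(n)}, can lose {tolerable_failures(n)}")
```

Under those assumptions, 3 nodes tolerate 1 failure and 5 nodes tolerate 2.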
Minimum networking is 10GbE but you really want higher bandwidth for replication/balancing/cluster network traffic.
Ceph is a scale-out solution, so more nodes/OSDs = more IOPS.
I do see a big difference in IOPS between 3-nodes vs 7-nodes.
By default, Ceph uses 3x replication, so only one-third of the raw storage is usable. Ceph does support erasure coding (EC). With Ceph Tentacle, I can use EC for VMs now.
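A rough sketch of the capacity trade-off between replication and EC. The k=4/m=2 profile and the 100 TB raw figure are hypothetical values picked for illustration, and the math ignores BlueStore overhead and near-full ratios.

```python
def usable_fraction_replicated(size: int = 3) -> float:
    """Fraction of raw capacity usable when each object is stored `size` times."""
    return 1 / size


def usable_fraction_ec(k: int, m: int) -> float:
    """Fraction of raw capacity usable with k data chunks and m coding chunks."""
    return k / (k + m)


raw_tb = 100.0  # hypothetical raw capacity in TB

print(f"3x replication: {raw_tb * usable_fraction_replicated(3):.1f} TB usable")
print(f"EC k=4, m=2:    {raw_tb * usable_fraction_ec(4, 2):.1f} TB usable")
```

With these example numbers, replication leaves about a third of raw capacity and the assumed EC profile about two-thirds, while both tolerate losing two chunks/copies.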
I use the following optimizations learned through trial-and-error. YMMV.
Code:
Set SAS HDD Write Cache Enable (WCE) (sdparm -s WCE=1 -S /dev/sd[x])
Set VM Disk Cache to None if clustered, Writeback if standalone
Set VM Disk controller to 'VirtIO SCSI single' and enable the IO Thread & Discard options
Set VM CPU Type for Linux to 'Host'
Set VM CPU Type for Windows to 'x86-64-v2-AES' on older CPUs/'x86-64-v3' on newer CPUs/'nested-virt' on Proxmox 9.1
Set VM CPU NUMA to enabled
Set VM Networking VirtIO Multiqueue to 1
Install Qemu-Guest-Agent software in the VM, plus VirtIO drivers on Windows
Set VM IO Scheduler to none/noop on Linux
Set Ceph RBD pools to use 'krbd' option
Set Ceph 'bluestore_prefer_deferred_size_hdd = 0' in osd stanza in /etc/pve/ceph.conf for SAS HDD
Set Ceph 'bluestore_min_alloc_size_hdd = 65536' in osd stanza in /etc/pve/ceph.conf for SAS HDD
Set Ceph Erasure Coding profiles to 'plugin=isa' & 'technique=reed_sol_van'
Set Ceph Erasure Coding profiles to 'stripe_unit=65536' for SAS HDD
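
For reference, the two BlueStore settings in the list would sit in the osd stanza of /etc/pve/ceph.conf roughly like this. It's only a sketch of the layout using the values given; as far as I know, bluestore_min_alloc_size_hdd is applied when an OSD is created, so existing OSDs would likely need to be recreated to pick it up.

```ini
[osd]
# Intended for SAS HDD OSDs only; values taken from the list above
bluestore_prefer_deferred_size_hdd = 0
bluestore_min_alloc_size_hdd = 65536
```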