I have been fighting I/O performance issues on our Ceph cluster for some time. Sometimes a VM's I/O performance gets so bad that I have to move its image to a local drive to get performance back, so I'm now exploring other shared-storage options. We're running Proxmox 7.1-11. With the Ceph cluster as VM storage the average load sits around 6, which drops to around 0.5 when switched to local storage.
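For anyone wanting to reproduce the disparity, a quick fio run inside a guest against the Ceph-backed disk and then against a local disk shows it clearly. This is just a sketch; the file path, size, and job counts are placeholders to adjust for your setup:

```
# Hypothetical 4K random-write test to compare Ceph-backed vs. local storage.
# --direct=1 bypasses the guest page cache; libaio with iodepth 32 roughly
# approximates a busy database workload. Point /mnt/test/fio.bin at a path
# on the disk under test.
fio --name=randwrite --filename=/mnt/test/fio.bin \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=4 --size=2G \
    --runtime=60 --time_based --group_reporting
```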
Hardware
3 Ceph Nodes: 24 x Intel(R) Xeon(R) Gold 6128, 10 x Toshiba MG04SCA20EE 4TB HDD (Ceph Data), 2 x Samsung MZ7KM1T9HMJP-00005 (Ceph DB).
5 Compute Nodes: 80 x Intel(R) Xeon(R) Gold 6230, single system SSD, and 7 other empty drive bays per node.
Networking is all 10Gb between nodes. The cluster runs about 80 assorted VMs, most of them doing I/O-intensive work such as database operations.
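Before committing to new hardware, it may be worth confirming that the bottleneck really is the HDD OSDs and not the network. A cluster-side sketch using standard Ceph tooling (the pool name is a placeholder):

```
# Per-OSD commit/apply latency; consistently high numbers on the HDD OSDs
# point at the spinning disks rather than the network.
ceph osd perf

# Synthetic write benchmark against a test pool (replace "testpool"):
# 60 seconds of 4 MiB writes with 16 concurrent operations.
rados -p testpool bench 60 write -b 4M -t 16
```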
Our first plan was to replace the 30 Toshiba HDDs with SSDs, but the quote from our vendor was more than we could fit into the budget, so now I'm looking for another shared-storage solution that uses the compute nodes' empty drive bays.
Any suggestions on what solutions I should look into? I'm thinking of GlusterFS, but want some thoughts before going down that path.
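If GlusterFS turns out to be the direction, here is a rough sketch of what a replicated volume on the compute nodes' empty bays might look like. The hostnames (node1..node3), brick path, and volume name are all made up, and this assumes SSDs have been added to three of the five compute nodes:

```
# Placeholder hostnames; assumes an SSD mounted at /bricks/ssd0 on each node.
gluster peer probe node2
gluster peer probe node3

# Replica-3 volume so a VM image survives the loss of any one node.
gluster volume create vmstore replica 3 \
    node1:/bricks/ssd0/vmstore \
    node2:/bricks/ssd0/vmstore \
    node3:/bricks/ssd0/vmstore
gluster volume start vmstore

# Attach to Proxmox as shared storage (run on a PVE node).
pvesm add glusterfs gluster-vm --server node1 --server2 node2 \
    --volume vmstore --content images
```

One thing to weigh: replica 3 sends every write to all three bricks over the same 10Gb links, so it trades HDD latency for network write amplification, which matters with database-heavy VMs.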