Hi all,
For the past couple of days I have been searching and experimenting to figure out why I am not getting the expected performance out of the SSDs in our Proxmox machines.
I run Ceph across 3 nodes with 5 OSDs each, based on Samsung PM863a 1.92 TB SSDs.
The nodes are 2x Dell R630 and 1x R640. All have a decent amount of RAM and run the HBA330 controller. CPUs are E5-2680 v4 and Gold 6142.
According to their spec sheet, the SSDs should reach sequential read/write speeds of around 500 MB/s. When testing one on a Windows PC with CrystalDiskMark I easily get the expected speeds.
However, when testing in Proxmox using fio I am getting read/write speeds of around 100 MB/s and 28k IOPS with a single job. Pushing fio to 4 or more jobs gets read speeds of 270 MB/s and roughly 68k IOPS.
These tests are all just on the local SSDs.
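To give an idea of the kind of local test I mean, a representative 4k random-read fio run would look like the sketch below; it is not my exact invocation (I varied block size, queue depth and job count between runs), and /dev/sdX is a placeholder for the SSD under test:

fio --name=randread-test --filename=/dev/sdX --direct=1 --ioengine=libaio \
    --rw=randread --bs=4k --iodepth=32 --numjobs=1 --runtime=60 --time_based \
    --group_reporting

Bumping --numjobs to 4 is what produced the ~270 MB/s / ~68k IOPS figures above.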
At first I suspected the HBA330, so I tested on another R630 with an H730P. Performance was exactly the same.
When benchmarking Ceph itself, I am getting average read/write speeds of 300-400 MB/s. That's less than the rated speed of a single SSD.
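For the Ceph-side numbers, a rados bench run of the sort I mean looks roughly like this (pool name and duration are placeholders, not necessarily what I used):

rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 seq
rados -p testpool cleanup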
The nodes have 2x 10 Gbit NICs dedicated to Ceph networking, connected to a pair of switches via LACP/MLAG with a layer3+4 transmit hash policy.
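In case it matters, this is how I double-check that the bond is actually hashing on layer3+4 (bond0 is a placeholder for the Ceph bond interface):

grep -i "hash policy" /proc/net/bonding/bond0

which should report the transmit hash policy as layer3+4.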
I know 10 Gbit can easily be a bottleneck, but I'm not even reaching speeds where that would be the issue, and local direct-to-disk testing doesn't yield the expected performance either.
I have tested many of the ideas I found while searching this and other forums and reading whitepapers, guides, etc.
I tried playing with the performance mode/plan on the R630 but that made no difference.
Both the R640 (which has slightly newer CPUs) and the R630 give the same benchmark result.
In multiple places I read these PM863a SSDs should be reasonably fast. Are my expectations too high? Something does not feel right, but I am close to giving up on this search.