Benchmark: 3 nodes, AMD EPYC 7742 64-Core, 512 GB RAM, 3x3 6.4 TB Micron 9300 MAX NVMe

So I believe there is nothing left to change in the configuration that would further improve performance.
One thing you could attempt would be to use relaxed ordering. That needs to be set in the BIOS and on the Mellanox cards. On our system that didn't yield any benefit, though. But I assume that is because, for one, the CPU doesn't have enough cores per complex, and also because our 100 GbE cards are only ConnectX-4.
https://hpcadvisorycouncil.atlassia...ing+Guide+for+InfiniBand+HPC#Relaxed-Ordering
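For reference, a minimal sketch of the NIC-side change, assuming the Mellanox firmware tools (mlxconfig from the mft package) are installed and the card's firmware exposes the PCI_WR_ORDERING parameter (ConnectX-5 or newer); the PCI address below is just a placeholder:
Code:
# show the current ordering setting on the card
mlxconfig -d 01:00.0 query | grep -i ordering
# 1 = force relaxed ordering (parameter name/values may differ per firmware)
mlxconfig -d 01:00.0 set PCI_WR_ORDERING=1
# a cold reboot (or mlxfwreset) is needed before the new value takes effect,
# and the matching option must also be enabled in the server BIOS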
 
Benchmark script drop for future reference:
Resides in /etc/pve and is started on all nodes using
bash /etc/pve/radosbench.sh
Code:
#!/bin/bash
# rados bench wrapper: for each block size, run a 10 minute write test
# followed by a sequential read test; log stdout and stderr to /root.
LOGDIR=/root
STAMP=$(date +%F-%H_%M)   # take the timestamp once so .log and .err share the same name
exec >$LOGDIR/$(basename $0 .sh)-$STAMP.log
exec 2>$LOGDIR/$(basename $0 .sh)-$STAMP.err
BLOCKSIZES="4M 64K 8K 4K"
for BS in $BLOCKSIZES; do
    # write test; --no-cleanup keeps the objects so the read test below has data
    TEST="rados bench 600 --pool ceph-proxmox-VMs write --run-name $(hostname) -t 16 --no-cleanup -b $BS"
    echo ${TEST}
    eval ${TEST}
    sleep 120
    # sequential read of the objects written above (same run name)
    TEST="rados bench 600 --pool ceph-proxmox-VMs seq --run-name $(hostname) -t 16"
    echo ${TEST}
    eval ${TEST}
    sleep 120
done
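Two notes for future runs: the script has to be started on all nodes at roughly the same time, and because of --no-cleanup the benchmark objects remain in the pool afterwards. A sketch of both, assuming placeholder node names pve1-pve3 (matching the hostnames used as run names) and a rados version whose cleanup subcommand accepts --run-name:
Code:
# kick off the benchmark on all nodes at roughly the same time
for NODE in pve1 pve2 pve3; do
    ssh root@$NODE "nohup bash /etc/pve/radosbench.sh >/dev/null 2>&1 &"
done

# once all runs are finished, remove the objects left behind by --no-cleanup
for NODE in pve1 pve2 pve3; do
    rados --pool ceph-proxmox-VMs cleanup --run-name $NODE
done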
 
@Alwin : I am rebuilding the three nodes again and again using Ansible. On each new deploy I reissue the license because I want to use the Enterprise Repository. After the reissue it takes some time until the license can be activated on the systems again, and it also takes some time until the Enterprise Repository allows a login again.
What are safe times to wait here?

Yesterday the reissue took only a few seconds, but Enterprise Repository access took about 5 minutes. Currently I have already been waiting for over 10 minutes for the reissue...
 
@Alwin : I am rebuilding the three nodes again and again using Ansible. On each new deploy I reissue the license because I want to use the Enterprise Repository. After the reissue it takes some time until the license can be activated on the systems again, and it also takes some time until the Enterprise Repository allows a login again.
What are safe times to wait here?
Well, just don't. :) Packages from the pve-no-subscription repository will mostly land in pve-enterprise eventually. pve-no-subscription is the most widely used repository, and if no issues arise there, a package gets pushed on to pve-enterprise.

Yesterday the reissue took only a few seconds, but Enterprise Repository access took about 5 minutes. Currently I have already been waiting for over 10 minutes for the reissue...
At some point a reissue will not be possible anymore and the key has to be unlocked manually.
 
So I updated the Zabbix templates used for the Proxmox nodes and switched to Grafana to render additional graphs. We now have per-CPU-thread usage and NVMe utilization percentages for all three nodes combined in single graphs.

This is a benchmark run with 4 OSDs per NVMe.

Order is
  1. 4M blocksize write (10min)
  2. 4M blocksize read
  3. 64K blocksize write (10min)
  4. 64K blocksize read
  5. 8K blocksize write (10min)
  6. 8K blocksize read
  7. 4K blocksize write (10min)
  8. 4K blocksize read

[Grafana screenshot: benchmark run with 4 OSDs per NVMe]

All eight tests are bound by the maximum performance of the NVMes (almost always 100% utilization). The "CPU usage per CPU thread" graph shows spikes of up to 80% during the 4M blocksize reads.
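For anyone without these graphs: the utilization figures can also be spot-checked directly on a node with iostat from the sysstat package (device names below are examples):
Code:
# %util close to 100% corresponds to the "NVMe utilization" graphs above;
# r/s, w/s and the MB/s columns give per-device IOPS and throughput
iostat -xm 5 nvme0n1 nvme1n1 nvme2n1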

Here is a benchmark run with 2 OSDs per NVMe:

[Grafana screenshot: benchmark run with 2 OSDs per NVMe]

Again the NVMe utilization is at 100%. Here the 4M read causes CPU spikes of up to 100%. But throughput and IOPS are almost as good as in the 4-OSDs-per-NVMe result.

Clearly the NVMes are the limiting factor in our environment. We still have 7 slots available - if we add more NVMes in the future while running 4 OSDs per NVMe, the CPU might become the limiting factor. Therefore we decided to limit CPU usage by running 2 OSDs per NVMe.
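For anyone reproducing this, one common way to put several OSDs on a single NVMe is ceph-volume's batch mode; this is only a sketch, not necessarily how our OSDs were created, and /dev/nvme0n1 is a placeholder:
Code:
# dry run first: show what ceph-volume would create on the device
ceph-volume lvm batch --report --osds-per-device 2 /dev/nvme0n1
# then actually create the two OSDs
ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1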
 
Did you use the AMD tuning guide that's referenced in the Proxmox forum post? Can you share the concrete settings and details you changed in your system (BIOS settings, OS settings, etc.)? Thanks for your reply.
 
The ThomasKrenn RA1112 1U pizza box uses an Asus KRPA-U16 motherboard, which runs an AMI BIOS.
The only settings I changed are:
- Pressed F5 for Optimized Defaults
- Disabled CSM support (we only use UEFI)

We benchmarked to compare results and to identify problems in the setup. We did not tune for maximum performance at the risk of decreased stability or increased power usage, so no overclocking and no fixed memory or CPU frequencies.

We do use cpupower to set the CPU governor to performance on the OS, though.
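A minimal sketch of that governor setting, assuming a stock Debian/PVE node (the linux-cpupower package provides the tool; note the setting does not persist across reboots on its own):
Code:
apt install linux-cpupower                 # provides the cpupower tool on Debian/PVE
cpupower frequency-set -g performance      # set the performance governor on all cores
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c   # verify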
 
It looks like the number of OSDs per NVMe does not influence the results too much, then?
I'm looking to run similar drives at 1 OSD per NVMe to save CPU power (64C/128T for 20-24 drives).
 
@Rainerle this looks like Grafana dashboards for Proxmox Ceph HCI nodes. Is there any possibility that you could share the dashboards? Is all this data pushed via the metric-server integration in PVE? Thanks
 
@Rainerle this looks like Grafana dashboards for Proxmox Ceph HCI nodes. Is there any possibility that you could share the dashboards? Is all this data pushed via the metric-server integration in PVE? Thanks
The data for these graphs is collected by Zabbix agents into a Zabbix DB. From there I use the Zabbix plugin in Grafana. Our decision to use Zabbix was made 10 years ago, when we moved away from Nagios. As long as we are still able to monitor everything (really everything!) in Zabbix, we don't even look at other solutions.
 