Proxmox VE Ceph Benchmark 2020/09 - hyper-converged with NVMe

As Ceph uses a 4M "block size" I would rather test around changing the NVMe's blocksize from 512K to 4M.
On my Intel SSDPE2KX080T8 NVMe disks I see two LBA formats:
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good
LBA Format 1 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
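
For reference, a minimal sketch of how these formats can be listed and switched with nvme-cli; the namespace path is an example, and reformatting erases all data on the namespace:

# List the LBA formats supported by the namespace (example path)
nvme id-ns /dev/nvme0n1 --human-readable | grep "LBA Format"

# Switch to the 4096-byte LBA format (index 1 above).
# WARNING: this destroys all data on the namespace.
nvme format /dev/nvme0n1 --lbaf=1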

You wrote 512K to 4M; is that correct, or did you mean 512B to 4KB? I can't find anything about a 4M LBA format.

Thanks again!
 
I only found out about msecli from this ZFS benchmark thread and had not considered it for my benchmarks back then.
So yes, I was wrong - it should be 4KB NVMe block size.
And the default Ceph block size is 4MB - no idea if Proxmox makes any changes to the RBDs here.
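
For what it's worth, the object size of an existing RBD image can be checked directly; a sketch, with the pool and image names as examples only:

# Show the image's object size; the default order 22 corresponds to 4 MiB objects
rbd info vm_nvme/vm-100-disk-0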
 
So yes, I was wrong - it should be 4KB NVMe block size.
Some NVMe drives allow setting a 4 MB allocation size, though not the Micron 9300. And it doesn't make any difference whether it is 4 KB or 512 B, not with Ceph anyway.

And the default Ceph block size is 4MB - no idea if Proxmox makes any changes to the RBDs here.
It's as vanilla as it gets. You can try changing the stripe size to get better write performance at the cost of some read performance though.
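
As an illustration of what changing the striping looks like, here is a sketch using rbd's striping options; the pool/image names and the values are examples, not a recommendation:

# Create an image with 4 MiB objects, striped in 64 KiB units across 8 objects
# (defaults are stripe-unit = object size and stripe-count = 1)
rbd create vm_nvme/test-stripe --size 100G --object-size 4M --stripe-unit 64K --stripe-count 8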
 
How did you benchmark the rados bench reads? Did you create writes on each host with its own run-name and then run rados bench read with that run-name specified? Does each client (read) have its own run-name (writes) to read from?

Does it make a difference if two rados clients (read) use the same run-name (-> the data read is the same for both clients)?
 
IIRC, first the write benchmarks with the --no-cleanup option, then the read benchmarks. Both times with a unique name per node via --run-name.

Not sure if there is much of a difference, but I assume it could have a bit of an effect if two clients use the same data. Since we had benchmarked writes before, we wanted to keep them separate anyway.
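
A minimal sketch of that workflow per node; the pool name and run-name are examples:

# Write phase: keep the benchmark objects around for the read phase
rados bench -p vm_nvme 60 write -b 4M -t 16 --no-cleanup --run-name $(hostname)

# Read phase: sequential reads of the objects written under this node's run-name
rados bench -p vm_nvme 60 seq -t 16 --run-name $(hostname)

# Remove the benchmark objects afterwards
rados -p vm_nvme cleanup --run-name $(hostname)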
 

Seems like it does not really make a difference (we tested it). We did 4x 25 Gbit meshed round robin with 3 nodes and got these values:

6x (2 per node) rados bench 60 seq -t 16 -p vm_nvme --run-name UNIQUE-NAME
SUM: 13.8 GB/s

6x (2 per node) rados bench 60 write -b 4M -t 16 --no-cleanup -p vm_nvme --run-name UNIQUE-NAME
SUM: 4.8 GB/s
 