Bottleneck at 200 MiB/s?! 7 Node Ceph NVME & SSD Cluster

relink

Hey friends,

I'm observing a strange performance limitation.

EDIT: All SSDs and NVMe drives are enterprise-grade! No consumer hardware is installed.

Backups with both PBS and Veeam top out at around 200 MiB/s.

Veeam in particular reports that the bottleneck is the "source", which is the Ceph cluster.

The backup vault is capable of much more than this (ZFS RAID-Z2 volume with 24 x 14 TB disks).

Each node uses MLAG with 2x 25 Gbit/s.
The VMs use 10 Gbit/s virtio NICs.
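
For scale, the numbers above can be put into the same unit. This is only a rough sketch, assuming 1 MiB = 2^20 bytes and 1 Gbit = 10^9 bits, and ignoring protocol overhead; the function name is made up for illustration:

```python
MIB = 2**20  # bytes per MiB


def mib_per_s_to_gbit_per_s(mib_per_s: float) -> float:
    """Convert a throughput in MiB/s to Gbit/s (decimal gigabits)."""
    return mib_per_s * MIB * 8 / 1e9


observed = mib_per_s_to_gbit_per_s(200)  # backup throughput from the post
vm_nic_gbit = 10                         # virtio NIC inside the VMs
node_uplink_gbit = 2 * 25                # nominal MLAG aggregate per node

print(f"observed: {observed:.2f} Gbit/s")
print(f"share of VM NIC: {observed / vm_nic_gbit:.0%}")
print(f"share of node uplink: {observed / node_uplink_gbit:.1%}")
```

200 MiB/s works out to roughly 1.68 Gbit/s, about 17% of the 10 Gbit/s virtio NIC, so on paper neither the VM NIC nor the node uplinks should be saturated at this rate.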

None of the OSDs shows high latency or commit/apply issues while reading from Ceph.

I took some screenshots, but I have no idea what is limiting my transfer speeds...

Does anyone have an idea? :)
 

Attachments

  • osd.png
  • pbs.png
  • Screenshot 2025-02-03 125334.png
  • Screenshot 2025-02-03 133825.png
  • Screenshot 2025-02-03 142303.png
Hi :)


no, you're definitely not alone.
We have a similar setup (6-node cluster + Ceph cluster with NVMe drives) and are also getting around 120-280 MB/s.

Nodes: 100G
VMs: 10G virtio


There is already an open case with Veeam on this topic, but their analysis of our logs so far was:


"Backup fails due to a bottleneck. Please check your network and storage."

...:rolleyes:

I'm still waiting for further feedback, but if there's any progress or a solution, I'll share an update… unless someone else finds a solution faster.
 
Sorry, but I'm glad not to be alone :D

Hmm, then I'd guess it must be something with Ceph. It's not related to the backup software, since PBS shows the same performance :/
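
One way to check that guess with neither backup tool in the loop would be to time a plain sequential read from inside a VM whose disk lives on the Ceph pool. The following is only a minimal sketch, not a proper benchmark: the function name, block size, and read limit are arbitrary choices for illustration, and the page cache can inflate the result unless the file is larger than RAM or has not been read recently.

```python
import sys
import time


def measure_read_mib_per_s(path: str,
                           block_size: int = 4 * 2**20,
                           max_bytes: int = 2**30) -> float:
    """Read up to max_bytes sequentially from path and return MiB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            chunk = f.read(min(block_size, max_bytes - total))
            if not chunk:  # end of file reached
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / 2**20) / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # Usage: python read_test.py /path/to/large/file
    if len(sys.argv) > 1:
        print(f"{measure_read_mib_per_s(sys.argv[1]):.1f} MiB/s")
```

If a single-threaded read like this also lands near 200 MiB/s, that would point at the read path from Ceph rather than at PBS or Veeam; if it is much faster, the backup data path deserves a closer look.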