Turning KRBD on made my VM fly like a rocket... why?

satiel

Hi all,
I decided to give Proxmox 4 with KVM a try.
To test performance against my existing Ceph cluster (a cluster built outside Proxmox), I created a 2012R2 VM with KVM and one VIRTIO disk pointing to my Ceph pool, all default values.
At first the problem was that I couldn't reach more than 600 MB/s read and 230 MB/s write, all sequential.
But I knew my cluster could do more than that, and after further investigation I found the KRBD option when mounting a Ceph pool.
I ran the test again with the same settings, and now I can reach 3450 MB/s read and 560 MB/s write (sequential).
Great!!
The question is... what is that option? Why does it influence performance so much? Is it safe? Are there any implications of using it that I haven't seen?

Thank you very much.
 
I think the kernel driver is multi-threaded, whereas the KVM driver is single-threaded by default. But I would ask the Ceph developers for details on that.
 
Hi dietmar, thanks for the clarification. What I don't understand is that on a plain Debian install, when I connect to my Ceph pool with the client, I don't see any KRBD option. I don't understand what the Proxmox KRBD option is actually doing. What configuration does it change on the OS?
 
Thanks, reading the articles helped me understand a little more, but they don't specify which setting instructs qemu-kvm to use krbd instead of librbd.
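(Side note for anyone finding this later: as far as I can tell, the KRBD checkbox in the GUI simply sets the krbd flag on the RBD storage definition in /etc/pve/storage.cfg, roughly like this; the storage ID, pool and monitor addresses below are just placeholders taken from this thread.)

Code:
rbd: cephssd
        monhost 172.16.10.21 172.16.10.22 172.16.10.23
        pool rbd-ssd
        username admin
        content images
        krbd 1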
 
From my benchmarks, the big difference between krbd and librbd is CPU usage (around twice as much with librbd).
qemu uses only one thread per disk, so you can be CPU limited (I don't know what kind of processor you have).

But I'm able to reach 4 GB/s with both, and around 70000 IOPS with 4k blocks, with either krbd or librbd on a single qemu disk.

Also, since your test is sequential, it's quite possible that readahead works better with krbd.


krbd is a kernel driver that exposes a /dev/rbdX device on the host; librbd is a userspace library, and with it qemu talks directly to the Ceph cluster.
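Roughly, the difference looks like this (just a sketch; the pool and image names are the ones that appear later in this thread):

Code:
# krbd: the host kernel maps the image and exposes a block device
rbd map rbd-ssd/vm-800-disk-1
ls -l /dev/rbd/rbd-ssd/vm-800-disk-1   # udev symlink to the /dev/rbdX device
# librbd: nothing to map, qemu opens the image directly through the library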


You can reduce CPU usage and improve speed with this config in ceph.conf:

Code:
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_journaler = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0

The first three lines disable cephx authentication (you need to restart all Ceph nodes and all VMs).

The other lines turn off debug counters, which also use a lot of CPU.
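As a side note (not something tested in this thread, just how Ceph normally behaves): the debug levels can also be lowered on a running cluster without a restart via injectargs; the cephx change still needs the restart mentioned above.

Code:
# example: lower some debug levels at runtime on all OSDs (adjust the list as needed)
ceph tell osd.* injectargs '--debug_ms 0/0 --debug_osd 0/0 --debug_filestore 0/0'
# monitors can be done one by one, e.g. for a monitor with id "a":
ceph tell mon.a injectargs '--debug_ms 0/0'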
 
Thanks for the explanation! I see a lot of people disabling the auth and debug features to improve performance; I always wonder whether this is safe to do in a large production environment.
I'm using a Xeon E5520; I know it's not a brand new CPU, but with nmon I haven't seen any core on the Proxmox host reaching 100%.
So help me understand: when I use krbd, is it like when I map the rbd devices /dev/rbd0, /dev/rbd1, etc.?
But how do you explain the performance gap without modifying anything on the qemu side?
I mean, if I want to use /dev/rbd0, usually we format it somehow, for example with mkfs.xfs, and then put a "local" virtual disk on top of it...

But I haven't touched any qemu settings regarding the vdisk; I even went in over SSH and tried to go through all the configuration files looking for something different... I mean, even with the krbd option active, qemu should be using librbd in any case. There's a piece of the puzzle missing here :)
 
Thanks for the explanation! I see a lot of people disabling the auth and debug features to improve performance; I always wonder whether this is safe to do in a large production environment.
Yes, it's safe.

I'm using a Xeon E5520; I know it's not a brand new CPU, but with nmon I haven't seen any core on the Proxmox host reaching 100%.
So help me understand: when I use krbd, is it like when I map the rbd devices /dev/rbd0, /dev/rbd1, etc.?
But how do you explain the performance gap without modifying anything on the qemu side?
I mean, if I want to use /dev/rbd0, usually we format it somehow, for example with mkfs.xfs, and then put a "local" virtual disk on top of it...
The /dev/rbdX device is mapped on the host and attached to the VM; you then do the mkfs.xfs inside your VM.
As for the performance gap: krbd is kernel code, librbd is userland. librbd gets new Ceph features sooner than krbd.
They are completely different code bases and implementations.
Currently librbd is slower because of memory allocations (we have already improved that a lot in Proxmox, with jemalloc).
But there is still room for improvement.

But I haven't touched any qemu settings regarding the vdisk; I even went in over SSH and tried to go through all the configuration files looking for something different... I mean, even with the krbd option active, qemu should be using librbd in any case. There's a piece of the puzzle missing here :)

krbd and librbd are two different implementations of the RBD protocol (no code sharing).
 
Hi spirit, thanks for your patience. Now I get what the main difference between krbd and librbd is, and why there's a substantial performance gap.

What I still don't understand is how you "tell" qemu to use krbd instead of librbd :)

The /dev/rbdX device is mapped on the host and attached to the VM; you then do the mkfs.xfs inside your VM.

When I did the test I just powered on the same Windows machine, which has only one drive, C:\ to be specific.

At the guest level nothing changed and the machine booted correctly, and in the vdisk qemu configuration nothing seems to have changed, not a flag, not an argument. Maybe I'm stubborn, but I really don't understand :)
 
But I haven't touched any qemu settings regarding the vdisk; I even went in over SSH and tried to go through all the configuration files looking for something different... I mean, even with the krbd option active, qemu should be using librbd in any case. There's a piece of the puzzle missing here :)
Have you tried to compare the
Code:
qm showcmd vmid
output in both situations? Maybe something is "injected" on the fly into the runtime config of qemu.
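For example (just a sketch, using the vmid placeholder from above), you can pull out only the drive source and compare:

Code:
# show only the drive source part of the generated kvm command line
qm showcmd <vmid> | grep -o 'file=[^,]*'
# run it once with KRBD enabled and once disabled, then compare the two outputs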
 
Got it!
the difference:

With KRBD:
-drive file=/dev/rbd/rbd-ssd/vm-800-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect

With Librbd:
-drive file=rbd:rbd-ssd/vm-800-disk-1:mon_host=172.16.10.21;172.16.10.22;172.16.10.23:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/cephssd.keyring,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect


What derailed me completely is that when I browse /dev/ I can't find any rbd device, and I didn't know how Proxmox was using it; that's why the obvious seemed less obvious :)

Thank you very much! You all saved my day!

***EDIT***
/dev/rbd is only mapped when the VM is powered on; that's why I couldn't see it ;)
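A quick way to see this is to list what the kernel currently has mapped on the host (rbd showmapped is a standard rbd subcommand; the image name below is just the one from this thread):

Code:
# list rbd images currently mapped by the host kernel
rbd showmapped
# the vm-800-disk-1 entry only shows up while the VM is running with KRBD enabled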


Still, I have no idea why there is such a huge performance gap between the two. It seems that I'm the only one seeing it; maybe I will try the setup with a different CPU...
 
I always wonder whether this is safe to do in a large production environment.
Yes, it's safe.

http://docs.ceph.com/docs/master/rados/configuration/auth-config-ref/

has all you need on this.

Basically it is safe to do unless one of the following applies:
  • there is an attacker inside your Ceph network
  • a second Ceph cluster shares the same network (you should not do this, EVER)
  • your Ceph cluster does "Ceph" over the internet (then your problems are bigger)
  • the data must not leak if someone "takes the ceph node(s) home with them" (then your problems are probably bigger too)
 
The first three lines disable cephx authentication
Hi,
I have a question about this: should Proxmox backups work with this?
I was never able to get it working...
My VMs start, but at backup time I get "ERROR: Backup of VM104 failed - no such volume 'rbd-images:vm-104-disk3'"
Is something else needed to get this working?
Thanks!
Markus
 
Hi,
I have a question about this: should Proxmox backups work with this?
I was never able to get it working...
My VMs start, but at backup time I get "ERROR: Backup of VM104 failed - no such volume 'rbd-images:vm-104-disk3'"
Is something else needed to get this working?
Thanks!
Markus

You also need to remove the client key in /etc/pve/priv/ceph/
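For example (the keyring name here is just the one visible in the qemu command line earlier in this thread; yours will be named after your storage ID):

Code:
# after disabling cephx, move the now-unneeded keyring out of the way instead of deleting it outright
mv /etc/pve/priv/ceph/cephssd.keyring /root/cephssd.keyring.bak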
 
