[SOLVED] probably - ceph performance graph weirdness

elimus_

Member
Have any other Ceph users noticed weirdness with the performance graph, where the read or write values
don't seem to reflect the real situation? Mine currently shows this and I think it's a bit off...

Specifically looking at Reads... for roughly 50 VMs this is weird.
[Attachment: ceph_weird.png]


One thing to note: this started after a VM disk clone added a bit too much load, but it corrected itself once the cloning stopped (I will research this later on):
Code:
2019-09-11 14:44:25.523765 osd.57 osd.57 172.16.50.139:6800/4603 1883 : cluster [WRN] 23 slow requests, 5 included below; oldest blocked for > 30.008303 secs
2019-09-11 14:44:25.523775 osd.57 osd.57 172.16.50.139:6800/4603 1884 : cluster [WRN] slow request 30.007747 seconds old, received at 2019-09-11 14:43:55.515871: osd_op(client.25082543.0:2315 2.82b9d13 2:c8b9d410:::rbd_object_map.7f2be46b8b4567:head [call lock.assert_locked,call rbd.object_map_update] snapc 0=[] ack+ondisk+write+known_if_redirected e1139) currently waiting for rw locks
2019-09-11 14:44:25.523779 osd.57 osd.57 172.16.50.139:6800/4603 1885 : cluster [WRN] slow request 30.007696 seconds old, received at 2019-09-11 14:43:55.515923: osd_op(client.25082543.0:2316 2.82b9d13 2:c8b9d410:::rbd_object_map.7f2be46b8b4567:head [call lock.assert_locked,call rbd.object_map_update] snapc 0=[] ack+ondisk+write+known_if_redirected e1139) currently waiting for rw locks


Interestingly, I don't see any issues with reads or writes on the VMs, or any other signs of storage trouble. The network monitor also shows traffic flowing both ways, roughly at the levels I expect between the Ceph and hypervisor nodes.
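To double-check whether the graph or the VMs are "right", one can also look at the OSD side directly. A rough sketch with standard tools (nothing Proxmox-specific; /dev/sdb is just a placeholder for one of your OSD data disks):

Code:
# per-OSD commit/apply latency as seen by the cluster itself
ceph osd perf

# raw device utilization on an OSD node; compare r/s and w/s against the graph
iostat -x 5 /dev/sdb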

At the moment I'm thinking about how to proceed with this. Since I need to run updates on both the Ceph and hypervisor nodes anyway, the reboots after that will possibly fix it...

Any comments?


----------------------------------------------------------------------------

Update: well, everything seems to be working. Maybe reads really are that periodic and low, or those stats are just off...

Also, the one VM that was collecting and analyzing logs, which generated the majority of the reads, has pretty much been moved to legacy status in our systems. Possibly I had a minor heart attack over nothing...

Either way: solved, probably, as everything seems to work fine.
 
Update: well, everything seems to be working. Maybe reads really are that periodic and low, or those stats are just off...
The stats are cluster wide (see client IO on 'ceph -s' output).
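For anyone comparing these numbers later: the value the graph should roughly track is the client line in the io section near the bottom of the status output. Something like this (layout from memory on a recent release, values redacted):

Code:
ceph -s
#   ...
#   io:
#     client:   XX MiB/s rd, XX MiB/s wr, XXX op/s rd, XXX op/s wr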
 
