Proxmox Ceph cluster - high system CPU usage

martinb

New Member
Apr 28, 2014
Hello,

We recently built a 4-node Proxmox/Ceph cluster and are having performance issues with it. Roughly every 5 seconds the system CPU usage spikes, and the processes involved appear to be the migration/x kernel threads. We tried increasing kernel.sched_migration_cost from 500000 to 5000000, but it made little difference. Any ideas what could be causing this?

Cluster info (4 servers):

Proxmox 3.4
Kernel 2.6.32-39-pve (2.6.32-156)
Dual E5-2640 v3 (8 physical cores each CPU + HT)
40 Gb/s InfiniBand
128 GB RAM
PERC H730/830 disk controllers
SAS 4TB 7.2k drives
~15 OSDs per server (total 65)
In addition to the Ceph OSDs and monitor, each server runs one VM (4 vCPUs, 8 GB RAM) acting as a file server.

Thanks,
Martin
 


Update: we found that the high system CPU usage and load were caused by the Ceph journal sync. We changed 'filestore max sync interval' from 5s to 30s and it improved things a lot: load average dropped from 10-12 to 2-4. The 5s default also matches the ~5s spike interval we were seeing. There may still be some other problem that makes the journal sync inefficient.
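
For anyone wanting to try the same change, here is a minimal sketch of how the setting could look in ceph.conf. Placing it in the [osd] section is an assumption about the layout; only the option name and the two values come from the post above.

```ini
[osd]
# Default is 5 (seconds). 30 is the value that reduced the load in this cluster;
# treat it as a starting point, not a recommendation for every setup.
filestore max sync interval = 30
```

OSDs read ceph.conf at startup, so a change made this way should only take effect after the OSDs are restarted. A longer sync interval also means more data sits in the journal between syncs, so it is worth checking that the journals are large enough. The Ceph documentation suggests sizing 'osd journal size' at about 2 x expected throughput x filestore max sync interval.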