[SOLVED] OSD high CPU usage

Jul 18, 2016
France
gplexpert.com
Hello,

We run Ceph Hammer on Proxmox 4.4 and are preparing to upgrade to the latest version.
The cluster has 5 nodes with 4 OSDs (4 TB SATA disks).

Yesterday, on 2 nodes, we replaced the NVMe journal disk with a higher-performance Intel SSD.

The second node, without any VMs, shows an average CPU usage of 25-30% and a delay of 4.
If I move a VM to this node, the VM's CPU usage becomes 30% (on the other node it was 6%).

I don't understand why.

Do you have any idea that could help me, please?

Nico
 

Alwin

Proxmox Staff Member
Staff member
Aug 1, 2017
Please elaborate on this, as I do not quite understand what happened on which node, or what your general system looks like.
 

spirit

Famous Member
Apr 2, 2010
www.odiso.com
Is it the same CPU model on both nodes?

If yes, are you sure that the CPU is running at its maximum frequency? (Check the performance/powersave mode in your BIOS.)
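To compare the nodes from the running system rather than from the BIOS, something like the following could be used. It is only a sketch: `show_cpufreq` is a made-up helper name, and it assumes the kernel exposes the usual cpufreq files under /sys/devices/system/cpu (not every driver or VM does).

```shell
#!/bin/sh
# Print the governor and the current/maximum frequency of every CPU
# that exposes a cpufreq directory.
# $1 is the sysfs CPU directory; it defaults to the standard Linux path.
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -r "$dir/scaling_governor" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        gov=$(cat "$dir/scaling_governor")
        cur=$(cat "$dir/scaling_cur_freq")   # value is in kHz
        max=$(cat "$dir/cpuinfo_max_freq")   # value is in kHz
        echo "$cpu governor=$gov cur=$((cur / 1000))MHz max=$((max / 1000))MHz"
    done
}
```

Running `show_cpufreq` on a slow node and on a working node should make a stuck low frequency or a different governor visible right away.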
 
Jul 18, 2016
France
gplexpert.com
Thanks for the help.

@spirit: yes, all nodes are the same:

Node1: OK => 50 VMs
Node2: 30% average CPU usage => only 2 VMs
Node3: OK => 20 VMs
Node4: OK => 10 VMs
Node5: OK => 10 VMs

If I move a VM from another node to node 2, the VM's CPU usage becomes 30%.

We have a 10G storage network with bonding (2 network cards per node).

Nico
 
Jul 18, 2016
France
gplexpert.com
I tried deleting the OSDs and recreating them on node 2.

No improvement.

iostat shows no difference compared with a working node.

On node 2, with hdparm =>
root@GPL-HV3302:/var/log/ceph# hdparm -t -T /dev/nvme0n1

/dev/nvme0n1:
Timing cached reads: 2472 MB in 1.99 seconds = 1240.35 MB/sec
Timing buffered disk reads: 1412 MB in 3.00 seconds = 470.04 MB/sec

On a working node =>
root@GPL-HV3305:~# hdparm -t -T /dev/nvme0n1

/dev/nvme0n1:
Timing cached reads: 17236 MB in 1.99 seconds = 8643.32 MB/sec
Timing buffered disk reads: 5452 MB in 3.00 seconds = 1816.41 MB/sec
 

spirit

Famous Member
Apr 2, 2010
www.odiso.com
Yes

I uploaded 2 files.

I can see that the CPU usage of a VM is 2% in htop
and 30% in the Proxmox GUI...

Very strange

node2: cpu MHz : 415.781
working node: cpu MHz : 2493.906
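The "cpu MHz" values above are the kind reported in /proc/cpuinfo. As a rough way to get one number per node for comparison, the per-core values can be averaged. This is just a sketch: `avg_cpu_mhz` is a hypothetical helper name, and the "cpu MHz" field is assumed to be present (it is on x86, but may be missing on other architectures).

```shell
#!/bin/sh
# Average all "cpu MHz" lines of a cpuinfo-style file.
# $1 is the file to read; it defaults to /proc/cpuinfo.
avg_cpu_mhz() {
    awk -F: '/^cpu MHz/ { sum += $2; n++ }
             END { if (n) printf "%.0f\n", sum / n; else print "n/a" }' \
        "${1:-/proc/cpuinfo}"
}
```

Calling `avg_cpu_mhz` on each node gives a single figure to compare. A node that stays far below its nominal clock while under load is a candidate for a power-management problem.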

Check your CPU governor configuration in the BIOS,
or try adding

GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=1"

to /etc/default/grub,

then run update-grub

and reboot.
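After the reboot, it may be worth confirming that the parameters really reached the kernel by looking at /proc/cmdline. A minimal sketch, assuming a POSIX shell; `check_cmdline` is a hypothetical helper, not a Proxmox or GRUB tool:

```shell
#!/bin/sh
# Report which of the given kernel parameters appear in a cmdline file.
# $1 is the file to inspect (normally /proc/cmdline);
# the remaining arguments are the parameters to look for.
check_cmdline() {
    file="$1"; shift
    for param in "$@"; do
        # Put one parameter per line, then require an exact whole-line match.
        if tr ' ' '\n' < "$file" | grep -qxF -- "$param"; then
            echo "present: $param"
        else
            echo "missing: $param"
        fi
    done
}
```

For example: `check_cmdline /proc/cmdline intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=1`. Any "missing" line suggests that update-grub was not run or that a different boot entry was used.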
 
