[SOLVED] OSD high cpu usage

GPLExpert · Jun 20, 2018

Hello,

We run Ceph Hammer with Proxmox VE 4.4 and are preparing to upgrade to the latest version.
The cluster has 5 nodes with 4 OSDs each (4 TB SATA disks).

Yesterday, on 2 nodes, we replaced the NVMe journal disk with a faster Intel SSD.

The second node, which runs no VMs, shows an average CPU usage of 25-30% and an IO delay of 4.
If I move a VM to this node, the VM's CPU usage rises to 30% (versus 6% on the other node).

I don't understand why.

Do you have any idea what could cause this?

Nico

Alwin · Jun 20, 2018

Please elaborate, as I don't quite understand what happened on which node, or what your overall setup looks like.

spirit · Jun 20, 2018

Same CPU model on both nodes?

If so, are you sure the CPU is running at its maximum frequency? (Check the performance/powersave mode in your BIOS.)
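The frequency can also be checked from the OS side, without a trip into the BIOS. A minimal Python sketch, assuming a kernel with cpufreq enabled and the usual /sys/devices/system/cpu/cpuN/cpufreq/ layout; read_governors is a hypothetical helper, not part of any Proxmox tool:

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_khz)} from the cpufreq sysfs files.

    Assumes the usual layout cpuN/cpufreq/{scaling_governor,scaling_cur_freq};
    CPUs whose files are missing or unreadable are skipped.
    """
    result = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq")
    for cpufreq_dir in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(cpufreq_dir))
        try:
            with open(os.path.join(cpufreq_dir, "scaling_governor")) as f:
                governor = f.read().strip()
            with open(os.path.join(cpufreq_dir, "scaling_cur_freq")) as f:
                cur_khz = int(f.read().strip())  # sysfs reports kHz
        except (OSError, ValueError):
            continue
        result[cpu] = (governor, cur_khz)
    return result


if __name__ == "__main__":
    for cpu, (governor, khz) in read_governors().items():
        # A core sitting far below its rated clock stands out in this listing
        print(f"{cpu}: governor={governor}, current={khz / 1000:.0f} MHz")
```

Running it on a suspect node and on a healthy node side by side makes a governor or clock difference obvious.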

GPLExpert · Jun 20, 2018

Thanks for the help.

@spirit: yes, all nodes are the same:

Node1: OK => 50 VMs
Node2: 30% average CPU usage => only 2 VMs
Node3: OK => 20 VMs
Node4: OK => 10 VMs
Node5: OK => 10 VMs

If I move a VM from another node to node 2, the VM's CPU usage rises to 30%.

We have a 10G storage network with bonding (2 network cards per node).

Nico

aderumier · Jun 20, 2018

@GPLExpert

Can you give me the output of "cat /proc/cpuinfo" from a working node and from node2?
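The comparison asked for here can be scripted. This sketch pulls every "cpu MHz" line out of /proc/cpuinfo text and compares the mean clock of two nodes; it assumes the x86 cpuinfo format, and cpu_mhz_values / compare_nodes are hypothetical helper names:

```python
import statistics


def cpu_mhz_values(cpuinfo_text):
    """Return every 'cpu MHz' value found in /proc/cpuinfo text (x86 layout)."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))  # float() tolerates surrounding whitespace
    return values


def compare_nodes(suspect_text, reference_text):
    """Return (mean MHz of suspect node, mean MHz of reference node, ratio)."""
    suspect = statistics.mean(cpu_mhz_values(suspect_text))
    reference = statistics.mean(cpu_mhz_values(reference_text))
    return suspect, reference, suspect / reference


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            mhz = cpu_mhz_values(f.read())
    except OSError:
        mhz = []
    if mhz:
        print(f"{len(mhz)} CPUs, min {min(mhz):.0f} MHz, mean {statistics.mean(mhz):.0f} MHz")
    else:
        print("no 'cpu MHz' lines found (non-x86 layout?)")
```

Fed the two values quoted later in this thread (415.781 MHz on node2, 2493.906 MHz on a working node), compare_nodes gives a ratio of roughly one sixth. A single snapshot can also catch a core that is merely idle, so sampling a few times is safer.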

GPLExpert · Jun 20, 2018

Yes, I uploaded 2 files.

htop shows a VM's CPU usage at 2%,
but the Proxmox GUI shows 30%...

Very strange.

GPLExpert · Jun 21, 2018

I tried deleting the OSDs and recreating them on node 2.

No improvement.

iostat shows no difference compared to a working node.

On node 2, with hdparm:
root@GPL-HV3302:/var/log/ceph# hdparm -t -T /dev/nvme0n1

/dev/nvme0n1:
Timing cached reads: 2472 MB in 1.99 seconds = 1240.35 MB/sec
Timing buffered disk reads: 1412 MB in 3.00 seconds = 470.04 MB/sec

On a working node:
root@GPL-HV3305:~# hdparm -t -T /dev/nvme0n1

/dev/nvme0n1:
Timing cached reads: 17236 MB in 1.99 seconds = 8643.32 MB/sec
Timing buffered disk reads: 5452 MB in 3.00 seconds = 1816.41 MB/sec
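According to the hdparm manual, -T times reads from the Linux buffer cache without disk access (essentially a measure of processor, cache and memory throughput), while -t times reads from the device itself. The sketch below parses both figures and computes node/reference ratios; the regular expression assumes the "... = N MB/sec" line format shown above, and the function names are hypothetical:

```python
import re

# Matches hdparm lines such as:
#  Timing buffered disk reads: 1412 MB in  3.00 seconds = 470.04 MB/sec
HDPARM_LINE = re.compile(
    r"Timing (cached reads|buffered disk reads):.*=\s*([\d.]+)\s*MB/sec"
)


def parse_hdparm(output):
    """Map 'cached reads' / 'buffered disk reads' to MB/s from `hdparm -t -T` output."""
    return {m.group(1): float(m.group(2)) for m in HDPARM_LINE.finditer(output)}


def throughput_ratios(node, reference):
    """Ratio node/reference for each metric present in both parsed results."""
    return {k: node[k] / reference[k] for k in node.keys() & reference.keys()}


node2 = parse_hdparm("""/dev/nvme0n1:
 Timing cached reads: 2472 MB in 1.99 seconds = 1240.35 MB/sec
 Timing buffered disk reads: 1412 MB in 3.00 seconds = 470.04 MB/sec""")
working = parse_hdparm("""/dev/nvme0n1:
 Timing cached reads: 17236 MB in 1.99 seconds = 8643.32 MB/sec
 Timing buffered disk reads: 5452 MB in 3.00 seconds = 1816.41 MB/sec""")
for metric, ratio in sorted(throughput_ratios(node2, working).items()):
    print(f"{metric}: node 2 at {ratio:.0%} of the working node")
```

With the figures above, node 2 reaches roughly a quarter of the working node's buffered disk reads and roughly a seventh of its cached reads, so both the device and the CPU/memory side look slow on that node.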

spirit · Jun 22, 2018

node2: cpu MHz : 415.781
working node: cpu MHz : 2493.906

Check the CPU governor/power configuration in your BIOS, or try adding

GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=1"

to /etc/default/grub, then run

# update-grub

and reboot.
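After the reboot, a quick way to confirm the suggested parameters actually reached the kernel command line is to look for them in /proc/cmdline. A minimal sketch; the parameter list simply mirrors the GRUB line above, and missing_params is a hypothetical helper:

```python
# Parameters suggested in the GRUB line above.
REQUIRED = ("intel_pstate=disable", "intel_idle.max_cstate=0", "processor.max_cstate=1")


def missing_params(cmdline, required=REQUIRED):
    """Return the required parameters absent from a kernel command line."""
    tokens = cmdline.split()
    return [p for p in required if p not in tokens]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            missing = missing_params(f.read())
    except OSError:
        missing = list(REQUIRED)  # cannot read the file: treat everything as unconfirmed
    print("all parameters active" if not missing else f"missing: {' '.join(missing)}")
```

This only shows that the parameters were passed at boot; the clock figures from /proc/cpuinfo remain the real check.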

GPLExpert · Jun 22, 2018

Thanks for your help.

In the end, the problem was the SSD. We replaced it and everything is OK now.

We tested it with
# hdparm -t -T /dev/nvme0n1

The result was less than half that of a working node.

Thanks again for your help.
