[SOLVED] OSD high cpu usage

GPLExpert

Hello,

We run Ceph Hammer on Proxmox 4.4 and are preparing to upgrade to the latest version.
The cluster has 5 nodes with 4 OSDs each (4 TB SATA disks).

Yesterday, we replaced the NVMe journal disk with a higher-performance Intel SSD on 2 nodes.

The second node, without any VMs, has an average CPU usage of 25-30% and a delay of 4.
If I move a VM to this node, the VM's CPU usage becomes 30% (on the other node it was 6%).

I don't understand why.

Do you have any idea what could help?

Nico
 
Please elaborate on this, as I do not quite understand what happened on which node, or what your overall setup looks like.
 
Is it the same CPU model on both nodes?

If yes, are you sure the CPU is running at its maximum frequency? (Check the performance/powersave mode in your BIOS.)
 
Thanks for the help.

@spirit: yes, all nodes are the same:

Node1: OK => 50 VMs
Node2: 30% average CPU usage => only 2 VMs
Node3: OK => 20 VMs
Node4: OK => 10 VMs
Node5: OK => 10 VMs

If I move a VM from another node to node 2, the VM's CPU usage becomes 30%.

We have a 10G storage network with bonding (2 network cards per node).

Nico
 
Yes

I uploaded 2 files.

htop shows the VM's CPU usage at 2%,
but the Proxmox GUI shows 30%...

Very strange
 

Attachments

  • node2.txt (19.1 KB)
  • working_node.txt (18.3 KB)

I tried deleting the OSDs on node 2 and recreating them.

No improvement.

iostat shows no difference from a working node.

On node 2 with hdparm =>
root@GPL-HV3302:/var/log/ceph# hdparm -t -T /dev/nvme0n1

/dev/nvme0n1:
Timing cached reads: 2472 MB in 1.99 seconds = 1240.35 MB/sec
Timing buffered disk reads: 1412 MB in 3.00 seconds = 470.04 MB/sec

On a working node =>
root@GPL-HV3305:~# hdparm -t -T /dev/nvme0n1

/dev/nvme0n1:
Timing cached reads: 17236 MB in 1.99 seconds = 8643.32 MB/sec
Timing buffered disk reads: 5452 MB in 3.00 seconds = 1816.41 MB/sec
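
To compare the two nodes without eyeballing the numbers, the buffered-read figure can be pulled out of the hdparm output with a small helper. This is only a sketch: the function name `buffered_mbs` is made up for illustration, and it assumes the output format pasted above.

```shell
#!/bin/sh
# buffered_mbs: hypothetical helper; reads `hdparm -t -T` output on stdin
# and prints the buffered disk read throughput in MB/sec.
buffered_mbs() {
    awk '/Timing buffered disk reads/ { print $(NF-1) }'
}

# Example using the two result lines pasted in this thread:
node2=$(printf '%s\n' ' Timing buffered disk reads: 1412 MB in 3.00 seconds = 470.04 MB/sec' | buffered_mbs)
good=$(printf '%s\n' ' Timing buffered disk reads: 5452 MB in 3.00 seconds = 1816.41 MB/sec' | buffered_mbs)

# Ratio of node 2 to the working node (awk handles the float division).
ratio=$(awk -v a="$node2" -v b="$good" 'BEGIN { printf "%.2f", a / b }')
echo "node2: $node2 MB/sec, working node: $good MB/sec, ratio: $ratio"
```

On a live node the same helper could be fed directly, e.g. `hdparm -t -T /dev/nvme0n1 | buffered_mbs`, and the values from each node compared by hand.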
 
node2: cpu MHz : 415.781
working node: cpu MHz : 2493.906

Check your CPU governor configuration in the BIOS,
or try adding

GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=1"

to /etc/default/grub,

then run update-grub

and reboot.
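
Before changing the kernel command line, it may be worth confirming how far the cores are actually clocked down. A rough sketch, assuming the usual `cpu MHz` lines in /proc/cpuinfo and the standard cpufreq sysfs files; the function name `mhz_range` is only illustrative.

```shell
#!/bin/sh
# mhz_range: hypothetical helper; reads /proc/cpuinfo-style text on stdin
# and prints the lowest and highest "cpu MHz" value across all cores.
mhz_range() {
    awk -F': *' '/^cpu MHz/ {
        v = $2 + 0
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        n++
    }
    END { if (n) printf "min=%.0f max=%.0f\n", min, max }'
}

# On a live node (only if the file is readable):
if [ -r /proc/cpuinfo ]; then
    mhz_range < /proc/cpuinfo
fi

# Current frequency governor per core, where the cpufreq interface exists:
for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    if [ -r "$f" ]; then
        echo "$f: $(cat "$f")"
    fi
done
```

Running this on node 2 and on a working node gives a quick side-by-side of clock speed and governor, which is the same comparison as the two `cpu MHz` values quoted above.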
 
Thanks for your help.

In the end, the problem was the SSD. We replaced it and everything is fine now.

We tested it with
# hdparm -t -T /dev/nvme0n1

The result was half that of a working node.

Thanks for your help.
 
