Proxmox 5.0 more resource hungry than 4.4

michaelvv

Renowned Member
Oct 9, 2008
After "upgraded" from Proxmox 4.4 to 5.0 this morning, I had some issues with streaming DSD over my
network to my very complicated chord 2qute DAC. I have these stops in the music 1-2 secs now and then
and I wrote some software to dig into the problem.

I found out that the (mostly Proxmox) services are eating up quite some CPU, so I took these measurements, and I can see that these services use more resources than all the virtualization running on the box:

99.36 % idle over a 15 min run, no load at all
99.09 % idle over a 15 min run, 1 KVM guest + 4 LXC containers running
98.74 % idle over a 15 min run, 1 KVM guest + 4 LXC containers plus the 10 (mostly Proxmox) services

Running these services took 0.35 % of my CPU, compared to 0.27 % for the virtualization.
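
For anyone wanting to reproduce such an idle measurement without custom software, here is a rough equivalent (assuming the sysstat package is installed; I used my own tooling):

Code:
# Average CPU utilization over one 15-minute (900 s) interval;
# the %idle column corresponds to the figures above.
sar -u 900 1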

That extra load is just enough to make my DAC randomly drop out. My solution was simply to stop
these services, so no fancy Proxmox functionality like I had under 4.4.

Script with the services I stopped:

Code:
#!/bin/bash

service pvestatd stop
service pve-firewall stop
service pvefw-logger stop
service pve-ha-crm stop
service pve-ha-lrm stop
service pveproxy stop
service watchdog-mux stop
service zed stop
service rrdcached stop
service spiceproxy stop

And on to my still unsolved question: which of these are the CPU hogs?

As a programmer, I think some optimization could be in order here.
 
michaelvv said:
99.36 % idle over a 15 min run, no load at all
99.09 % idle over a 15 min run, 1 KVM guest + 4 LXC containers running
98.74 % idle over a 15 min run, 1 KVM guest + 4 LXC containers plus the 10 (mostly Proxmox) services

Running these services took 0.35 % of my CPU, compared to 0.27 % for the virtualization.
That extra load is just enough to make my DAC randomly drop out.

This seems like a scheduling problem, not a load problem, i.e. latency is the issue, not the "high" load of 0.35 % (meaning that over 99 % of the CPU is still available).
Audio profits from deadline-oriented scheduling, i.e. realtime scheduling.

michaelvv said:
My solution was simply to stop these services, so no fancy Proxmox functionality like I had under 4.4.

Stopping them should be needless.
I'd rather suggest adapting the priority and/or scheduling characteristics of the processes responsible for the audio streaming/processing.

I suggest reading:
Code:
man 7 sched
man 1 chrt
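
For example, a minimal sketch of putting an already running player process on a realtime policy (the process name squeezeboxserver is only an assumption here; adjust it to whatever actually does the streaming):

Code:
# Give the streaming process round-robin realtime scheduling at priority 50
# (needs root; assumes a single matching PID):
chrt --rr --pid 50 $(pidof squeezeboxserver)
# Verify the new policy and priority:
chrt --pid $(pidof squeezeboxserver)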

Depending on where the streaming service runs, I'd adapt either the VM and the process inside it, or the CT and its process.
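
If the whole guest rather than a single process should be favored, the CPU weight of a VM or CT can also be raised with the standard PVE tools (the IDs 100 and 101 below are placeholders):

Code:
# Double the CPU scheduling weight of VM 100 (the default cpuunits is 1024):
qm set 100 --cpuunits 2048
# Same for container 101:
pct set 101 --cpuunits 2048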

michaelvv said:
Script with the services I stopped:

Code:
#!/bin/bash

service pvestatd stop
service pve-firewall stop
service pvefw-logger stop
service pve-ha-crm stop
service pve-ha-lrm stop
service pveproxy stop
service watchdog-mux stop
service zed stop
service rrdcached stop
service spiceproxy stop

I suggest:
Code:
# Disable these services so they won't even start on the next boot,
# so no need to call any script on each boot:
systemctl disable pve-ha-crm pve-ha-lrm watchdog-mux spiceproxy pvesr.timer
# And stop them immediately for the current boot session too:
systemctl stop pve-ha-crm pve-ha-lrm watchdog-mux spiceproxy pvesr.timer
I guess you run a single node home lab setup, so not much need for HA and its watchdog.
I added spiceproxy for good measure; it allows access to the PVE host shell and CTs over SPICE too, and VMs are even unaffected by disabling it.
Oh, and pvesr (PVE storage replication) is a service which gets triggered every second, in a not ideal way (tbh), AND it is new in PVE 5, so this could be the change you were searching for.
Its periodic startup could trigger latency spikes...
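
You can check directly with systemd how often the timer actually fires on your box:

Code:
# Show the last and next trigger times of the replication timer:
systemctl list-timers pvesr.timer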

If you do not use the firewall you may want to add "pve-firewall pvefw-logger" to the above list.
I suspect that you still want the WebUI sometimes? So I'd keep pveproxy; it uses AnyEvent and only does work when someone has the web interface open and is logged in.
If you do not care about statistics from the host, guests and storages, and do not need CT scheduling, then pvestatd plus rrdcached are good candidates for disabling too...
If you do not want a WebUI or API at all, then you may just disable and stop pveproxy too.
If you do not use ZFS the zed service can be kicked out too...
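
After disabling, a quick way to see which PVE-related units are still actually running:

Code:
# List all running units whose name starts with "pve":
systemctl list-units --state=running 'pve*'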

michaelvv said:
And on to my still unsolved question: which of these are the CPU hogs?

IMHO, they ain't real CPU hogs; though granted, not all of them are optimized, and they are written in Perl.
But I really guess that you suffer from latency spikes, not missing CPU time, which are two different things.
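
One way to verify the latency-spike theory, assuming the rt-tests package is installed:

Code:
# Measure worst-case wakeup latency for 60 s at realtime priority 80
# (run as root); compare the "Max" column with the PVE services
# running vs. stopped:
cyclictest --mlockall --priority=80 --interval=1000 --duration=60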

michaelvv said:
As a programmer, I think some optimization could be in order here.

As we're open source, you can always not only do this yourself but also upstream it, and thus make the project better for everyone, if you have great ideas here :)

As you got a new kernel (4.10 vs. 4.4) with quite a few changes in it, scheduling could also have changed to the disadvantage of your specific use case.

That all said, where does the streaming process run, in a VM or CT?
 
Hi Thomas.

I moved my streaming software (LogitechMediaServer) off Proxmox to run locally.
I'm only using SMB on Proxmox now.
Kicked out the ZFS service.

This is very acceptable now. Proxmox is a great product; I'm just wondering why the upgrade
caused these radical changes to my setup.

/Thanks Michael.
 
Hi Michael,

michaelvv said:
I moved my streaming software (LogitechMediaServer) off Proxmox to run locally.
I'm only using SMB on Proxmox now.
Kicked out the ZFS service.

OK, the pvesr.timer / service could be disabled too in your setup. It sounds like the most reasonable culprit here: it's new and starts "often", perfect for causing new small latency spikes :)
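
In the same spirit as above (only if you do not use storage replication):

Code:
# Keep the replication timer from starting on future boots and stop it now:
systemctl disable pvesr.timer
systemctl stop pvesr.timer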

michaelvv said:
This is very acceptable now. Proxmox is a great product; I'm just wondering why the upgrade caused these radical changes to my setup.

Glad you could work around it. If there's a reproducible regression we will naturally try to investigate, but on a major upgrade a lot of the underlying software stack changes, so new (problematic) corner cases can be introduced, which are often hard to reproduce or so special that nobody finds them straight away.