preempt-rt or lowlatency kernel

mhe

Can a realtime kernel be installed on the Proxmox VE host, and what would be the right way to go about it?
Unfortunately I cannot find any information on this, but I am hoping that at least an Ubuntu "lowlatency" kernel might work, roughly like this:
sudo apt-get -s install linux-lowlatency

Do the Proxmox management tools (CLI, web interface) also offer parameters for low-latency tuning?
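
For reference, this is roughly how one can check which preemption model the currently running kernel was built with (a minimal sketch, assuming the usual Debian-style /boot/config-* file is present):

$ uname -r
$ grep PREEMPT /boot/config-$(uname -r)
# the stock PVE/Ubuntu config typically shows CONFIG_PREEMPT_VOLUNTARY=y;
# a lowlatency build would show CONFIG_PREEMPT=y, and a PREEMPT_RT build
# something like CONFIG_PREEMPT_RT=y (the exact option depends on the kernel version)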
 
No - and in my opinion it would not make much sense on a hypervisor node anyway. Is there a concrete use case?
 
Hello Fabian,
there are concrete use cases. Performance in the sense of data throughput and raw compute power is increasingly taking a back seat. Users perceive delays very keenly and find them disruptive. And for time-critical algorithms (audio, fax, realtime ...) it is a must anyway.
That is why the time has come for real-time hypervisors and the like (RT-KVM, among others).

Example use cases:
Fax servers T.30/T.38
Media gateways
Firewalls and NFV (every ms of latency/jitter is unwanted)
Soft and hard real-time controllers (PLC, DC ... deterministic cycle times of 100 ms ... 1 ms; see the cyclictest sketch after this list)
Industry 4.0 - convergence of OT and IT ... consolidating soft PLCs onto a KVM host
Game servers
Audio/video processing
Real-time messaging
and the like
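
To put numbers on the deterministic-cycle-time requirement, the usual tool is cyclictest from the rt-tests package; a minimal sketch (priority, interval and loop count are just example values):

$ apt-get install rt-tests
$ cyclictest -m -S -p95 -i 1000 -l 100000
# -m locks memory, -S starts one measurement thread per core,
# -p95 runs them at SCHED_FIFO priority 95, sampling every 1000 us
# for 100000 loops; the "Max" column is the observed worst-case
# wakeup latency in microseconds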

Here are a few details on the developments around OpenStack in the telco, industrial IoT, and NFV space:

https://review.openstack.org/#/c/139688
https://review.openstack.org/cat/139688,11,specs/mitaka/approved/libvirt-real-time.rst^0
===========================
Libvirt real time instances
===========================

https://blueprints.launchpad.net/nova/+spec/libvirt-real-time

The CPU pinning feature added the ability to assign guest virtual CPUs
to dedicated host CPUs, providing guarantees for CPU time and improved worst
case latency for CPU scheduling. The real time feature builds on that work
to provide stronger guarantees for worst case scheduler latency for vCPUs.

Problem description
===================

The CPU pinning feature allowed guest vCPUs to be given dedicated access to
individual host pCPUs. This means virtual instances will no longer suffer
from "steal time" where their vCPU is pre-empted in order to run a vCPU
belonging to another guest. Removing overcommit eliminates the high level
cause of guest vCPU starvation, but guest vCPUs are still susceptible to
latency spikes from various areas in the kernel.

For example, there are various kernel tasks that run on host CPUs, such as
interrupt processing that can preempt guest vCPUs. QEMU itself has a number
of sources of latency, due to its big global mutex. Various device models
have sub-optimal characteristics that will cause latency spikes in QEMU,
as may underlying host hardware. Avoiding these problems requires that the
host kernel and operating system be configured in a particular manner...
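
In practice, "configured in a particular manner" mostly boils down to isolating host cores and pinning the guest's vCPU threads onto them; a rough sketch (the core numbers are made up for illustration):

# in /etc/default/grub: keep cores 2-7 free of normal tasks, timer ticks
# and RCU callbacks
GRUB_CMDLINE_LINUX_DEFAULT="quiet isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7"
$ update-grub && reboot

# afterwards, pin a guest's vCPU threads to the isolated cores and give
# them a real-time scheduling class (the PIDs are placeholders)
$ taskset -cp 2 <vcpu-thread-pid>
$ chrt -f -p 80 <vcpu-thread-pid>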

https://software.intel.com/en-us/articles/nfv-performance-optimization-for-vcpe
The baseline setup uses an Intel® Xeon® processor E5-2680 (code-named Sandy Bridge) and does not include any BIOS optimizations. In contrast, the high-performance setup uses an Intel® Xeon® processor E5-2697 v3 (code-named Haswell) and includes certain BIOS tuning such as "maximize performance versus power" and disablement of C-states and P-states.

The baseline uses the standard kernel data path, whereas the high-performance setup uses the OVS DPDK data path. Although both testbeds use Fedora* 21 as the base OS, the baseline uses a standard non-real-time kernel (3.18), whereas the high-performance setup uses the Linux Real-Time Kernel (3.14) with a tuned configuration (isolation of vSwitch and VM cores from the host OS, disabling Security-Enhanced Linux, using idle polling and also selecting the perfect Time-Stamp Counter clock).

The baseline setup uses "vanilla" OpenStack settings to spin up the VM and assign network resources. In contrast, the high-performance setup is more finely tuned to allow dedicated CPUs to be pinned for the vSwitch and VNFs respectively. The high-performance setup also ensures that the CPUs and memory from the same socket are used for the VNFs, and the specific socket in use is the one that connects directly to the physical NIC interfaces of the server...
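
The clock and C-state side of that tuning can at least be checked on a running host; a small sketch (sysfs paths as found on current Debian kernels, the boot parameters are examples):

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# usually prints "tsc" on modern hardware
$ cat /sys/module/intel_idle/parameters/max_cstate
# limiting C-states at boot would look like:
#   intel_idle.max_cstate=1 processor.max_cstate=1
# and the frequency governor can be forced with, e.g.:
$ cpupower frequency-set -g performance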
 
Almost forgot ... at Red Hat, use cases from the finance industry (high-frequency trading) are probably a driving force. Presumably that is why Red Hat 7 also covers realtime tuning.

Current KVM / OpenStack implementation:

... an excerpt from the current OpenStack admin guide:

http://docs.openstack.org/admin-guide/compute-cpu-topologies.html
Customizing instance CPU pinning policies
Important
The functionality described below is currently only supported by the libvirt/KVM driver.

By default, instance vCPU processes are not assigned to any particular host CPU, instead, they float across host CPUs like any other process. This allows for features like overcommitting of CPUs. In heavily contended systems, this provides optimal system performance at the expense of performance and latency for individual instances.

Some workloads require real-time or near real-time behavior, which is not possible with the latency introduced by the default CPU policy. For such workloads, it is beneficial to control which host CPUs are bound to an instance’s vCPUs. This process is known as pinning. No instance with pinned CPUs can use the CPUs of another pinned instance, thus preventing resource contention between instances. To configure a flavor to use pinned vCPUs, use a dedicated CPU policy. To force this, run:

$ openstack flavor set m1.large --property hw:cpu_policy=dedicated

Caution

Host aggregates should be used to separate pinned instances from unpinned instances as the latter will not respect the resourcing requirements of the former.

When running workloads on SMT hosts, it is important to be aware of the impact that thread siblings can have. Thread siblings share a number of components and contention on these components can impact performance. To configure how to use threads, a CPU thread policy should be specified. For workloads where sharing benefits performance, use thread siblings. To force this, run:

$ openstack flavor set m1.large \
--property hw:cpu_policy=dedicated \
--property hw:cpu_thread_policy=require

For other workloads where performance is impacted by contention for resources, use non-thread siblings or non-SMT hosts. To force this, run:

$ openstack flavor set m1.large \
--property hw:cpu_policy=dedicated \
--property hw:cpu_thread_policy=isolate

Finally, for workloads where performance is minimally impacted, use thread siblings if available. This is the default, but it can be set explicitly:

$ openstack flavor set m1.large \
--property hw:cpu_policy=dedicated \
--property hw:cpu_thread_policy=prefer

For more information about the syntax for hw:cpu_policy and hw:cpu_thread_policy, refer to the Flavors guide.

Applications are frequently packaged as images. For applications that require real-time or near real-time behavior, configure image metadata to ensure created instances are always pinned regardless of flavor. To configure an image to use pinned vCPUs and avoid thread siblings, run:

$ openstack image set [IMAGE_ID] \
--property hw_cpu_policy=dedicated \
--property hw_cpu_thread_policy=isolate

Image metadata takes precedence over flavor extra specs. Thus, configuring competing policies causes an exception. By setting a shared policy through image metadata, administrators can prevent users configuring CPU policies in flavors and impacting resource utilization. To configure this policy, run:...
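
Worth adding: on top of the dedicated policy quoted above, Nova also exposes real-time vCPU scheduling through further flavor extra specs (as far as I know only honoured by the libvirt/KVM driver; the mask below is just an example that leaves vCPU 0 as a non-realtime housekeeping core):

$ openstack flavor set m1.large \
--property hw:cpu_policy=dedicated \
--property hw:cpu_realtime=yes \
--property hw:cpu_realtime_mask=^0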
 
I would like to bump this thread. Even though some users seem to be compiling their own custom Proxmox kernels, I would really like to keep up with the pve-kernel updates, but realtime is a must in my setup: I have a FireWire PCIe card that cannot be passed through to my GPU-passthrough KVM guests (doing that breaks my software RAID!), so my workaround is to use JACK networking (jackd -dfirewire on the host) to get the FireWire sound card into my KVM guest. It works like a charm, but there are very small audible glitches in the sound when using software like Cubase.
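
For reference, a jackd invocation along those lines looks roughly like this (the priority, period size and number of periods are just values that need per-setup tuning; a larger -p reduces glitches at the cost of latency):

$ jackd -R -P70 -dfirewire -r48000 -p256 -n3
# -R requests realtime scheduling and -P70 the RT priority,
# -dfirewire selects the FFADO/FireWire backend,
# -r/-p/-n set sample rate, period size and number of periods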

Even if the kernel is built with realtime support, it takes a bit more configuration to actually "enable" and use it, so I don't think it would be a problem to add realtime support to the Proxmox kernel. Please take into consideration that Proxmox is useful in so many cases that it is only natural that we find the need for more in-kernel support depending on our use cases.
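
The "enable" part mostly means allowing the latency-critical processes to actually obtain a real-time scheduling class; a minimal sketch (the group name and limits are assumptions):

# allow members of the audio group to request RT priority and locked memory
$ echo '@audio - rtprio 95' >> /etc/security/limits.d/audio.conf
$ echo '@audio - memlock unlimited' >> /etc/security/limits.d/audio.conf
# then start the latency-critical workload under SCHED_FIFO
$ chrt -f 80 <some-command>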

Thank you all for the good work. I don't regret using Proxmox at all, and I have it running in many production setups for my clients.

Maybe there could be a pve-kernel-xx-rt deb package, just like in Debian?
 
As far as I can see, it takes two small changes in the Makefile of pve-kernel:
changing the suffix to something like "-lowlatency-pve" and piping in the lowlatency config instead of the generic one.
Additionally, allow the suffix "-lowlatency-pve" in "debian/scripts/find-firmware.pl".
I am trying that on the current 5.0 right now.
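
Roughly the workflow, for anyone who wants to reproduce it (the exact Makefile variables differ between pve-kernel releases, so treat this as a sketch rather than a recipe):

$ git clone git://git.proxmox.com/git/pve-kernel.git
$ cd pve-kernel
# 1) in the Makefile: change the kernel suffix from "-pve" to
#    "-lowlatency-pve" and feed in the Ubuntu lowlatency config
#    instead of the generic one
# 2) in debian/scripts/find-firmware.pl: accept the "-lowlatency-pve" suffix
$ make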

[edit]

working so far

 
Even though it's an older question, can you please describe how you did it? I also plan to do the next kernel update with PREEMPT.
 
What's the benefit of running an RT kernel, and why isn't there an official build for that type of kernel?
It'd be really convenient to test this by installing it via apt.
 
You could say: higher "process quality" at the cost of "process price/performance".

From the users' point of view that could mean, for example:

a "smartphone app" responding at the maximum possible number of users with ...
- standard kernel: 1100 users active ... 95% of requests answered in less than 0.2 s / but 5% take 1 to 2 s
- realtime kernel: 1000 users active ... 95% of requests answered in less than 0.3 s / but 5% take only 0.5 to 0.8 s
 
It interests me more in the context of a high-performance VM. For example, my "daily driver" is a Windows 10 VM with a PCIe GPU passed through and 32 GB of RAM allocated, and the VM disk is on an NVMe drive. Sometimes it feels sluggish, so I wonder if an RT kernel would make any difference (I also plan to try CPU isolation and pinning: https://forum.proxmox.com/threads/cpu-pinning.67805/ ), and 1 GB hugepages as well.

I'm looking for performance as close to bare metal as possible. The other running VMs are for dev/web servers and stuff like that.
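
In case it helps, the hugepages part can be set per VM in Proxmox; a small sketch (the VMID and page counts are examples, and the 1 GiB pages have to be reserved on the kernel command line first):

# in /etc/default/grub: reserve 32 x 1 GiB hugepages at boot
GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=32"
$ update-grub && reboot

# then tell the VM to back its memory with 1024 MB hugepages
$ qm set 100 --hugepages 1024
# CPU isolation and pinning itself is covered in the linked forum thread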
 
