Hi everyone,
Host CPU usage gradually increases over uptime (≈18% → ≈30%), and overall host memory usage increases as well (≈14.5 GB → ≈18 GB), while all VMs and LXCs remain stable.
Only a full host reboot resets the behavior. VM/LXC restarts do not.
I’ve already investigated a number of common causes, so I’ll keep this as factual and concise as possible.
Environment
- Proxmox VE 8.4.19 x86_64
- Kernel: 6.8.12-25-pve
- CPU: Intel i7-3770 (8 threads) @ 3.90 GHz
- RAM: 32 GB
- Storage: ZFS (ARC remains minimal and does not grow with uptime)
- Workload: multiple dedicated game servers (UT2004, UT3, UT4, COD4, BF2142, Xonotic, rFactor, etc.)
- Total VM usage: ~14.5 GB RAM, ~18% CPU
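ZFS ARC memory lives outside the regular page cache, so tools that report used memory can count it as "used". A minimal sketch for logging the ARC size next to the host figures follows. It assumes OpenZFS exposes its statistics at `/proc/spl/kstat/zfs/arcstats` in the usual `name type data` layout. The helper names `parse_arcstats` and `arc_size_bytes` are hypothetical.

```python
"""Sketch: read the current ZFS ARC size and its configured maximum from the kstat file."""


def parse_arcstats(text: str) -> dict[str, int]:
    """Parse the 'name type data' rows of arcstats into {name: value}."""
    stats: dict[str, int] = {}
    # Skip the kstat header line and the 'name type data' column header.
    for line in text.splitlines()[2:]:
        parts = line.split()
        if len(parts) == 3 and parts[2].lstrip("-").isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def arc_size_bytes(path: str = "/proc/spl/kstat/zfs/arcstats") -> tuple[int, int]:
    """Return (current ARC size, ARC maximum) in bytes; assumes the OpenZFS kstat path above."""
    with open(path) as f:
        stats = parse_arcstats(f.read())
    return stats["size"], stats["c_max"]
```

Call `arc_size_bytes()` right after a reboot and again after a day. That gives two numbers to compare against the rise of roughly 3.5 GB in host RAM.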
The actual problem
Immediately after a host reboot:
- Host CPU: ~18%
- Host RAM: ~14.5 GB
After ~24 hours of uptime:
- Host CPU: ~30%
- Host RAM: ~18 GB
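Comparing uptime cycles works best when each cycle is measured the same way. Host CPU utilisation can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The minimal sketch below treats "busy" as everything except `idle` and `iowait`. That assumption may not match exactly how the Proxmox GUI computes its percentage. The function names are hypothetical.

```python
"""Sketch: host CPU busy percentage between two /proc/stat samples."""
import time


def read_cpu_times(stat_text: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Columns: user nice system idle iowait irq softirq steal guest guest_nice.
            # guest/guest_nice are already included in user/nice, so only the first 8 count.
            core = values[:8]
            idle = core[3] + (core[4] if len(core) > 4 else 0)  # idle + iowait
            return idle, sum(core)
    raise ValueError("no aggregate 'cpu' line found")


def busy_percent(before: tuple[int, int], after: tuple[int, int]) -> float:
    """Percentage of non-idle time between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (total_delta - idle_delta) / total_delta


def sample_host_cpu(interval_s: float = 5.0) -> float:
    """Sample /proc/stat twice, interval_s seconds apart, and return the busy percentage."""
    with open("/proc/stat") as f:
        first = read_cpu_times(f.read())
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        second = read_cpu_times(f.read())
    return busy_percent(first, second)
```

`sample_host_cpu(5.0)` returns the busy percentage over a 5-second window. Logging it periodically, for example from cron or a systemd timer, yields one series per uptime cycle.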
Important: the workload inside the VMs does not change. There is no observable increase in:
- guest CPU
- guest RAM
- guest load
- QEMU RSS
- interrupts
- IO
- slab usage
- SUnreclaim
- softirq load
- network load
- disk load
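The ≈3.5 GB increase in host RAM does not appear in QEMU RSS, slab or SUnreclaim. One way to localise it is to diff full `/proc/meminfo` snapshots, one taken right after a reboot and one after ~24 hours, and then rank the fields by growth. Candidates include `AnonPages`, `Shmem`, `PageTables`, `VmallocUsed` and `Percpu`. The sketch below assumes "used" means `MemTotal - MemAvailable`, and its helper names are hypothetical. Several meminfo fields overlap (`SUnreclaim` is part of `Slab`, for example), so the deltas do not add up to one total.

```python
"""Sketch: rank /proc/meminfo fields by growth between two snapshots."""


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field to its number (kB for most fields, plain counts for HugePages_*)."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def used_kib(meminfo: dict[str, int]) -> int:
    """'Used' memory under the assumed definition MemTotal - MemAvailable."""
    return meminfo["MemTotal"] - meminfo["MemAvailable"]


def top_growth(before: dict[str, int], after: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n fields with the largest increase from before to after."""
    deltas = [(name, after[name] - before[name]) for name in after if name in before]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:n]
```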
What has been observed across multiple uptime cycles
Overall host CPU and memory usage rise gradually over time, while guest- and VM-related metrics stay stable:
- guest CPU / RAM / load remain stable
- QEMU process RSS remains stable
- no sustained change in IO or IO wait
- no visible change in interrupt rates
- no slab or SUnreclaim growth trend
- no obvious scheduler / NUMA imbalance changes
- no ballooning or memory pressure signals
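A claim of "no growth trend" is easier to compare across metrics once each metric is reduced to a fitted slope. The figures above work out to roughly +0.5 percentage points of host CPU per hour ((30 - 18) / 24) and about +0.15 GB of host RAM per hour ((18 - 14.5) / 24). Below is a minimal least-squares sketch. It assumes samples are logged as `(uptime_hours, value)` pairs, and `slope_per_hour` is a hypothetical name.

```python
"""Sketch: least-squares slope of a metric sampled over uptime."""
from statistics import fmean


def slope_per_hour(samples: list[tuple[float, float]]) -> float:
    """samples: (uptime_hours, value) pairs. Returns the fitted change in value per hour."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples need distinct timestamps")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Suppose host CPU climbs at about 0.5 points per hour while a metric's slope stays near zero. That metric can then be ruled out more convincingly than by eye.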
What I’m looking for
I’m not looking for generic troubleshooting steps such as:
- “run top/htop again”
- “check interrupts”
- “check IO”
- “maybe it’s a runaway process”
- “maybe it’s a leak”
I’m looking for insights from people who have seen similar host-side CPU and memory drift where:
- VMs and LXCs remain stable
- no metrics visibly escalate
- no runaway process is visible
- only host CPU and memory usage rise over uptime
- only a host reboot resets the behavior
- modern kernel (6.8.x)
- timer/tick-heavy workloads
- long uptime
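A single `top` screen is weak evidence for "no runaway process is visible". A stronger check diffs per-process CPU time between two snapshots of `/proc/<pid>/stat` taken hours apart. If the extra host CPU does not land on any process, it may be time that the kernel accounts outside process `utime`/`stime`, such as irq/softirq time, depending on kernel configuration. The sketch below uses hypothetical helper names. Its values are in clock ticks (`os.sysconf("SC_CLK_TCK")`, usually 100 per second).

```python
"""Sketch: rank processes by CPU time consumed between two /proc/<pid>/stat snapshots."""
import os


def parse_pid_stat(text: str) -> tuple[str, int]:
    """Return (comm, utime + stime in clock ticks) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    comm = text[open_idx + 1:close_idx]
    rest = text[close_idx + 2:].split()
    # rest[0] is field 3 (state), so utime (field 14) is rest[11] and stime (field 15) is rest[12].
    return comm, int(rest[11]) + int(rest[12])


def snapshot(proc_root: str = "/proc") -> dict[int, tuple[str, int]]:
    """Read every numeric /proc entry; processes that vanish mid-scan are skipped."""
    snap: dict[int, tuple[str, int]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                snap[int(entry)] = parse_pid_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited between listdir() and open(), or unexpected format
    return snap


def top_consumers(before: dict[int, tuple[str, int]],
                  after: dict[int, tuple[str, int]],
                  n: int = 10) -> list[tuple[int, str, int]]:
    """Return (pid, comm, ticks consumed) for the n busiest processes present in both snapshots."""
    deltas = [(pid, comm, ticks - before[pid][1])
              for pid, (comm, ticks) in after.items() if pid in before]
    deltas.sort(key=lambda item: item[2], reverse=True)
    return deltas[:n]
```

Processes that start or exit between the two snapshots are not counted. Each per-process figure already covers all of that process's threads.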
Specific questions
- Are there known regressions in 6.8.x related to:
  - scheduler
  - cpuidle / pstate
  - KVM halt polling
  - NO_HZ / tick handling
  - virtio
  - io_uring
- Would it make sense to test:
  - an older PVE kernel branch (6.5 / 6.2)
  - a newer kernel (if available)
  - forcing CPU governor to performance
  - disabling io_uring for VM disks
- Are there known cases where:
  - host CPU and overall memory usage rise over uptime
  - guest-side metrics remain stable
  - only a host reboot resolves the behavior
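Before testing a different kernel, governor or halt-polling setting, it may help to record the current values so each uptime cycle is documented. The sketch below reads a few standard sysfs entries: the cpufreq driver and governor, the `intel_pstate` status, the cpuidle driver, the clocksource and KVM's `halt_poll_ns`. Entries missing on a given kernel or driver come back as `None`. The dictionary keys and helper names are hypothetical.

```python
"""Sketch: snapshot host tunables relevant to the questions above and diff two snapshots."""
from pathlib import Path
from typing import Optional

# Paths are relative to the sysfs root; keys are illustrative labels.
TUNABLES = {
    "cpufreq_driver": "devices/system/cpu/cpu0/cpufreq/scaling_driver",
    "cpufreq_governor": "devices/system/cpu/cpu0/cpufreq/scaling_governor",
    "intel_pstate_status": "devices/system/cpu/intel_pstate/status",
    "cpuidle_driver": "devices/system/cpu/cpuidle/current_driver",
    "clocksource": "devices/system/clocksource/clocksource0/current_clocksource",
    "kvm_halt_poll_ns": "module/kvm/parameters/halt_poll_ns",
}


def read_tunables(sys_root: str = "/sys") -> dict[str, Optional[str]]:
    """Read each tunable; None means the file does not exist on this kernel/driver."""
    result: dict[str, Optional[str]] = {}
    for name, rel_path in TUNABLES.items():
        try:
            result[name] = (Path(sys_root) / rel_path).read_text().strip()
        except OSError:
            result[name] = None
    return result


def changed(before: dict[str, Optional[str]],
            after: dict[str, Optional[str]]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {name: (old, new)} for every tunable whose value differs."""
    return {k: (before.get(k), after.get(k))
            for k in set(before) | set(after)
            if before.get(k) != after.get(k)}
```

Run `read_tunables()` right after a reboot and again after a few days, then pass both results to `changed()`. The output lists only the entries that changed at runtime.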
Thanks in advance.
PS.
The situation a few hours after a reboot of the host:


After a few days:


A reboot after a few days:

