This didn't help.
What I have done:
1. Fixed a line in the code (removed the pvescheduler restart), thanks to the advice from the previous page
2. Fixed the corosync.conf desynchronization (by copying the new version of corosync.conf to the buggy node)
3. Reconfigured with dpkg
4. Restarted daemons or rebooted
Well, I figured it out: somehow corosync.conf on this node became different (the versions differ) compared to the other nodes (I'm 100% sure the cluster was healthy and this node was a member when I started the upgrade). After copying the corosync.conf file from another node and restarting pvedaemon -...
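For anyone in the same situation, a minimal sketch of the resync I did (node name is just an example from my setup; while pmxcfs is down you have to work on the local copy under /etc/corosync):

# compare the config_version on the broken node against a healthy one
grep config_version /etc/corosync/corosync.conf
# pull the current config from a healthy node (pve1 here is a placeholder)
scp root@pve1:/etc/corosync/corosync.conf /etc/corosync/corosync.conf
# restart the stack so corosync and pmxcfs pick it up
systemctl restart corosync pve-cluster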
The upgrade process gets stuck on setting up pve-manager 8.1.3; if I break it (Ctrl+C) I see
root@pve2:~# pvecm nodes
Cannot initialize CMAP service
root@pve2:~# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded...
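In case someone else lands here with "Cannot initialize CMAP service": that error just means pvecm cannot talk to corosync, so a reasonable first check (standard service names, nothing exotic) is:

# check both daemons and look at their recent logs
systemctl status corosync pve-cluster
journalctl -u corosync -u pve-cluster --since '1 hour ago'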
I'm facing the same issue on one of my nodes.
Last login: Wed Dec 6 18:21:35 2023
root@pve2:~# fuser -v /var/run/pvescheduler.pid.lock
Specified filename /var/run/pvescheduler.pid.lock does not exist.
root@pve2:~# /usr/bin/pvescheduler stop
root@pve2:~# fuser -v /var/run/pvescheduler.pid.lock...
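For context, a sketch of the stale-lock check I was doing above; removing the lock by hand is my own workaround and only safe if fuser shows no owner:

# verify nothing still holds the scheduler lock
fuser -v /var/run/pvescheduler.pid.lock
# stop the scheduler, clear a leftover lock, start it again
/usr/bin/pvescheduler stop
rm -f /var/run/pvescheduler.pid.lock
systemctl start pvescheduler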
I can confirm that setting
solves the ICMP echo reply time increases and the RDP freezes almost immediately, even with KSM enabled and the 6.2 kernel (mitigations still off)
root@046-pve-04315:~# uname -a
Linux 046-pve-04315 6.2.16-15-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-15 (2023-09-28T13:53Z) x86_64...
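For anyone who wants to rule KSM in or out quickly, this is how to toggle it from the shell (the sysfs interface from the kernel docs; ksmtuned is the stock service that drives KSM on Proxmox):

cat /sys/kernel/mm/ksm/run          # 0 = off, 1 = running, 2 = stop and unmerge
systemctl disable --now ksmtuned    # stop the tuner so it doesn't switch KSM back on
echo 2 > /sys/kernel/mm/ksm/run     # stop KSM and unmerge already-shared pages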
From my perspective it's much more sensible to test this on the devs' side (if I'm not mistaken, they managed to reproduce the issue) rather than update my clients' servers to a test kernel and get punished for it afterwards.
@fweber any updates?
P.S. Just to clarify: these ICMP ping response times are not just numbers in a shell.
They are RDP session freezes. Yes, very short, but quite annoying: it's enough for the end user's mouse cursor to stutter, so they miss a button (for example) and start complaining when...
@fweber @fiona
To get the data you requested, I had to negotiate with my users to put up with the freezes and degraded performance.
So any feedback would be very much appreciated!
The interesting thing is that the problem occurs more often when more than 24 vCPUs and more than 96 GB of RAM are assigned to the VM.
VMs with a small number of vCPUs and little memory (like 4 vCPUs + 16 GB vRAM) are not affected.
Our largest cluster is built on PVE 7.x with the 5.15 kernel. It works smoothly and well. I just updated one of its nodes to PVE 8 and kernel 6.2 to check - and got the same issue with KSM and CPU spikes (which briefly freeze the VM - I can see it as an ICMP echo reply time increase). All my debug data...
Seen on different clusters built on hardware from different vendors: HP Gen8/9, Dell R730xd, Supermicro.
The only thing they have in common: intel-microcode is installed on every host.
12 hours later
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=672ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124
Reply from 172.16.9.242: bytes=32 time=1ms TTL=124...
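If you want to catch these spikes from a Linux host instead of Windows, a minimal sketch (the target address is just my example from above):

# -D prefixes every reply with a unix timestamp; keep the full log, grep the slow replies later
ping -D -i 0.2 172.16.9.242 | tee ping.log
grep -vE 'time=[0-9]\.[0-9]+ ms' ping.log    # drop everything under ~10 ms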
Take a look at this thread:
https://forum.proxmox.com/threads/proxmox-8-0-kernel-6-2-x-100-cpu-issue-with-windows-server-2019-vms.130727/page-6
There are two posts from the dev team describing what to do to help them investigate the problem.
Here we go
mitigations=off, KSM enabled but not active so far
root@pve-node-03486:~# ./strace.sh 901
strace: Process 5349 attached
strace: Process 5349 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
97.81...
Yes, this is the correct output of the command you provided earlier.
root@pve-node-03486:~# cat ./ctrace.sh
#!/bin/bash
# VMID of the guest to inspect is the only argument
VMID=$1
# resolve the PID of the VM's QEMU process
PID=$(cat /var/run/qemu-server/$VMID.pid)
# attach for 5 seconds and print a per-syscall time summary
timeout 5 strace -c -p $PID
# dump the pressure-stall (PSI) files of the VM's cgroup slice
grep '' /sys/fs/cgroup/qemu.slice/$VMID.scope/*.pressure
for _ in {1..5}; do
grep ''...
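A quick note on what the script does: timeout 5 strace -c -p attaches to the VM's QEMU process for five seconds and prints the syscall summary shown above, while the grep '' lines dump the cgroup pressure-stall (PSI) counters for the VM's slice. It takes the VMID as its only argument, e.g.:

./ctrace.sh 901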