Read this thread carefully.
Hi everyone, I bought the same box as you a few days ago and I have the same freezes. Can you help me? My current kernel is 5.15.74-1-pve.
I have a Topton with a Celeron N5105. I have just one VM with pfSense and it crashed twice in 2 days with a kernel panic message. I just upgraded Proxmox with pve-kernel-6.1; is that right? Do I need to upgrade the BIOS too? To be precise, pfSense crashes and restarts itself while Proxmox seems to have no problems.
Which box do you have and which CPU?
What VMs freeze?
If needed, I can also attach logs.
spin lock 0xffffffff836e0f00 (smp rendezvous) held by 0xfffff80006c77740 (tid 100645) too long
panic: spin lock held too long
cpuid = 1
time = 1673039279
KDB: enter: panic
Linux pxcw 6.1.0-1-pve
root@pxcw:~# cat /proc/cpuinfo | grep micro
microcode : 0x1d
microcode : 0x1d
microcode : 0x1d
microcode : 0x1d
Things seem to be running well since upgrading to the 6.1 kernel.
Have a CWWK N5105 running Proxmox (6.1 kernel) and an OPNsense VM. Currently at over 2 weeks uptime. No microcode loaded and BIOS settings are basically default (didn't change C states or anything like that). OPNsense is using Linux bridges, no passthrough. PowerD disabled in OPNsense.
Out of the 4 NICs, I use them for:
1) WAN - To Cable Modem
2) Proxmox Management
3&4) LAGG to my main switch - LACP Layer 2+3
I'm really just bringing this up because my current setup doesn't have any tweaks or changes other than the 6.1 kernel. I wonder if some of those having major issues should try a more vanilla setup? I guess the hard part here is that the different mini-PCs seem to have different BIOSes, which may cause some of the issues.
EDIT: I should mention that the "downtime" two weeks ago was really just running a few updates and not a crash.
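To compare microcode revisions between boxes, the `/proc/cpuinfo` check shown above can be condensed to one line per distinct revision. This is only a minimal sketch: the function name `microcode_revisions` and its optional file argument are made up for illustration (the argument exists so the function can be pointed at a saved copy of cpuinfo).

```shell
#!/bin/sh
# List the distinct microcode revisions reported by the kernel.
# The optional argument is a hypothetical override (e.g. a saved copy
# of cpuinfo); by default the function reads /proc/cpuinfo.
microcode_revisions() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # The text after the colon is the revision, e.g. "0x1d".
    awk -F': *' '/^microcode/ { print $2 }' "$cpuinfo" | sort -u
}
```

Running `microcode_revisions` on the host prints each revision once, so a single line of output means all cores report the same revision.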
- Are C-States and Enhanced C-States enabled in the BIOS?
- Is ASPM set to Auto for all PCIe ports in the BIOS? Don't know
- What CPU governor are you using in Proxmox?
- What do your thermals look like in Proxmox? Good, see below
Install the thermal sensor package on Proxmox:
apt install lm-sensors
Run it:
watch sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +33.0°C (high = +105.0°C, crit = +105.0°C)
Core 0: +33.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +33.0°C (high = +105.0°C, crit = +105.0°C)
Core 2: +33.0°C (high = +105.0°C, crit = +105.0°C)
Core 3: +33.0°C (high = +105.0°C, crit = +105.0°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +40.0°C (crit = +119.0°C)
nvme-pci-0100
Adapter: PCI adapter
Composite: +32.9°C (low = -0.1°C, high = +69.8°C)
(crit = +84.8°C)
ERROR: Can't get value of subfeature temp2_min: I/O error
ERROR: Can't get value of subfeature temp2_max: I/O error
Sensor 1: +43.9°C (low = +0.0°C, high = +0.0°C)
Check CPU governor:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
Set CPU governor until next reboot to powersave:
echo "powersave" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Set it back to performance:
echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Set it automatically at reboot:
crontab -e
@reboot echo "powersave" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Watch the CPU frequency; it should drop to around 800 MHz in powersave:
watch "lscpu | grep MHz"
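If you switch governors often, the echo/tee commands above can be wrapped in a small function that first checks whether the governor is actually offered by the cpufreq driver. This is just a sketch under assumptions: `set_governor` and its second argument are hypothetical names, and it relies on the standard `scaling_available_governors` file being present next to `scaling_governor`.

```shell
#!/bin/sh
# Set the scaling governor on every CPU, but only where the kernel lists
# it as available. The second argument is a hypothetical override of the
# sysfs root; the default is the path used in the commands above.
set_governor() {
    gov="$1"
    cpu_root="${2:-/sys/devices/system/cpu}"
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        avail="$dir/scaling_available_governors"
        # Skip CPUs whose driver does not offer the requested governor.
        if [ -r "$avail" ] && ! grep -qw "$gov" "$avail"; then
            echo "skip $dir: $gov not available" >&2
            continue
        fi
        echo "$gov" > "$dir/scaling_governor"
    done
}
```

Run as root, `set_governor powersave` should have the same effect as the tee one-liner, and the change is likewise lost at reboot.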
You have two Ubuntu VMs of the same version except one is stable and one is not? Do they by any chance have different power management settings? Are they running with the same virtual CPU and flags? How are they different? Is the one that's stable always under constant load?
I have a suspicion that the VM guests attempt to idle the CPU in a way that it doesn't support when virtualized. The two backtraces from my pfSense both mention idling the CPU.
This would be great news. I have the same processor and so far have been able to mitigate it, but randomly, about every 3 weeks, a VM still freezes.
How do you install kernel 6.1? I was on the edge (fabian) kernel, but 6.1 is not available there yet.
Thanks
Thanks! I totally missed that post.
From https://forum.proxmox.com/threads/opt-in-linux-6-1-kernel-for-proxmox-ve-7-x-available.119483/
How to install:
- apt update
- apt install pve-kernel-6.1
- reboot
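After the reboot it is worth confirming that the box really booted the new kernel and not the old 5.15 one. A minimal sketch, assuming GNU `sort -V` is available (it is on Debian-based systems); the function name `kernel_at_least` is made up for this example.

```shell
#!/bin/sh
# Return success if kernel release $1 is at least version $2.
# Relies on GNU sort's -V (version sort) to order the two strings.
kernel_at_least() {
    running="$1"
    wanted="$2"
    lowest=$(printf '%s\n%s\n' "$running" "$wanted" | sort -V | head -n 1)
    # If the wanted version sorts first (or both are equal), we are new enough.
    [ "$lowest" = "$wanted" ]
}
```

For example, `kernel_at_least "$(uname -r)" 6.1 && echo "6.1 or newer is running"` prints the message only when the running kernel is 6.1 or later.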
Interesting: the VM that is crashing does have a higher load than the one that does not crash, which is less loaded with tasks.
I have the CPU set to "host"; I don't know how to check the flags.
A few answers below. I don't have access to the BIOS at the moment.
Don't know. I don't expect to touch the BIOS anytime soon, but I'll look next time.
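On checking the flags: one possible approach is to save `/proc/cpuinfo` inside each guest (for example `cat /proc/cpuinfo > vm1.cpuinfo`, where the file name is just a placeholder), copy both files to one machine, and compare the `flags` lines. The sketch below does that; `cpu_flags` and `diff_cpu_flags` are hypothetical helper names. On the Proxmox side, `qm config <vmid>` shows the CPU type configured for a VM.

```shell
#!/bin/sh
# Print the CPU flags from a cpuinfo-style file, one per line, sorted.
cpu_flags() {
    grep -m1 '^flags' "$1" | cut -d: -f2 | tr ' ' '\n' | sed '/^$/d' | sort -u
}

# Show flags present in only one of two saved cpuinfo files.
# Each output line is prefixed with the file the flag is unique to.
diff_cpu_flags() {
    a=$(mktemp) && b=$(mktemp) || return 1
    cpu_flags "$1" > "$a"
    cpu_flags "$2" > "$b"
    comm -23 "$a" "$b" | awk -v f="$1" '{ print "only in " f ": " $0 }'
    comm -13 "$a" "$b" | awk -v f="$2" '{ print "only in " f ": " $0 }'
    rm -f "$a" "$b"
}
```

`diff_cpu_flags vm1.cpuinfo vm2.cpuinfo` prints nothing when both guests see the same flags, which would be expected if both use the "host" CPU type on the same machine.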
Did that crash occur only once? Or did your time between failures go back to 8-24 hours after the first crash?
Same issue here: I had VMs crashing almost daily on a fresh Proxmox installation, and they had been running for almost a month after updating to kernel 5.19 and microcode.
This night a VM crashed again (always the same one, Debian 11 with Docker)... I'll be trying the C-State fix.
Intel NUC with N5105 CPU.
It doesn't in my case; I posted my experiences a couple of days ago in this thread. And there are others. It seems to come down to the processor (at least N5105/N6005) in combination with the 5.1x kernel and C-states. I ran into issues running both bare metal (a couple of times a week) and later with the same disk in a VM under Proxmox (daily).
Bare metal seems to work flawlessly, doesn't it?