VM freezes irregularly

interesting. then maybe the 2% im seeing right now as the base floor in the graph are similar. i wonder if there is a bios-setting that can influence this.

so far from reading the forums i was under the impression that io delay is related to activity and individual drive performance, where consumer drives (which i use as well) suck compared to real enterprise drives.
but that is under load.
but this is getting offtopic. we better discuss this in a separate thread :)
 
Hi, just wanted to add my experience.

N5105 based system, OVMF/Home Assistant install crashing every 6-8 hours. SeaBIOS/pfSense install not crashing.

Update to the 6.2 Kernel and Microcode 0x24000023 has me at 3 days so far. No other changes.
 
the 23 microcode will most likely not solve your issues permanently.
at least that seems to be the consensus in this thread.
the 24 microcode on the other hand seems to be the fix we wanted for these cpus. at least some of us are already on several weeks uptime on their vms with it :)
 
Does version 23 of the microcode run stable with CPU governor powersave?
ahh, now i understand.
for me the version 23 microcode didnt help much, no matter if i used powersave or performance.
the longest i could go without any vm dying was about 10 days.
but on average every 3-5 days something died on my n5105 box.
since installing the version 24 microcode i havent had any vms die on me except opnsense, which didnt die, but has some weird issue with eating all resources if i schedule an interface reset in it via cron. unconfiguring the scheduled interface reset has fixed that.
but this is something opnsense specific and has nothing to do with proxmox.
 
It is great to see that "my" thread seemed to have helped some people.

I had 70 days of uptime before a power outage and now I am back to 3-15 days before a reboot of pfSense.

I am currently on
- Proxmox 7.3-6
- Kernel: Linux pve 5.15.85-1-pve #1 SMP PVE 5.15.85-1 (2023-02-01T00:00Z) x86_64 GNU/Linux
- no idea on the microcode

How do I check/update the microcode and update the kernel?
 
updating the kernel depends on where you want to go.
the 5.15 kernel is automatically updated as far as i know.
if you want to use 6.1 or 6.2 kernels you will need to
Code:
apt install pve-kernel-6.1
or
Code:
apt install pve-kernel-6.2

as for checking which microcode you have you can use
Code:
dmesg | grep "microcode updated early to"
if it returns something its the version of the installed microcode.
if it doesnt return anything its most likely that you havent installed the microcode.

in that case you can
Code:
apt install intel-microcode
, which will get you to the version 23 microcode (you may need to enable non-free repositories for that).
after that you will need to follow the instructions earlier in this thread on how to overwrite the 23 microcode with the 24 microcode.

i used the instructions in this post: https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/post-536880
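
if the dmesg line is missing, the revision the kernel currently reports can also be read from /proc/cpuinfo. a minimal sketch, assuming a linux host whose /proc/cpuinfo has a "microcode" field; the function name microcode_revisions and its optional file argument are made up for illustration:

```shell
#!/bin/sh
# Print each distinct microcode revision the kernel reports.
# The optional argument (a cpuinfo-style file) is hypothetical and only
# exists so the function can be tried against a saved copy.
microcode_revisions() {
    awk -F': *' '/^microcode/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}

# Only query the live system if /proc/cpuinfo is actually there.
if [ -r /proc/cpuinfo ]; then
    microcode_revisions
fi
```

on a box where all cores run the same revision this should print a single value such as 0x24000024.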
 

Is there any disadvantage of going to 6.2 kernel vs 6.1?

As for microcode, the command did not return anything:

Code:
root@pve:~# cat /proc/cpuinfo|grep 'microcode\|model name'
model name      : Intel(R) Celeron(R) N5105 @ 2.00GHz
microcode       : 0x1d
model name      : Intel(R) Celeron(R) N5105 @ 2.00GHz
microcode       : 0x1d
model name      : Intel(R) Celeron(R) N5105 @ 2.00GHz
microcode       : 0x1d
model name      : Intel(R) Celeron(R) N5105 @ 2.00GHz
microcode       : 0x1d

So I will need to install intel-microcode and then overwrite it? Can I break anything with it?
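
For reference, the 0x1d above can be compared against the 0x24000024 revision mentioned in this thread with plain shell arithmetic. This is only a sketch: the helper name microcode_at_least is made up, and it assumes that a simple numeric comparison of revisions is meaningful for the same CPU model.

```shell
#!/bin/sh
# Succeeds (exit status 0) if revision $1 is at least revision $2.
# Both arguments are hex strings as shown in /proc/cpuinfo,
# e.g. 0x1d or 0x24000024. Helper name is hypothetical.
microcode_at_least() {
    [ "$(( $1 ))" -ge "$(( $2 ))" ]
}

if microcode_at_least 0x1d 0x24000024; then
    echo "0x1d is new enough"
else
    echo "0x1d is older than 0x24000024"
fi
```

With the values above this takes the "older" branch, which matches the need to install a newer microcode.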
 
i can only speak from my point of view.
i was running 6.1.2 on my n5105 box since it came out and then immediately switched to 6.2.2 (havent rebooted since for even newer kernel) and havent noticed any negative effects, but i am not using features such as gpu passthrough or anything, so others may think differently.
 
Sounds good,
I assume I will need a reboot for the new microcode?
 
Hi there,

Small contribution to this thread.

Running Proxmox on an Odroid H3 (Intel N5105).

Code:
4 x Intel(R) Celeron(R) N5105 @ 2.00GHz (1 Socket)
Linux 6.2.6-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.6-1 (2023-03-14T17:08Z)
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on pcie_aspm.policy=performance split_lock_detect=off"
32GB RAM
NVMe system disk + 2 SATA SSD Crucial M500

I'm running a Windows 10 guest VM that used to reset itself at least twice a day under vCPU pressure. Proxmox itself was stable.
The Odroid H3 came with microcode 23.

Since I upgraded the microcode to 24, no more issues.
This microcode does fix something; perhaps a positive side effect of a security fix that also fixes a race condition...
Code:
model name      : Intel(R) Celeron(R) N5105 @ 2.00GHz
stepping        : 0
microcode       : 0x24000024


I would also like to share something surprising:

On H3 (Intel N5105) :
Code:
root@pve1:~# cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 4
Available idle states: POLL C1_ACPI C2_ACPI C3_ACPI
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 1798437
Duration: 183454194
C1_ACPI:
Flags/Description: ACPI FFH MWAIT 0x0
Latency: 1
Usage: 254458076
Duration: 70202037938
C2_ACPI:
Flags/Description: ACPI FFH MWAIT 0x31
Latency: 253
Usage: 94577494
Duration: 115673679409
C3_ACPI:
Flags/Description: ACPI FFH MWAIT 0x60
Latency: 1048
Usage: 21449179
Duration: 51663352077


On H2+ (J4115)
Code:
root@pve2:~# cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 8
Available idle states: POLL C1 C1E C6 C7s C8 C9 C10
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 3127267
Duration: 60532016
C1:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 71236569
Duration: 3713080218
C1E:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 95569525
Duration: 19317061195
C6:
Flags/Description: MWAIT 0x20
Latency: 150
Usage: 0
Duration: 0
C7s:
Flags/Description: MWAIT 0x31
Latency: 150
Usage: 342048643
Duration: 333922774597
C8:
Flags/Description: MWAIT 0x40
Latency: 5963
Usage: 2425164
Duration: 7676647957
C9:
Flags/Description: MWAIT 0x50
Latency: 5963
Usage: 3335931
Duration: 13552333719
C10:
Flags/Description: MWAIT 0x60
Latency: 6291
Usage: 16030930
Duration: 68046199898

One can see that the C2 and C3 states of the N5105 show very high latencies...
I suspect a cosmetic bug, as such high latencies for low C-states are not realistic for modern CPUs.

When trying to improve system responsiveness, it is common practice to disable high-latency states.
Typical command:
Code:
cpupower idle-set -D 11

This disables every idle state with a latency of 11 µs or more, which is C6 through C10 (idle states 3 through 7) on the J4115.
Obviously, this would disable all C-states except C1 on the N5105.
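
To preview which states a given threshold would hit before actually running cpupower, the latencies can be read from sysfs. A minimal sketch, assuming the usual cpuidle layout under /sys/devices/system/cpu/cpu0/cpuidle; the function name states_at_or_above and its second argument are made up for illustration:

```shell
#!/bin/sh
# List the idle states of CPU 0 whose latency is >= the given threshold,
# i.e. the states `cpupower idle-set -D <threshold>` would disable.
# The second argument (sysfs directory) is hypothetical and only
# overridable so the function can be tried on a copy.
states_at_or_above() {
    threshold=$1
    dir=${2:-/sys/devices/system/cpu/cpu0/cpuidle}
    for state in "$dir"/state*; do
        # Skip silently if the directory or file does not exist.
        [ -r "$state/latency" ] || continue
        latency=$(cat "$state/latency")
        if [ "$latency" -ge "$threshold" ]; then
            echo "$(cat "$state/name") (latency $latency)"
        fi
    done
}

states_at_or_above 11
```

With the N5105 values shown above, this would list C2_ACPI and C3_ACPI.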
 
How did y'all handle BIOS settings after updating the kernel and the microcode? Some settings were recommended earlier in the thread, but I'm not sure they did the trick. For example, I disabled the power saving modes (C-states?). Would you do a BIOS settings reset?
 
i turned everything back on after the 24 microcode. seems to work just fine with all c-states enabled.
 
Hello,

After upgrading the microcode, I did not make any change to the BIOS settings.
As far as I remember, it is running the BIOS defaults.
As you can see in my previous post, I have C-states enabled.
It is common practice to try disabling some C-states in case of stability problems, especially states C6 and higher.
However, in this particular case, I do not think it is related.

I will check my BIOS settings and will let you know.
 
