VM freezes irregularly

Can you try running this and rebooting? You may have multiple kernels installed and not be booting the latest one...
Code:
update-initramfs -u -k all
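To see whether the newest installed kernel is the one actually running, you can compare uname -r with the versions of the vmlinuz-* files in /boot. This is a minimal sketch, assuming GNU sort -V is available (it is on Debian/Proxmox). newest_installed_kernel and check_running_kernel are hypothetical helpers, not Proxmox tools:

```shell
# Hypothetical helper: newest kernel version installed in a boot directory.
# The directory is a parameter so it can be tried on a scratch copy.
newest_installed_kernel() {
    bootdir=${1:-/boot}
    # Strip the "vmlinuz-" prefix, version-sort, keep the highest version
    for f in "$bootdir"/vmlinuz-*; do
        [ -e "$f" ] || continue
        echo "${f##*/vmlinuz-}"
    done | sort -V | tail -n 1
}

# Hypothetical helper: compare the running kernel (default: uname -r)
# against the newest installed one.
check_running_kernel() {
    running=${2:-$(uname -r)}
    newest=$(newest_installed_kernel "$1")
    if [ "$running" = "$newest" ]; then
        echo "running newest kernel: $running"
    else
        echo "running $running, but newest installed is $newest"
    fi
}
```

Usage: check_running_kernel /boot. If it reports an older kernel, the bootloader is likely still selecting an older entry.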
Hello,

I've tested the command you recommended. Alas, it doesn't change anything :-/.

Here are the logs and result:
Code:
root@proxmoxsrv:~# ls -al /boot/
total 233536
drwxr-xr-x  5 root root     4096 Mar 11 16:03 .
drwxr-xr-x 18 root root     4096 Sep  7  2022 ..
-rw-r--r--  1 root root   259819 Apr 22  2022 config-5.15.30-2-pve
-rw-r--r--  1 root root   261203 Dec 14 14:09 config-5.15.83-1-pve
-rw-r--r--  1 root root   261203 Feb  1 16:07 config-5.15.85-1-pve
drwxr-xr-x  3 root root     4096 Jan  1  1970 efi
drwxr-xr-x  6 root root     4096 Mar 11 15:38 grub
-rw-r--r--  1 root root 63156164 Sep  7  2022 initrd.img-5.15.30-2-pve
-rw-r--r--  1 root root 61534898 Jan 23 21:49 initrd.img-5.15.83-1-pve
-rw-r--r--  1 root root 61533176 Mar 11 16:03 initrd.img-5.15.85-1-pve
-rw-r--r--  1 root root   182704 Aug 15  2019 memtest86+.bin
-rw-r--r--  1 root root   184884 Aug 15  2019 memtest86+_multiboot.bin
drwxr-xr-x  2 root root     4096 Feb 12 15:31 pve
-rw-r--r--  1 root root  6073669 Apr 22  2022 System.map-5.15.30-2-pve
-rw-r--r--  1 root root  6085308 Dec 14 14:09 System.map-5.15.83-1-pve
-rw-r--r--  1 root root  6085308 Feb  1 16:07 System.map-5.15.85-1-pve
-rw-r--r--  1 root root 10846272 Apr 22  2022 vmlinuz-5.15.30-2-pve
-rw-r--r--  1 root root 11314272 Dec 14 14:09 vmlinuz-5.15.83-1-pve
-rw-r--r--  1 root root 11315200 Feb  1 16:07 vmlinuz-5.15.85-1-pve


root@proxmoxsrv:~# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-5.15.85-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.15.83-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.15.30-2-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.

drwxr-xr-x 18 root root     4096 Sep  7  2022 ..
-rw-r--r--  1 root root   259819 Apr 22  2022 config-5.15.30-2-pve
-rw-r--r--  1 root root   261203 Dec 14 14:09 config-5.15.83-1-pve
-rw-r--r--  1 root root   261203 Feb  1 16:07 config-5.15.85-1-pve
drwxr-xr-x  3 root root     4096 Jan  1  1970 efi
drwxr-xr-x  6 root root     4096 Mar 11 15:38 grub
-rw-r--r--  1 root root 60295563 Mar 12 10:41 initrd.img-5.15.30-2-pve
-rw-r--r--  1 root root 61533465 Mar 12 10:40 initrd.img-5.15.83-1-pve
-rw-r--r--  1 root root 61528292 Mar 12 10:40 initrd.img-5.15.85-1-pve
-rw-r--r--  1 root root   182704 Aug 15  2019 memtest86+.bin
-rw-r--r--  1 root root   184884 Aug 15  2019 memtest86+_multiboot.bin
drwxr-xr-x  2 root root     4096 Feb 12 15:31 pve
-rw-r--r--  1 root root  6073669 Apr 22  2022 System.map-5.15.30-2-pve
-rw-r--r--  1 root root  6085308 Dec 14 14:09 System.map-5.15.83-1-pve
-rw-r--r--  1 root root  6085308 Feb  1 16:07 System.map-5.15.85-1-pve
-rw-r--r--  1 root root 10846272 Apr 22  2022 vmlinuz-5.15.30-2-pve
-rw-r--r--  1 root root 11314272 Dec 14 14:09 vmlinuz-5.15.83-1-pve
-rw-r--r--  1 root root 11315200 Feb  1 16:07 vmlinuz-5.15.85-1-pve


root@proxmoxsrv:~# echo 1 > /sys/devices/system/cpu/microcode/reload
root@proxmoxsrv:~# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.15.85-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.

root@proxmoxsrv:~# ls -al /boot/
total 230740
drwxr-xr-x  5 root root     4096 Mar 12 10:42 .
drwxr-xr-x 18 root root     4096 Sep  7  2022 ..
-rw-r--r--  1 root root   259819 Apr 22  2022 config-5.15.30-2-pve
-rw-r--r--  1 root root   261203 Dec 14 14:09 config-5.15.83-1-pve
-rw-r--r--  1 root root   261203 Feb  1 16:07 config-5.15.85-1-pve
drwxr-xr-x  3 root root     4096 Jan  1  1970 efi
drwxr-xr-x  6 root root     4096 Mar 11 15:38 grub
-rw-r--r--  1 root root 60295563 Mar 12 10:41 initrd.img-5.15.30-2-pve
-rw-r--r--  1 root root 61533465 Mar 12 10:40 initrd.img-5.15.83-1-pve
-rw-r--r--  1 root root 61533973 Mar 12 10:42 initrd.img-5.15.85-1-pve
-rw-r--r--  1 root root   182704 Aug 15  2019 memtest86+.bin
-rw-r--r--  1 root root   184884 Aug 15  2019 memtest86+_multiboot.bin
drwxr-xr-x  2 root root     4096 Feb 12 15:31 pve
-rw-r--r--  1 root root  6073669 Apr 22  2022 System.map-5.15.30-2-pve
-rw-r--r--  1 root root  6085308 Dec 14 14:09 System.map-5.15.83-1-pve
-rw-r--r--  1 root root  6085308 Feb  1 16:07 System.map-5.15.85-1-pve
-rw-r--r--  1 root root 10846272 Apr 22  2022 vmlinuz-5.15.30-2-pve
-rw-r--r--  1 root root 11314272 Dec 14 14:09 vmlinuz-5.15.83-1-pve
-rw-r--r--  1 root root 11315200 Feb  1 16:07 vmlinuz-5.15.85-1-pve

root@proxmoxsrv:~# reboot

For reference:
Code:
Linux proxmoxsrv 5.15.85-1-pve #1 SMP PVE 5.15.85-1 (2023-02-01T00:00Z) x86_64
 
@AdriftAtlas

Nope, not planning to; it doesn't seem to have any benefits, especially for GPU passthrough.

10 days of uptime and not a single error, crash, freeze or reboot. Hopefully I'll post a 20-day update.
 

I've been following this thread and trying the different solutions. I think I am ok now, so I wanted to thank you for all your work!

I followed @LiFE1688's changes from a few pages back. I just wondered if someone would mind posting the current "best" fix. Is it just installing the new microcode (from @R1CH), or do you still need to:
- edit GRUB to add intel_idle.max_cstate=1 processor.max_cstate=1
- change the pve-kernel (and if so, which version is currently recommended?)

Thanks
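C-state limits like these are kernel command-line parameters. On Proxmox they go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (followed by update-grub) on GRUB installs, or into /etc/kernel/cmdline (followed by proxmox-boot-tool refresh) on systemd-boot installs. To confirm what the running kernel actually booted with, read /proc/cmdline. A small sketch; cmdline_value is a hypothetical helper:

```shell
# Hypothetical helper: print the value of a kernel parameter.
# Reads /proc/cmdline by default (what the running kernel booted with);
# pass a file such as /etc/kernel/cmdline to inspect the configured one.
cmdline_value() {
    awk -v p="$1" '{
        # Only tokens that start exactly with "param=" count
        for (i = 1; i <= NF; i++)
            if (index($i, p "=") == 1) print substr($i, length(p) + 2)
    }' "${2:-/proc/cmdline}"
}
```

Usage: cmdline_value intel_idle.max_cstate prints 1 if the limit is active, and nothing if the parameter is not set.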
 
For me personally, kernel 6.1.2 and the 0x24000024 microcode did it. Since then, that particular machine (a Topton N5105) has not been acting up.

But I have decided it's time to upgrade to a model with a Pentium Gold 8505 (Alder Lake-U), which, as far as I can tell after 2 weeks of use, doesn't have this issue at all.
This is purely for performance reasons and because of the 2 extra LAN ports (6 compared to the 4 on the old unit).
 
@_gabriel
I recommend just updating the Intel microcode for Jasper Lake to 0x24000024; there's no need to set C-state or P-state options in GRUB. It doesn't matter if you are using the default kernel, but with 6.1 you can do GPU passthrough for containers and VMs.

If you add non-free to your Debian repo and run apt install intel-microcode, that should get you to 0x24000023. To get to 0x24000024, you will need to copy the microcode from the microcode-20230214 release over the current microcode and run update-initramfs -u.

Instructions are here

I won't be posting a detailed step-by-step guide until mine has run without incident for at least 30 days; I'm currently on day 12.
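The intel-ucode files are named after the CPU's family, model and stepping in hex (for example 06-9c-00), so the sketch below derives that name from /proc/cpuinfo to show which file under /lib/firmware/intel-ucode/ the microcode-20230214 copy would replace. Jasper Lake parts such as the N5105/N6005 are expected to report family 6, model 156 (0x9c), but check your own output. ucode_file_name is a hypothetical helper, and it only covers the lookup, not the download:

```shell
# Hypothetical helper: intel-ucode file name (family-model-stepping, hex)
# derived from /proc/cpuinfo-style text.
ucode_file_name() {
    awk -F': *' '
        /^cpu family/   { fam = $2 }
        /^model[ \t]*:/ { mod = $2 }   # "model", not "model name"
        /^stepping/     { step = $2 }
        /^$/            { exit }       # the first processor block is enough
        END { printf "%02x-%02x-%02x\n", fam, mod, step }
    ' "${1:-/proc/cpuinfo}"
}
```

After that, copy the matching file from the release over /lib/firmware/intel-ucode/ under that name, run update-initramfs -u and reboot.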

@beisser
Doesn't the 8505 need kernel 6.1 or above to function properly? Just wondering. I am thinking of getting the 8505 6-port from CWWK to test with.
 
I have that same 8505, but from Topton (same model, different logo in the BIOS).

https://de.aliexpress.com/item/1005005122939249.html

I haven't tried running it on any of the 5.x kernels; I immediately went for 6.1.2 and later to 6.2.2 when it came out.

I can recommend flashing the unlocked BIOS from this thread: https://forums.servethehome.com/index.php?threads/cwwk-i5-1235u-6-port-i226-report.39341/

It enables HWP, which is disabled by default on those boxes.
HWP is needed for intel_pstate to work.
Without intel_pstate, the ACPI CPU frequency driver (acpi-cpufreq) is used, which scales the CPU between 400 and 2500 MHz (the same range for all cores), while intel_pstate scales it between 400 and 3300 MHz (E-cores) / 4400 MHz (P-cores).
It also unlocks virtually every imaginable BIOS option that is usually hidden from the user.
Most of those options are hard to decipher, though, so play with them at your own peril :D
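To check both conditions on a given box, you can look for the hwp flag in /proc/cpuinfo and read the active scaling driver. A minimal sketch; hwp_status is a hypothetical helper, and the assumption is that with HWP disabled in the BIOS the flag is hidden and acpi-cpufreq shows up instead of intel_pstate:

```shell
# Hypothetical helper: report the "hwp" CPU flag and the active cpufreq driver.
# Both paths are parameters so the check can be run against saved copies.
hwp_status() {
    cpuinfo=${1:-/proc/cpuinfo}
    driver_file=${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver}
    # -w so that "hwp_notify" or "hwp_epp" alone do not count as "hwp"
    if grep -m1 '^flags' "$cpuinfo" | grep -qw hwp; then hwp=yes; else hwp=no; fi
    driver=$(cat "$driver_file" 2>/dev/null || echo unknown)
    echo "hwp=$hwp driver=$driver"
}
```

On an unlocked BIOS with HWP enabled, the expectation is hwp=yes driver=intel_pstate.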
 
Ah... Thanks, I hope I'll remember that when I get the 8505.

These mini PCs are rather interesting; the biggest headaches I have are the N5105 4-port and N6005 6-port because of the VM crashes/freezes/reboots.
The N6005 6-port came with a fully unlocked BIOS, while the N5105 4-port (version 4) had to be flashed. So it's pretty much the same story with the 8505.

The N6005 6-port had C-states disabled by default, and they had to be enabled. Thanks to @AdriftAtlas for pointing that out.
 
After a power outage at my house, I restarted the test with kernel 6.1.15, still with microcode 0x24000024.
In the last test I reached 10 days without issues: no reboots, freezes, etc. Before, I never got past 4-6 days. I've been testing for 2-3 months.

Now with 6.1.15 I'm at 3d 23h. I will update with my test results... I think/hope that microcode 0x24000024 is the solution to the problem.
 
Good news. Then microcode 0x24000024 should also fix the issue with the stock 5.15 kernel, right? Can anyone here confirm this?
 
Yes, kernel 5.15 should work fine after the 0x24000024 update.

If you need or want GPU passthrough, then 6.1 or above is required, but that is a functionality thing, not a stability thing.
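One way to confirm that every core has actually picked up the update is to compare each core's microcode revision in /proc/cpuinfo against 0x24000024. A minimal sketch, assuming revisions compare as plain hex numbers; microcode_at_least is a hypothetical helper:

```shell
# Hypothetical helper: check that every core reports at least a given
# microcode revision (default 0x24000024).
# Returns 0 if all are new enough, 1 if any is older, 2 if none were found.
microcode_at_least() {
    want=${1:-0x24000024}
    file=${2:-/proc/cpuinfo}
    revs=$(awk -F': *' '/^microcode/ { print $2 }' "$file")
    [ -n "$revs" ] || { echo "no microcode lines found"; return 2; }
    for rev in $revs; do
        # Shell arithmetic accepts hex constants such as 0x24000024
        if [ $(($rev)) -lt $(($want)) ]; then
            echo "old: $rev < $want"
            return 1
        fi
    done
    echo "ok: all cores >= $want"
}
```

Usage: microcode_at_least 0x24000024 after the reboot; a non-zero exit status means at least one core is still on an older revision.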
 
Just chiming in here after being a lurker for quite some time.

I own the Topton N6005 unit and am affected as well: VMs freeze irregularly while the host and LXC containers keep running fine. From what I've observed, VMs with attached USB devices are more likely to freeze than VMs without.

For me, microcode 0x24000024 seems to have done the trick too. After resetting the BIOS, doing a fresh install of Proxmox and installing this microcode, I had an uptime of 13.5 days until I was stopped by a dying SSD. Before that, VMs froze anywhere between 3 and 5 days of uptime. Today I replaced the SSD, rebuilt the RAID 1 and updated the kernel to the 6.2 release.

Still optimistic.

Code:
root@proxmox:~# uname -a
Linux proxmox 6.2.6-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.6-1 (2023-03-14T17:08Z) x86_64 GNU/Linux



root@proxmox:~# cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on



root@proxmox:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
intel_pstate



root@proxmox:~# cat /sys/devices/system/cpu/cpuidle/current_driver
intel_idle



root@proxmox:~# cat /proc/cpuinfo|grep 'microcode\|model name'
model name      : Intel(R) Pentium(R) Silver N6005 @ 2.00GHz
microcode       : 0x24000024
model name      : Intel(R) Pentium(R) Silver N6005 @ 2.00GHz
microcode       : 0x24000024
model name      : Intel(R) Pentium(R) Silver N6005 @ 2.00GHz
microcode       : 0x24000024
model name      : Intel(R) Pentium(R) Silver N6005 @ 2.00GHz
microcode       : 0x24000024



root@proxmox:~# cat /proc/cpuinfo | grep MHz
cpu MHz         : 1523.288
cpu MHz         : 1094.412
cpu MHz         : 2000.000
cpu MHz         : 2325.109
root@proxmox:~# sensors
nvme-pci-0100
Adapter: PCI adapter
Composite:    +51.9°C  (low  =  -0.1°C, high = +82.8°C)
                       (crit = +89.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +57.0°C  (crit = +119.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +52.0°C  (high = +105.0°C, crit = +105.0°C)
Core 0:        +44.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:        +44.0°C  (high = +105.0°C, crit = +105.0°C)
Core 2:        +44.0°C  (high = +105.0°C, crit = +105.0°C)
Core 3:        +44.0°C  (high = +105.0°C, crit = +105.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +42.9°C  (low  =  -0.1°C, high = +82.8°C)
                       (crit = +89.8°C)
 
Here's an update on my situation: I couldn't get past 4 days with the first microcode patch or before it. After applying 0x24000024, I've reached 11.5 days, so it's looking really good so far.

I am on the following kernel on the ODROID-H3+ with the N6005 processor:

Linux ironman 5.15.85-1-pve #1 SMP PVE 5.15.85-1 (2023-02-01T00:00Z) x86_64 GNU/Linux
 
It's not going to take months; it was submitted by a non-maintainer, 3/10 days so far, so it should be escalated soon enough.
 
OK, so an update from me. It seems I have a different microcode version...

Topton N6005 version 5 with 8 GB memory, 2933 MHz.

Uptime: 30 days 18 hours. Seems very stable. I have not experienced any reboots or crashes at all since installing it in January. I've only rebooted to relocate, update, etc.

Running: 1 x VM (OPNsense with many plugins), 4 x LXC (including Portainer). GPU passthrough enabled according to https://forum.proxmox.com/threads/plex-hw-transcoding-lxc-and-jasper-lake-igpu-passthru.116163/

It's running a little too hot for my liking, but I rarely max out the CPU. The case is also quite warm to the touch.

Bash:
root@proxmox:~# uname -a
Linux proxmox 5.19.17-2-pve #1 SMP PREEMPT_DYNAMIC PVE 5.19.17-2 (Sat, 28 Jan 2023 16:40:25 x86_64 GNU/Linux
root@proxmox:~# cat /etc/kernel/cmdline
cat: /etc/kernel/cmdline: No such file or directory
root@proxmox:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
intel_pstate
root@proxmox:~# cat /sys/devices/system/cpu/cpuidle/current_driver
intel_idle
root@proxmox:~# cat /proc/cpuinfo|grep 'microcode\|model name'
model name      : Intel(R) Pentium(R) Silver N6005 @ 2.00GHz
microcode       : 0x1d
model name      : Intel(R) Pentium(R) Silver N6005 @ 2.00GHz
microcode       : 0x1d
model name      : Intel(R) Pentium(R) Silver N6005 @ 2.00GHz
microcode       : 0x1d
model name      : Intel(R) Pentium(R) Silver N6005 @ 2.00GHz
microcode       : 0x1d
root@proxmox:~# cat /proc/cpuinfo | grep MHz
cpu MHz         : 1849.258
cpu MHz         : 2577.820
cpu MHz         : 2000.000
cpu MHz         : 1942.017
root@proxmox:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
root@proxmox:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +57.0°C  (high = +105.0°C, crit = +105.0°C)
Core 0:        +57.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:        +57.0°C  (high = +105.0°C, crit = +105.0°C)
Core 2:        +57.0°C  (high = +105.0°C, crit = +105.0°C)
Core 3:        +57.0°C  (high = +105.0°C, crit = +105.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +45.0°C  (crit = +119.0°C)

nvme-pci-0100
Adapter: PCI adapter
Composite:    +59.9°C  (low  =  -5.2°C, high = +82.8°C)
                       (crit = +87.8°C)

What should the CPU idle speed be? 800 MHz? Edit: yes, performance.

I/O delay is always 3%...
 
IO delay for me is always around 2-3% as well.
This is due to the VMs' activity.

CPU idle speed on that processor is 800 MHz, yes.

To me it looks like your CPU governor might be performance instead of powersave.

Edit: just saw that you posted your governor :) Performance usually keeps the clock speeds near maximum, while powersave keeps them at minimum unless more is needed.
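Switching from performance to powersave can be done through sysfs. A minimal sketch, assuming root and the standard cpufreq sysfs layout; set_governor is a hypothetical helper, the sysfs root is a parameter so it can be tried on a scratch directory first, and the setting does not survive a reboot:

```shell
# Hypothetical helper: set the cpufreq governor on every CPU.
# With intel_pstate in active mode, only "performance" and "powersave" are offered.
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        # Refuse governors the driver does not list as available
        avail="${f%scaling_governor}scaling_available_governors"
        if [ -r "$avail" ] && ! grep -qw "$gov" "$avail"; then
            echo "$gov not offered by ${f%/cpufreq/*}" >&2
            return 1
        fi
        echo "$gov" > "$f"
    done
}
```

Usage: set_governor powersave, then cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to confirm.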
 
Not correct. With the N6005, IO delay averages ~3% from the start, with no VMs installed at all. At least with my Topton version.
 