VM freezes irregularly

Well, let's put it this way: if it is stable overall, it will be stable regardless of whether it runs QEMU 7.1 or 7.2.

The whole house is effectively a UPS (solar plus battery banks), but it is not contractor-proof. He should have flipped the breaker off for the room he was working in, but decided he was experienced enough. :D
 
Just a heads up, the microcode can also be updated to 0x24000024 using the instructions at the following Intel git repo:
https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files

(might need a version of the intel-microcode package installed first)

Only the 06-9c-00 file is needed for the N5105. Presumably this method will also work if any future microcode updates are issued by Intel without needing to wait for the package to be updated.
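
For reference, the file names in intel-ucode/ follow a family-model-stepping pattern in two-digit hex, which is where 06-9c-00 comes from. A minimal sketch, assuming the three values are taken from /proc/cpuinfo (where they are printed in decimal); the helper name `ucode_filename` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: build the intel-ucode file name from the decimal
# "cpu family", "model" and "stepping" values shown in /proc/cpuinfo.
ucode_filename() {
    printf '%02x-%02x-%02x\n' "$1" "$2" "$3"
}

# Example for the N5105 (family 6, model 156 = 0x9c, stepping 0):
#   ucode_filename 6 156 0      # prints 06-9c-00
```

On a live host, the three numbers can be read from the "cpu family", "model" and "stepping" lines of /proc/cpuinfo.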
 
Yeah, that's what I am planning to do. Seems simple enough and more trustworthy.

Sparkies, gotta love 'em.
 
pfSense 23.01 restarted just short of 15 days of uptime, with no crash dump.

Updated everything, specifically kernel, QEMU, and microcode. Let's try again...
Code:
microcode: microcode updated early to revision 0x24000024, date = 2022-09-02
6.1.14-1-pve

In case anyone wants to manually update the microcode to 0x24000024 (this assumes the intel-microcode package that ships 0x24000023 is already installed):
Code:
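cd /root
wget https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/archive/main.zip
unzip main.zip -d MCU
# copy the new microcode files over the ones installed by the package
cp -r /root/MCU/Intel-Linux-Processor-Microcode-Data-Files-main/intel-ucode/. /lib/firmware/intel-ucode/
# rebuild the initramfs for all installed kernels so the new microcode is loaded at boot
update-initramfs -u -k all
reboot
# after the reboot, check which revision was loaded
dmesg -T | grep microcode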
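
To check the result without reading the log by eye, something like the following could be used. It is only a sketch: the helper name `latest_ucode_revision` is made up, and the patterns are based solely on the dmesg lines quoted in this thread, so other kernel versions may word the messages differently.

```shell
#!/bin/sh
# Print the last microcode revision mentioned in kernel log text on stdin.
# Handles the three spellings seen in this thread:
#   "revision 0x...", "revision=0x..." and "revision: 0x...".
latest_ucode_revision() {
    grep -oE 'revision[:= ]+0x[0-9a-fA-F]+' | tail -n 1 | grep -oE '0x[0-9a-fA-F]+'
}

# Usage on a live host (not run here):
#   dmesg | latest_ucode_revision
```

If this still prints 0x24000023 after the reboot, the new file did not make it into the initramfs.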
 
I've been on microcode revision 24 (0x24000024) for 11 days.
I'm on TrueNAS Scale with kernel 5.15.79.
Before the microcode update, I think the record for my VMs before they crashed was 7 days. Usually it was closer to 3-5 days, and often just a day or two.
After the update it's been 11 days, so it looks promising.
The changelog for the microcode update says it's just security fixes, but nowadays that can cover anything, including branch prediction, which is heavily used in virtualization.
So fingers crossed ;)
 
It's actually a misconception that CPU microcode updates only contain security fixes; they can also fix errata or work around them.
 
I didn't say that *every* microcode update is just about security fixes, but revision 24 in this case is described as a security fix ;)
That can obviously also affect behavior that isn't security-related.
Or they simply didn't put *everything* in the changelog, which is also pretty common.

Anyway, in my case it has now been stable for 12 days.
 
Hi,

I also have the problem of a VM that freezes randomly, so I followed the microcode update method, without success. I am still on 0x24000023 and I don't understand why :-/

root@proxmoxsrv:~# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.15.85-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
 
Did you comment out or delete the file /etc/modprobe.d/intel-microcode-blacklist.conf?

Because this is considered a late-load update, you will also have to reboot after removing the intel-microcode-blacklist.conf file and running update-initramfs -u.
 
It's not really a late-load update.
In fact, the message about it being loaded is the first line in dmesg:

Code:
root@nas[~]# dmesg | head -n 1
[    0.000000] microcode: microcode updated early to revision 0x24000024, date = 2022-09-02

IMHO there's no need to remove the blacklist for early loading.
That blacklist is there to prevent the microcode from being overwritten later.
Of course you can switch to late loading, but for CPU microcode it doesn't really make sense and is not recommended.

@Xandr3 Make sure you've got the correct files in /lib/firmware/intel-ucode/, with the correct ownership and permissions.
You can also try loading that firmware manually first and then check /proc/cpuinfo to see if it worked.

Code:
echo -n 1 >/sys/devices/system/cpu/microcode/reload

Debian readme file.
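
One way to confirm that the file in /lib/firmware/intel-ucode/ really carries the expected revision is to read it from the file header. This is a sketch under an assumption: it relies on the Intel microcode update header layout in which the 32-bit update revision sits at byte offset 4, stored little-endian. It only looks at the first update in the file, and the helper name `ucode_file_revision` is made up.

```shell
#!/bin/sh
# Hypothetical helper: print the update revision stored in the header of an
# Intel microcode file ($1). Assumes the revision is the 32-bit little-endian
# field at byte offset 4; only the first update in the file is inspected.
ucode_file_revision() {
    # od prints the four bytes in file order; reverse them for display.
    set -- $(od -A n -t x1 -j 4 -N 4 "$1")
    printf '0x%s%s%s%s\n' "$4" "$3" "$2" "$1"
}

# Usage on a live host (not run here):
#   ucode_file_revision /lib/firmware/intel-ucode/06-9c-00
```

If the file shows the new revision but the CPU still reports the old one after a reboot, the initramfs is the next place to look.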
 
Hello, I have an Intel Celeron N5105, and since I installed Proxmox I get random reboots of the whole system. In syslog it just shows -- Reboot --
My VMs never froze.
I ran a memory test, and it passed every test.
It seems to reboot when it is idle; it has never happened when the CPU had work to do.
I have C-states enabled in the BIOS.

I updated the kernel to 6.1.14-1-pve and the microcode to 0x24000024, date = 2022-09-02.
Reboots still happen.
 
Hello,

I tested with "echo -n 1 >/sys/devices/system/cpu/microcode/reload" and new lines appeared in dmesg:
Code:
[    0.118452] SRBDS: Vulnerable: No microcode
[    1.451771] microcode: sig=0x906c0, pf=0x1, revision=0x24000023
[    1.451800] microcode: Microcode Update Driver: v2.2.
[  320.253182] microcode: updated to revision 0x24000024, date = 2022-09-02
[  320.253234] microcode: Reload completed, microcode revision: 0x24000024

Just before copying the new "intel-ucode" directory, I archived the one present in "/lib/firmware", then deleted it and copied in the latest version.
I don't have an "/etc/modprobe.d/intel-microcode-blacklist.conf" file.
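
The log above shows a late load (after the manual reload) rather than an early one at boot. A rough sketch for telling the two apart from dmesg output; the helper name `ucode_load_path` is made up, and the patterns are taken only from the messages quoted in this thread, so treat them as assumptions for other kernel versions:

```shell
#!/bin/sh
# Classify how the microcode reached the CPU, based on kernel log text on stdin.
# Prints "early", "late", or "none" (no update message found).
# If both kinds of message are present, the most recent one wins.
ucode_load_path() {
    awk '
        /microcode updated early to revision/               { mode = "early" }
        /microcode: (updated to revision|Reload completed)/ { mode = "late" }
        END { print (mode == "" ? "none" : mode) }
    '
}

# Usage on a live host (not run here):
#   dmesg | ucode_load_path
```

A result of "late" or "none" right after a fresh boot suggests the initramfs did not pick up the new microcode file.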
 
So did it work for you? Or are you back to 23 after a reboot?
 
Update: with microcode 024 (0x24000024), no problems yet after 7 days 15 hours: no reboot or crash. All VMs use the host CPU type, C-states are enabled in the BIOS (the default was disabled), and there is nothing in GRUB except what is needed for PCIe LAN passthrough for pfSense. During this period I created and deleted new VMs as a test. I also used a Windows 7 VM with JDownloader that sometimes crashed before 024. Traffic through pfSense is about 1 TB on the LAN; in the last 3 months it never got past 4-6 days or 500 GB of traffic. Hoping we have the solution.

The traffic comes from a Plex server, 4 cameras recording to a NAS server (a Proxmox VM), and other things.
 
Nice!!!
 
Sadly, your issue is not the same as the one in this thread. For us, the VM is crashing / freezing / rebooting, not the host. Yours is the opposite: the host itself is crashing. Since your setup passes memtest86, I would recommend testing with Prime95 to see if any worker threads fail.

Another thing to try is replacing the power supply. A faulty power supply can sometimes cause the kind of issue you are facing.
 
I seem to have solved the problem for now. It looks like a kernel problem; updating the microcode is not needed.
I locked (pinned) the PVE host kernel at 5.19.17-2 and have had no problems since. OpenWrt has been running for 10 days without a restart. Currently, only specific kernels reduce these errors. In my testing, the problem is alleviated with kernels 6.1.2 and 5.19.17-2.
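
For anyone trying the same thing: the post does not say how the kernel was locked. On recent Proxmox VE releases, `proxmox-boot-tool kernel pin <version>` is one way to do it, but treat the exact command and version string as assumptions and check the tool's help output on your host. After rebooting, a small check like this can confirm the host is really running the pinned kernel (the helper name `kernel_is_pinned` is made up):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the running kernel release ($1) is the wanted
# version ($2), either exactly or followed by a "-suffix" such as "-pve".
kernel_is_pinned() {
    case "$1" in
        "$2"|"$2"-*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Usage on a live host (not run here):
#   kernel_is_pinned "$(uname -r)" 5.19.17-2 && echo "running the pinned kernel"
```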
 
