VM freezes irregularly

which would be a data point for "something in Ubuntu's kernel patch set" ;) could you try with the corresponding mainline kernel I linked above? on Ubuntu you should be able to just install them without the need to repack, since their dpkg is patched to support zstd.
When I was digging around kernel.org's bugzilla reports, I seem to remember quite a few KVM/QEMU kernel issues, some specifically related to the N5105 CPU.

Also, is it not possible to narrow down the issue based on the logs I've posted? I put the kernels in debug mode and ran remote logging to capture as much of the panic as possible.
 
no, those just show that the kernel in the VM gets confused by some invalid state caused by the bug. to narrow down the culprit we really need to bisect the host kernel (which is the one that has the bug).
 

OK, that's a shame. I went to a lot of effort to try and get those logs :)

Hopefully @wolan can figure out which kernel works and doesn't work.
 
On the Ubuntu VM, I upgraded the kernel from 5.4.0.122 to the latest, 5.19.1-051901-generic, and it made no difference to the freezes.

For the Proxmox host kernel, on the other hand, I don't have enough knowledge, especially of Linux, to do the same, since it takes some tweaking to get there.
 
I'd like to help solve this issue because I'd prefer to run Proxmox rather than ESXi. I think I'll set up KVM on my N5105 box and help troubleshoot this. If there are two people testing kernels, we can find the issue quicker.

@fabian What's our goal here? To try and determine a kernel which works? Is there some deeper level of debug or logging we can run to diagnose this issue better? Also, is there an easier way to load the mainline kernels into Proxmox without unpackaging and repackaging them or building the kernel from scratch?
 

If there is an easier way to do this for the host kernel, I can also test alongside you...
 
the first goal is to find out whether it's something in Ubuntu or our patch set that causes this by testing a mainline kernel version as close as possible to the affected Ubuntu/PVE kernel.

- if the mainline kernel works without issues, it is an Ubuntu-/PVE-specific issue, and we continue with git-bisecting the Ubuntu kernel. that is quite time-consuming, but once the initial setup is done it's not hard per se; https://wiki.ubuntu.com/Kernel/KernelBisection contains instructions
- if the mainline kernel also shows the issue, we do a rough bisect of the mainline kernels in both directions (using the pre-built packages I linked) until we find the kernel release that introduced the issue (and possibly one that contains a fix). once we have the latest good kernel before it goes bad (and possibly the earliest good kernel after which it is fixed), we take a closer look at the changes there and, if needed, do a git-bisect of the mainline kernel between those versions
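
The rough bisect described above amounts to a binary search over an ordered list of kernel releases. The following is only a sketch: the release names and the `freezes` callback are hypothetical stand-ins for installing each pre-built kernel on the host and watching whether guests freeze, and it assumes the list starts with a known-good release and ends with a known-bad one.

```python
from typing import Callable, Sequence


def first_bad_release(releases: Sequence[str], freezes: Callable[[str], bool]) -> str:
    """Return the earliest release for which freezes(release) is True.

    Assumes releases are ordered oldest to newest, releases[0] is known good,
    releases[-1] is known bad, and every release after the first bad one is bad.
    """
    good, bad = 0, len(releases) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if freezes(releases[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return releases[bad]


# Made-up placeholder data, not results from this thread.
releases = ["release-1", "release-2", "release-3", "release-4", "release-5"]
pretend_bad = {"release-4", "release-5"}
print(first_bad_release(releases, lambda r: r in pretend_bad))
```

Each step halves the remaining range, so even a long list of releases needs only a handful of install-and-wait cycles. If no known-good release exists at all, the search has no starting point, which is why finding one working kernel matters first.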
 
For what it's worth, I have decided to give up on my N5105 unit and replace it with something that will hopefully be more reliable. As such, I'll, at least for a while, have an N5105 that I could do some more experimentation on. I will probably try to sell it relatively quickly, but I'm willing to keep it for a while to see if I can help contribute to a solution for others.
 
I really hope you figure this out. I'm running OPNSense on Proxmox and the VM freezes every 4-6 hours. I've a cron script running every 5 minutes that stops and starts the VM when it hangs, but my wife has noticed the issues and began asking questions...
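
The cron script mentioned above was not posted, so the following is only a guess at what such a watchdog could look like. It assumes the hang can be detected by the guest no longer answering a ping, and that the guest is restarted with Proxmox's `qm stop` and `qm start`; the VM ID and address in the usage note are hypothetical.

```python
import subprocess


def vm_responds(address: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP ping to the guest succeeds (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def restart_commands(vmid: int) -> list[list[str]]:
    """Commands to hard-stop and then start a guest with the qm tool."""
    return [["qm", "stop", str(vmid)], ["qm", "start", str(vmid)]]


def check_and_restart(vmid: int, address: str, responds=vm_responds, run=subprocess.run) -> bool:
    """Restart the guest if it does not respond; return True if a restart was issued."""
    if responds(address):
        return False
    for command in restart_commands(vmid):
        run(command, check=False)
    return True
```

Called from cron every few minutes as, for example, `check_and_restart(100, "192.168.1.1")`, this only masks the freezes; it does nothing about the underlying bug.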
 
I am back on 5.12.0 now. So far I have run the following, all of them crashing:
5.15.39
5.15.0
5.14.0
5.13.0
 
Not yet, but I've just changed the chipset on my VMs from i440 to q35. I plan to see how they behave on 5.12.0 and then move from 5.15.39 up.
 
My testing showed that this made no difference to the VMs freezing, and the idea was quashed by one of the kernel.org devs too. Best to try and find out which kernel (if any) works with this CPU rather than worrying about the machine type reported to the VM.

I wonder if this issue exists with the 4.x kernel. Could you try the 4.19 kernel which is the latest LTS release of 4.x?

@fabian What happens if we can't find a kernel which works with this CPU?
 
I don't think I could go back that far; with the mainline tool, 5.0.0 is the lowest I can install. I doubt the dependencies would let me compile 4.9 on the current distro.
 
Yeah, I think Ubuntu 18.04 would be the required distro to run 4.x kernels. If you're not able or willing to go back that far, I could do some VM shuffling and try it. I'd just like to know that our efforts aren't in vain, and what happens if a 4.x kernel doesn't work either.

Before we go down to 4.x, can you try loading 5.19?

Edit: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.19.3/
 
The N5105 was introduced in January 2021; kernel version 5.10, released in December 2020, was the current one at that time. Going further back in time probably won't make any difference, I guess.
 
We're running out of kernels :) Could you try 5.19.3?
 
I am back on 5.12.0 now. So far I have run the following, all of them crashing:
5.15.39
5.15.0
5.14.0
5.13.0
Just so I don't miss something - you're testing this by installing these kernel versions on the physical machine with the N5105 (running Ubuntu + KVM)?

And for all these versions, guests on the machine freeze/crash?

That might point to a different issue than what @fabian had in mind.

hope not to have missed this part - the machine you're testing this on has the latest available firmware installed - and you also installed the `intel-microcode` package?

Thanks for your efforts in testing this!
 
Yeah, I am installing the kernels on a physical machine. I have a NUC11ATKC4 with an N5105, running a fresh Ubuntu 22.04 installation with KVM.
The intel-microcode package is installed; I will check the BIOS version during the next reboot. Is there anything else in terms of firmware I could update?

The VMs I currently run are a fresh Ubuntu 22.04 installation with Docker and a Home Assistant OS, which I believe is some version of Debian. The latter I've been using for more than 2 years on an older Intel machine and it was rock solid.

The VMs seem to freeze intermittently when the service running on them is used, when a soft reboot is invoked, or even after a keystroke waking up the console. The VM's `qemu-system-x86_64` process can then be observed at 100% CPU usage, and it needs to be killed manually.
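
Spotting the stuck process described above could be automated roughly as follows. This is a sketch under assumptions: it parses `ps -eo pid=,pcpu=,args=` output and flags `qemu-system-x86_64` processes above a hypothetical threshold. Note that `ps` reports CPU usage averaged over the process lifetime, so this is a coarse indicator rather than a reliable freeze detector.

```python
import os
import subprocess


def parse_busy(ps_output: str, threshold: float = 95.0) -> list[int]:
    """Return PIDs of qemu-system-x86_64 processes at or above the %CPU threshold.

    Expects lines of the form "<pid> <pcpu> <command line>".
    """
    busy = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, pcpu, args = parts
        if "qemu-system-x86_64" in args and float(pcpu) >= threshold:
            busy.append(int(pid))
    return busy


def busy_qemu_pids(threshold: float = 95.0) -> list[int]:
    """Run ps and return the PIDs of busy QEMU processes."""
    output = subprocess.run(
        ["ps", "-eo", "pid=,pcpu=,args="],
        capture_output=True,
        text=True,
        env={**os.environ, "LC_ALL": "C"},  # keep the decimal separator predictable
    ).stdout
    return parse_busy(output, threshold)
```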
 
I'm running my OPNSense VM with 3 bridged NICs (1 LAN and 2 WANs, no PCI passthrough). CPU type is host, BIOS is OVMF (UEFI) and machine type is Q35. Are you experiencing VM freezes with a setup similar to the above? I could try creating other OPNSense VMs with different specs if you think it's worth it.