What kernel are you running on the PVE host? Since upgrading to the 5.19.x kernel, my VMs (Ubuntu and pfSense) have had uptimes of 30+ days, interrupted only by a power outage.
Hi @gyrex, it looks like you have solved the problem? Are you using NVMe SSDs or SATA SSDs after switching back to Proxmox from ESXi?
I'm also running an N5105-series CPU host, and I'm also having problems with VMs rebooting at irregular intervals:
The OpenWrt VM reboots, but PVE shows it as fine and running continuously, and the only entries in the logs seem to be related to the NIC resetting. This usually happens while other VMs on the same PVE host are doing high-speed PT downloads.
In addition, sometimes the OpenWrt VM does not reboot, but the network interface still resets.
In all these cases, the PVE host and the LXC containers keep running fine.
Seeing the discussion in this thread about the kernel version, I also tried to upgrade to the edge PVE kernel version 5.19, but no luck. Now I suspect a hardware-related cause:
The small box I'm using is very compact, with a high-temperature NVMe SSD and an Intel i225v King NIC in close proximity (see attached picture).
Combined with the fact that the problems mostly show up during high-speed PT downloads, I suspect that the SSD and the NIC were both under high load at the same time, causing the NIC to drop out. That then led to a series of follow-on problems: I found that OpenWrt keeps sending out a lot of connection attempts while the network is disconnected, which may lead to a reboot once it has exhausted its resources (or the reboot may have some other cause).
The above theory seems to explain the following phenomenon:
1. Everything works fine at low load, but failures are frequent during high-speed PT downloads
2. Migration to SATA SSDs alleviates the problem (this is yet to be verified; I look forward to your reply, @gyrex)
I would also recommend that users running small, compact boxes stress test both the drive and the network, and describe the hardware they are using when discussing the problem, so that the problem can be verified.
I plan to run more stress tests on SSDs and networks later, and will follow up with the results.
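For the stress tests, something like the following could record temperatures and NIC link flaps side by side while the load is running. This is only a minimal sketch, not a tested tool: it assumes the standard Linux sysfs locations `/sys/class/hwmon` (temperatures in millidegrees Celsius) and `/sys/class/net/<iface>/carrier_changes`. The function names and the interface name passed in are placeholders of mine, and whether the NVMe drive or the NIC exposes a sensor at all depends on the hardware and drivers.

```python
import time
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:<chip>/<sensor>": degrees C} for every readable temperature."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else "unknown"
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors fail or return junk when read; skip them
            label = sensor.name[: -len("_input")]
            temps[f"{chip.name}:{chip_name}/{label}"] = millideg / 1000.0
    return temps


def read_carrier_changes(iface, root="/sys/class/net"):
    """Return how often the link of `iface` has gone up or down, or None if unreadable."""
    try:
        return int((Path(root) / iface / "carrier_changes").read_text().strip())
    except (OSError, ValueError):
        return None


def log_samples(iface, count, interval=5.0):
    """Print `count` timestamped samples, `interval` seconds apart."""
    for _ in range(count):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "carrier_changes =", read_carrier_changes(iface), read_hwmon_temps())
        time.sleep(interval)
```

Running `log_samples("<your NIC name>", 720)` on the PVE host during a download would cover about an hour. If the carrier counter jumps shortly after the temperatures peak, that would support the overheating theory; if it jumps at low temperatures, it would argue against it.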
I'm not an expert in this area, so I'd also like to ask @fabian whether a network card dropping out due to overheating could be a possible cause of a VM reboot. I'd also appreciate some advice on how to locate the problem (e.g. how to collect useful logs), and I'd be happy to contribute if possible.
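As a starting point for collecting logs, here is a rough sketch of how the host's kernel messages could be narrowed down to the NIC and the SSD. It assumes the host uses the systemd journal (`journalctl -k` prints kernel messages) and that the i225 is handled by the `igc` driver. The keyword list is my own guess at what might be relevant, not a list of known error messages, and the function names are made up for illustration.

```python
import subprocess

# Assumed keywords: "igc" is the Linux driver for the i225; the rest are guesses.
DEFAULT_KEYWORDS = ("igc", "link is down", "link is up", "reset", "nvme", "thermal")


def filter_lines(lines, keywords=DEFAULT_KEYWORDS):
    """Keep only the lines that contain at least one keyword (case-insensitive)."""
    wanted = tuple(k.lower() for k in keywords)
    return [ln.rstrip("\n") for ln in lines if any(k in ln.lower() for k in wanted)]


def kernel_log_lines(since="1 hour ago"):
    """Fetch kernel messages from the systemd journal, starting at `since`."""
    result = subprocess.run(
        ["journalctl", "-k", "--no-pager", "-o", "short-iso", f"--since={since}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `filter_lines(kernel_log_lines("2 hours ago"))` right after an incident should give a short, timestamped list that is easier to post here than the full log. The same filter can be applied to the log from inside the OpenWrt VM, to see whether its reboot comes before or after the link drops.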
My hardware configuration is attached:
CPU: Intel N5095
NIC: Intel i225v-b3
Motherboard Brand: ChangWang