Hi
On my end, I'm still on kernel 5.19.17-1-pve with 32 days of uptime, running two VMs: OPNsense ( 3 (1 sockets, 3 cores) [host,flags=-pcid;-spec-ctrl;-ssbd;+aes] [cpuunits=2048] ) with VirtIO NICs, and HomeAssistant. There are also two LXC containers (PiHole and TP-Link Omada Controller, based on Ubuntu 22.04).
root@pve:~# last reboot | head -n 1
reboot system boot 5.19.17-1-pve Sat Nov 26 20:14 still running
root@pve:~# uptime
11:48:25 up 32 days, 15:33, 1 user, load average: 0.39, 0.32, 0.29
root@pve:~# uname -a
Linux pve 5.19.17-1-pve #1 SMP PREEMPT_DYNAMIC PVE 5.19.17-1 (Mon, 14 Nov 2022 20:25:12 x86_64 GNU/Linux
Host is a Topton N5105 (CW-6000) with i225 B3 NICs, BIOS date 29/09/2022, 2x8GB RAM, 1x NVMe SSD WD SN530. An extra 12 V Noctua 40mm fan (NF-A4x10 PWM) mounted as exhaust is inaudible (as intake, the noise would be noticeable).
Note that I've applied several options to the kernel cmdline, so the uptime may not be down to the kernel version alone.
Kernel cmdline options:
intel_idle.max_cstate=1 (limit intel_idle to C1, i.e. disable deeper C-states such as C3)
intel_iommu=on iommu=pt (Enable IOMMU. At the beginning I was going to pass the NICs through to the OPNsense VM, but I ended up using VirtIO NICs while testing for the crashes, and kept them)
mitigations=off (Self explanatory)
i915.enable_guc=2 ( Enable low-power H264 encoding,
https://01.org/linuxgraphics/downloads/firmware ,
https://jellyfin.org/docs/general/administration/hardware-acceleration/#intel-gen9-and-gen11-igpus )
initcall_blacklist=sysfb_init ( GPU passthrough ,
https://wiki.tozo.info/books/server/page/proxmox-gpu-passthrough )
nvme_core.default_ps_max_latency_us=14900 (
https://esc.sh/blog/nvme-ssd-and-linux-freezing/ )
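For anyone replicating this, here is a minimal sketch for checking that the options actually reached the running kernel. On Proxmox they normally go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub followed by update-grub, or into /etc/kernel/cmdline followed by proxmox-boot-tool refresh on systemd-boot installs. Which one applies depends on how the host boots. The check_cmdline function is a hypothetical helper written for this post, not a Proxmox tool.

```shell
#!/bin/sh
# check_cmdline (hypothetical helper): report which expected parameters
# appear as whole words in a kernel command line string.
check_cmdline() {
    cmdline="$1"
    shift
    for param in "$@"; do
        # Pad with spaces so only whole, exact parameters match.
        case " $cmdline " in
            *" $param "*) echo "ok: $param" ;;
            *)            echo "MISSING: $param" ;;
        esac
    done
}

# Compare the running kernel's command line against the options listed above.
if [ -r /proc/cmdline ]; then
    check_cmdline "$(cat /proc/cmdline)" \
        intel_idle.max_cstate=1 intel_iommu=on iommu=pt mitigations=off \
        i915.enable_guc=2 initcall_blacklist=sysfb_init \
        nvme_core.default_ps_max_latency_us=14900
fi
```

Any line starting with MISSING means the bootloader config was not regenerated or the host has not been rebooted since the change.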
Also, due to i2c-6 NAK errors ( [Sat Nov 26 20:14:37 2022] i2c i2c-6: sendbytes: NAK bailout. ) related to the iGPU, I've connected a dummy HDMI dongle, after confirming that with a monitor plugged in the errors stopped and so did the system crashes. By then I had already applied the other kernel parameters, though.
I didn't test whether those errors were related to enabling i915 GuC/HuC or not.
And due to errors related to the NVMe SSD (WD SN530 M.2 2242) I've applied the nvme_core.default_ps_max_latency_us parameter as well.
[Tue Nov 29 11:46:52 2022] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[Tue Nov 29 11:46:52 2022] nvme 0000:01:00.0: device [15b7:5008] error status/mask=00000001/0000e000
[Tue Nov 29 11:46:52 2022] nvme 0000:01:00.0: [ 0] RxErr
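A rough way to check whether the NVMe parameter took effect and whether the corrected errors keep showing up is sketched below. It assumes the nvme_core module exposes its parameter under /sys/module (it does on stock kernels) and that dmesg is readable by the current user. The count_corrected function is a hypothetical helper written for this post.

```shell
#!/bin/sh
# count_corrected (hypothetical helper): count corrected PCIe bus error
# lines read from stdin. grep -c exits non-zero on zero matches, hence || true.
count_corrected() {
    grep -c 'PCIe Bus Error: severity=Corrected' || true
}

# Show the APST latency limit currently used by the nvme_core module.
param=/sys/module/nvme_core/parameters/default_ps_max_latency_us
if [ -r "$param" ]; then
    echo "default_ps_max_latency_us: $(cat "$param")"
fi

# Count corrected PCIe errors logged since boot (0 if dmesg is unreadable).
if command -v dmesg >/dev/null 2>&1; then
    echo "corrected PCIe errors since boot: $(dmesg 2>/dev/null | count_corrected)"
fi
```

Comparing the count before and after a few days of uptime gives a better idea of whether the parameter helps than a single reading.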
- edit -
I also updated the Intel microcode:
root@pve:~# dmesg -T | grep microcode
[Sat Nov 26 20:14:32 2022] microcode: microcode updated early to revision 0x24000023, date = 2022-02-19
[Sat Nov 26 20:14:32 2022] SRBDS: Vulnerable: No microcode
[Sat Nov 26 20:14:33 2022] microcode: sig=0x906c0, pf=0x1, revision=0x24000023
[Sat Nov 26 20:14:33 2022] microcode: Microcode Update Driver: v2.2.
- edit -
Hopefully someone else finds this information helpful.