v7 works perfectly, v8 crashes

bellocarico

Sep 17, 2022
I have just built a new server and installed v8 straight away. There was a minor issue with NVMe, which seems to be BIOS-related (the drives randomly appear or not after POST), so I opted to stay safe and installed everything on a single SSD with ext4. Call me old school, I'm fine with that :)

So I ran v8 for some time with no issues, but one day lots of crashes started. As you can imagine, with brand-new hardware you go fiddling a bit everywhere, especially in the BIOS, and I started to blame myself for causing the issue. However, after a BIOS reset to defaults and a fresh installation of v8, the crashes appeared pretty much immediately.

I read a lot about the crash message, and everything suggests it might be a RAM fault; however, an extensive overnight RAM test ruled that out. So what changed? The only thing that really came to mind is the kernel.

I can't tell for sure which v8 kernel worked and which didn't. My initial v8 installation was around 4 Nov 2024, and it remained stable until about 10 November, when I think I installed some updates; I can't rule out that a new kernel was among them.
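One way to narrow this down, sketched under the assumption that apt's default history log (/var/log/apt/history.log) is still present: apt records every run as a Start-Date line followed by Install:/Upgrade: lines, so filtering those for kernel packages shows which kernels arrived on which day. The function name kernel_installs and the package-name pattern (proxmox-kernel-* and pve-kernel-*) are illustrative assumptions, not a Proxmox tool.

```shell
# Sketch: print kernel packages that apt installed or upgraded, with dates.
# Reads an apt history log on stdin. The package-name pattern is an
# assumption covering proxmox-kernel-* (PVE 8) and pve-kernel-* (PVE 7).
kernel_installs() {
    awk '
        # Remember the date of the apt run currently being read.
        /^Start-Date:/ { date = $2 }
        /^(Install|Upgrade):/ {
            # Entries look like "pkg:arch (version...)" separated by "), ".
            n = split($0, parts, "[)], ")
            for (i = 1; i <= n; i++) {
                if (parts[i] ~ /(proxmox|pve)-kernel-[0-9]/) {
                    pkg = parts[i]
                    sub(/^(Install|Upgrade): /, "", pkg)  # first entry only
                    sub(/ [(].*/, "", pkg)                # drop "(version...)"
                    print date, $1, pkg
                }
            }
        }
    '
}
```

Usage: kernel_installs < /var/log/apt/history.log. Older, rotated logs can be fed in the same way through zcat -f /var/log/apt/history.log.*.gz.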

As a POC I wiped everything and installed v7. Guess what? It's super stable.

So a couple of questions here:
- Is it safe for me to upgrade to v8 now and plan to downgrade the kernel?
- What is the correct procedure to list the available kernels I could downgrade to?
- Will the built-in Proxmox updater always offer to upgrade the kernel, even if I pin an old one?
- Am I the only one experiencing this with the latest kernels?
 
Hi,
this is not about crashes but about failing passthrough, and the issue has likely been identified and will be fixed in an upcoming 6.8 kernel.

@bellocarico did you already try the newer 6.11 kernel? Did you have the latest CPU microcode/BIOS updates installed? Otherwise, please share more details about your hardware, e.g. the output of lscpu and the full system logs/journal surrounding the crashes.

So a couple of questions here:
- Is it safe for me to upgrade to v8 now and plan to downgrade the kernel?
It will not be officially supported, and while it is possible in principle, you might run into other issues down the line due to missing kernel features.
- What is the correct procedure to list the available kernels I could downgrade to?
You would need to install them manually from the Proxmox VE 7 repository.
- Will the built-in Proxmox updater always offer to upgrade the kernel, even if I pin an old one?
AFAIK you will still need to have a new kernel installed because of package dependencies, but you can pin an old one for booting.
- Am I the only one experiencing this with the latest kernels?
Likely not, but sharing the errors would help, so that attempts can be made to identify and fix the issue and others affected can provide input too.
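A minimal sketch of the "pin an old kernel for booting" approach, assuming the proxmox-boot-tool utility that ships with Proxmox VE: its kernel list subcommand shows the installed kernels, kernel pin selects one of them as the boot default, and kernel unpin removes the pin again. The wrapper function names and the PBT variable (set PBT=echo for a dry run that only prints the commands) are hypothetical conveniences, not part of Proxmox VE. The version to pin must already be installed, i.e. one of those printed by kernel list.

```shell
# Sketch: choose which installed kernel Proxmox VE boots by default.
# Assumes proxmox-boot-tool is on PATH; PBT is a hypothetical override,
# e.g. PBT=echo prints the commands instead of running them.
PBT="${PBT:-proxmox-boot-tool}"

# Show installed kernels and any currently pinned version.
list_kernels() {
    "$PBT" kernel list
}

# Pin a kernel version (as printed by list_kernels) for every boot.
pin_kernel() {
    if [ -z "${1:-}" ]; then
        echo "usage: pin_kernel <kernel-version>" >&2
        return 2
    fi
    "$PBT" kernel pin "$1"
}

# Remove the pin so the default (newest installed) kernel boots again.
unpin_kernel() {
    "$PBT" kernel unpin
}
```

To try a kernel for a single boot only, proxmox-boot-tool kernel pin also accepts --next-boot, which avoids making the choice permanent before it has been tested.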
 
Thanks, please find the lscpu output below:

Code:
lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          20
On-line CPU(s) list:             0-19
Thread(s) per core:              1
Core(s) per socket:              14
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           191
Model name:                      13th Gen Intel(R) Core(TM) i5-13600T
Stepping:                        2
CPU MHz:                         3692.050
CPU max MHz:                     4800.0000
CPU min MHz:                     800.0000
BogoMIPS:                        3609.60
Virtualization:                  VT-x
L1d cache:                       336 KiB
L1i cache:                       224 KiB
L2 cache:                        8.8 MiB
NUMA node0 CPU(s):               0-19
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
                                  rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64
                                  monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
                                  rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid
                                 ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves
                                  avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid
                                 movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr flush_l1d arch_capabilities

I did try the very latest updates/kernels until this week, when I eventually had to give up and roll back to v7. Unfortunately I didn't save the logs anywhere, but they mentioned "BUG" and a memory allocation issue. As stated, though, a RAM test passed with no issues.
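Since the logs were not kept, here is a minimal sketch for capturing them after the next crash, assuming the systemd journal is persistent so that journalctl -b -1 (the previous boot) still holds the crashed session after a reboot. The function names, the output file name and the grep patterns (common markers of kernel BUG/Oops reports) are illustrative assumptions. The full file is what is worth sharing; the grep only helps locate the relevant part.

```shell
# Sketch: save the journal of the previous (crashed) boot to a file.
# Assumes a persistent journal; the default file name is arbitrary.
save_previous_boot_journal() {
    journalctl -b -1 --no-pager > "${1:-journal-previous-boot.txt}"
}

# Print matching lines, with line numbers, from a log read on stdin.
# The patterns are an assumption: typical markers of kernel oopses.
find_kernel_bugs() {
    grep -n -E 'BUG:|Oops|general protection fault|Call Trace:'
}
```

Usage: save_previous_boot_journal, then find_kernel_bugs < journal-previous-boot.txt. journalctl --list-boots shows which boots the journal still holds.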
 
So after some time of stability on v8 (I had to set IOMMU on each VM I have), the issue appears to be linked to an LXC container with iGPU passthrough. This causes the following crash at random:

[Screenshot of the crash: 1733819681465.png]
 
