Proxmox "random" stuck/halt on 12900HK Erying Similar board

bocatadejamon

Apr 25, 2024
Hi,

I will try to explain the problem as best I can:

I have a board with a 12900HK; it is similar to the Erying one.


On this board I only have connected:

PSU Corsair RM850 850W

2 x 16 GB Corsair DDR4-3200 RAM

RJ45 cable

1 NVMe CT1000P2 in the bottom NVMe slot, close to the PCIe slot

1 USB stick with Unraid that I'm currently not using


Tests I ran:

memtest completed on the RAM with no errors

The board didn't halt or get stuck on Unraid or Windows 10

I have virtualization and IOMMU enabled in the BIOS.

I tried both the 1 Gbps interface and the 2.5 Gbps one; both show the same behavior


I was using the board with Unraid, but I don't like its VM management at all, so I switched the OS to Proxmox 7.4.

It was a mess: about every 20 minutes the whole system would crash and halt.

I was also seeing a lot of errors related to PCIe ASPM (8 GB of log errors in 5-6 minutes), and I found a "fix" here, which is adding pcie_aspm=off:

The complete all-in-one host, the ultimate big/little-core solution - Part 3: Usage - Zhihu

But that didn't solve the problem.

So I thought it might be related to the older version and upgraded to Proxmox 8.1, but I'm seeing the same behavior.

When the server halts there are no logs in journalctl and no messages in dmesg, and the only way to recover is to force a shutdown by holding the power button.

I also had a ping running and a keyboard and screen directly connected to the server; when it halts, the CLI doesn't respond at all and the screen doesn't do anything.

1º I installed a fresh Proxmox 8.1 with ext4

2º I manually copied the VMs' .raw drives to the server, configured them and started them

2.1º The VMs are light: Home Assistant, Klipper and 2 Ubuntu servers, each one with 1 core and 2 GB of RAM

3º I left the server running and it stayed "alive" for around 2 days

4º I enabled IOMMU following this guide https://www.servethehome.com/how-to-pass-through-pcie-nics-with-proxmox-ve-on-intel-and-amd/ (this was on Wednesday)

I also added pcie_aspm=off, because of the errors I saw on Proxmox 7.4, and pcie_port_pm=off

5º The server was working fine until today at 2 am, when it got stuck again (attached file)

6º I tried to change the driver for the network interfaces, since they use the rtl8169 driver and I previously had problems with it on other boards, but I couldn't make it work following this guide https://www.reddit.com/r/Proxmox/co...8169_nic_dell_micro_formfactors_in/?rdt=51878

The new driver didn't work and I had to manually revert to rtl8169, as Proxmox wasn't seeing the network interfaces

7º Right now I'm testing with IOMMU disabled, in case that is the cause

Any ideas?

I want to throw the board out of the window
 


So, I updated to Proxmox 8.2 to test the newer kernel in case it would fix my problems, but it's still happening.

Today I got a new crash around 10:55 am.

Right now I'm testing with the C-states:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_aspm=off pcie_port_pm=off intel_idle.max_cstate=3"
GRUB_CMDLINE_LINUX=""
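
A minimal sketch for checking, after a reboot, whether the parameters above actually reached the running kernel. The helper name `has_kernel_param` is hypothetical (it is not part of Proxmox or GRUB). On a GRUB-booted install, the edited /etc/default/grub normally only takes effect after `update-grub` and a reboot.

```shell
#!/bin/sh
# has_kernel_param CMDLINE PARAM
# Hypothetical helper: succeeds if PARAM is one of the space-separated
# words in CMDLINE (exact word match, so "pcie_aspm=off" does not
# match "pcie_aspm=offline").
has_kernel_param() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# Report which of the expected parameters the running kernel received.
# /proc/cmdline holds the command line the current kernel booted with.
running=$(cat /proc/cmdline 2>/dev/null || true)
for p in intel_iommu=on pcie_aspm=off pcie_port_pm=off intel_idle.max_cstate=3; do
    if has_kernel_param "$running" "$p"; then
        echo "active:  $p"
    else
        echo "missing: $p"
    fi
done
```

If a parameter shows up as missing, the change was probably never written to the boot configuration, so any stability test done with it would not be testing what was intended.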
 


I changed the OS to Unraid (I was planning to run Proxmox with Unraid in a VM), and on Unraid it runs perfectly, with no problems even with some VMs, so it's very weird.
On Unraid I'm running the same microcode, the same max C-state (9) and the same BIOS; I didn't change any option.

Any ideas?
I would really like to run Proxmox, as I don't like the VM options on Unraid.
 
So I came back to Proxmox after putting another board in my Unraid NAS server.

I installed Windows 10 on the board and ran Prime95 for one and a half hours with no problems
Brand new 750W PSU

After that:

Fresh Proxmox 8.2 install on a 1 TB NVMe

After finishing the install:
Configured syslog forwarding to the NAS
apt update && apt install rsyslog


Loaded the module for the temperature sensors:
modprobe nct6775
Disabled the enterprise repos and enabled the no-subscription ones
Installed lm-sensors

Up to this point everything is normal and it works fine.
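
The syslog forwarding step above can be sketched as follows. This is an illustration only: the helper name `write_forward_rule`, the file name and the address are assumptions, not values from the original setup. The rule syntax is standard rsyslog, where a single `@` forwards over UDP and `@@` over TCP.

```shell
#!/bin/sh
# write_forward_rule FILE HOST [PORT]
# Hypothetical helper: writes a one-line rsyslog rule that forwards all
# facilities and priorities to HOST over UDP (default port 514).
write_forward_rule() {
    printf '*.* @%s:%s\n' "$2" "${3:-514}" > "$1"
}

# Example with assumed values (run as root on the Proxmox host):
#   write_forward_rule /etc/rsyslog.d/90-remote.conf 192.0.2.10
#   systemctl restart rsyslog
```

Remote logging only helps if the kernel manages to emit a message before the freeze; a hard lockup can still leave the remote log empty.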

I enabled IOMMU using:


nano /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

And added modules:
nano /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
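
A minimal sketch for confirming that the IOMMU is really active after the reboot, by counting the groups the kernel exposes in sysfs. The helper name `count_iommu_groups` is hypothetical. As a side note, on kernel 6.2 and newer (which Proxmox VE 8 ships) `vfio_virqfd` has reportedly been merged into `vfio`, so that entry should no longer be needed there.

```shell
#!/bin/sh
# count_iommu_groups [BASE]
# Hypothetical helper: prints how many IOMMU groups exist under BASE
# (default: /sys/kernel/iommu_groups). A count of 0 usually means the
# IOMMU is not active, whatever the kernel command line says.
count_iommu_groups() {
    base=${1:-/sys/kernel/iommu_groups}
    n=0
    for g in "$base"/*; do
        [ -d "$g" ] && n=$((n + 1))
    done
    echo "$n"
}

echo "IOMMU groups: $(count_iommu_groups)"
```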

I rebooted, and around 10 minutes later the Proxmox server started to crash. On each boot it crashes after 5-20 minutes, with no logs in the journal and no logs on the syslog server.

Now I'm trying the same with another NVMe drive
 
Okay, so I installed Proxmox 8.2 on another NVMe, a 250 GB one

Installed perfectly using ext4

Booted as normal
Removed the enterprise repos
Enabled syslog and configured it to forward to the NAS
Installed lm-sensors
Loaded the module for the temperature sensors:
modprobe nct6775
Ran apt-get update and upgrade just in case

After that I created a new VM and installed an OS in it

So far it has been 1 hour and 30 minutes and the system is stable (I didn't enable IOMMU)
 
Same problem, it's still happening.

I tried enabling IOMMU with iommu=pt and it worked for one week, only to crash again
 
Hi, I'm back

I bought a new board

Installed Proxmox 8 fresh from USB (Virtual Environment 8.2.4)
The whole installation went fine

The computer rebooted to finish the install, I logged in to the web interface and did the usual:

disable the subscription repo
apt-get update
apt-get upgrade

Check microcode:

root@proxmox:~# grep 'stepping\|model\|microcode' /proc/cpuinfo
model : 154
model name : Genuine Intel(R) 0000
stepping : 0
microcode : 0x1c

Update microcode:

2) Add non-free-firmware to the Debian repo in sources.list
- Edit the /etc/apt/sources.list file. Add non-free-firmware to the 1st line so it looks like this:
- deb http://ftp.debian.org/debian bookworm main contrib non-free-firmware
3) Save changes
4) # apt clean && apt update
5) # apt install intel-microcode
CRASH during the install: the whole computer stopped responding, both over the GUI and over the CLI directly connected to the board (screen + keyboard + mouse)
No errors on screen


I rebooted by pressing the power button, logged in again and finished the intel-microcode install:

After install:

model name : Genuine Intel(R) 0000
stepping : 0
microcode : 0x1c
root@proxmox:~#
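
A minimal sketch for comparing the microcode revision before and after the package install. The helper name `microcode_revs` is hypothetical. Microcode is applied early at boot, so the value in /proc/cpuinfo is only expected to change after a reboot, and only if the package actually ships a newer revision for this CPU; whether it does for this particular CPU is not something the output above can confirm.

```shell
#!/bin/sh
# microcode_revs [FILE]
# Hypothetical helper: prints the distinct microcode revisions found in
# cpuinfo-formatted text (default: /proc/cpuinfo), one per line.
microcode_revs() {
    awk -F': *' '/^microcode/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}

# Run once before installing intel-microcode and once after rebooting,
# then compare the two outputs.
microcode_revs
```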
 
