Hi,
I'll try to explain the problem as best I can.
I have a board with a 12900HK; it's similar to the Erying boards.
Only the following is connected to the board:
PSU: Corsair RM850 (850 W)
RAM: 2 x 16 GB Corsair DDR4-3200
RJ45 cable
1 NVMe CT1000P2 in the bottom NVMe slot, close to the PCI slot
1 USB stick with Unraid that I'm currently not using
Tests I have done:
memtest completed on the RAM with no errors.
The board didn't halt or get stuck on Unraid or Windows 10.
I have virtualization and IOMMU enabled in the BIOS.
I tried both the 1 Gbps interface and the 2.5 Gbps one; both behave the same.
I was using the board with Unraid but I don't like its VM management at all, so I switched the OS to Proxmox 7.4.
It was a mess: the whole system crashed and halted about every 20 minutes.
I was also seeing a lot of errors related to PCI ASPM (8 GB of log errors in 5-6 minutes), and I found a "fix" here, which is to add pcie_aspm=off:
"The Complete All-Round Host, the Ultimate Big/Little-Core Solution - Part 3: Usage" - Zhihu (知乎)
But that didn't solve the problem.
So I thought it might be because it's an older version and moved to Proxmox 8.1, but I'm seeing the same behavior.
When the server halts there are no logs in journalctl and no messages in dmesg, and the only way to recover is to force a shutdown by holding the power button.
I also had a ping running, plus a keyboard and screen connected directly to the server; when it halts the CLI doesn't respond at all and the screen doesn't do anything.
1º I installed a fresh Proxmox 8.1 (ext4).
2º I manually copied the VMs' .raw drives to the server, configured them and started them.
2.1º The VMs are light: Home Assistant, Klipper and 2 Ubuntu servers, each one with 1 core and 2 GB of RAM.
3º I left the server running and it stayed "alive" for around 2 days.
4º I enabled IOMMU following this guide https://www.servethehome.com/how-to-pass-through-pcie-nics-with-proxmox-ve-on-intel-and-amd/ (this was on Wednesday).
I also added pcie_aspm=off, because of the errors I saw on Proxmox 7.4, and also pcie_port_pm=off.
5º The server was working fine until today at 2 am, when it got stuck again (attached file).
6º I tried to change the driver of the network interfaces, which currently use rtl8169, just in case, as I previously had problems with it on other boards, but I couldn't make it work following this guide https://www.reddit.com/r/Proxmox/co...8169_nic_dell_micro_formfactors_in/?rdt=51878
The new driver wasn't working and I had to manually revert to rtl8169, as Proxmox wasn't seeing the network interfaces.
7º Right now I'm testing with IOMMU disabled, in case that is the cause.
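To rule out a typo in the boot configuration, here is a minimal Python sketch that checks whether the two kernel parameters from step 4 (pcie_aspm=off and pcie_port_pm=off) are really present on the running kernel's command line. It only reads /proc/cmdline; the helper names missing_params and read_cmdline are my own, not something from the guides above, and it doesn't prove the parameters have the intended effect on this board.

```python
from pathlib import Path

# Kernel parameters mentioned in this post (step 4).
EXPECTED = ("pcie_aspm=off", "pcie_port_pm=off")


def missing_params(cmdline, expected=EXPECTED):
    """Return the expected parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [p for p in expected if p not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    return Path(path).read_text().strip()


if __name__ == "__main__":
    missing = missing_params(read_cmdline())
    print("missing:", ", ".join(missing) if missing else "none")
```

Run on the Proxmox host after a reboot; if it reports anything as missing, the bootloader config was probably not regenerated after editing.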
Any ideas?
I want to throw the board out of the window