Proxmox keeps crashing the host

Olorun1

New Member
Nov 4, 2024
I have spent the past 3 days trying to get a stable build off the ground for a homelab environment.
Huananzhi X99 F8D Plus motherboard
Dual Xeon 2687 v3
128 GB DDR4-2133 ECC
6650 XT GPU
Crucial P3 1TB PCIe M.2 2280 SSD | CT1000P3SSD8




I tried Proxmox 8.2.2 and 7.4 and hit the same issue on both: the host server reboots completely, both at random and every time I load a VM. No VM finishes installing. This includes Kali, Debian, Ubuntu, Windows 10, and Windows Server 2022.

However, the hardware runs a bare-metal install of each of these with no problems. I have also tried other hypervisors on the same platform with no issues.
 
Hi,

Do you see anything in the syslog in the ~30 minutes before and after the reboot?

FYI, you can get the syslog for a specific time/date range in the Proxmox VE web UI under `Datacenter -> {NodeName} -> System Log`, or with the journalctl CLI, e.g.:
Bash:
journalctl --since '2024-10-04 00:00:00' --until '2024-10-04 08:50:15' > /tmp/$(hostname)-syslog.txt
You may have to edit the date/time in the above command.
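If the reboot time is known (for example, from the first line of the new boot in the log), the window can be computed instead of typed by hand. This is only a sketch: `shift_time` and `dump_reboot_window` are hypothetical helper names, it assumes GNU `date` as shipped with a Debian-based Proxmox VE, and the 10-minute default is an arbitrary choice.

```shell
# Shift a timestamp by a number of minutes (negative = earlier).
# Goes through epoch seconds, because appending "-10 minutes" directly to a
# date string can be misread by GNU date as a time zone offset.
shift_time() {
    local ts="$1" minutes="$2" epoch
    epoch=$(date -d "$ts" +%s) || return 1
    date -d "@$(( $epoch + $minutes * 60 ))" '+%Y-%m-%d %H:%M:%S'
}

# Write the journal from N minutes before to N minutes after a reboot time
# into the same file name used in the command above.
dump_reboot_window() {
    local ts="$1" minutes="${2:-10}" since end
    since=$(shift_time "$ts" "-$minutes") || return 1
    end=$(shift_time "$ts" "$minutes") || return 1
    journalctl --since "$since" --until "$end" --no-pager \
        > "/tmp/$(hostname)-syslog.txt"
}
```

Example: `dump_reboot_window '2024-11-04 11:31:39'`. `journalctl --list-boots` and `journalctl -b -1 -e` can also show the end of the previous boot, provided the journal is persistent and was flushed before the crash.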
 
There are a few events in this log that I don't understand. I'm not sure if they are related, but they happen consistently.

Nov 04 11:32:18 oloserver kernel: kvm_intel: L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.

Nov 04 11:32:02 oloserver proxmox-mail-fo[1462]: oloserver proxmox-mail-forward[1462]: could not notify via target `mail-to-root`: could not notify via endpoint(s): mail-to-root: At least one recipient has to be specified!

Nov 04 11:31:52 oloserver smartd[1122]: Device: /dev/sda [SAT], 65525 Currently unreadable (pending) sectors


Nov 04 11:31:40 oloserver systemd-journald[689]: File /var/log/journal/50bdaef769c946b59dd2b4b3e292cc83/system.journal corrupted or uncleanly shut down, renaming and replacing.
Nov 04 11:31:40 oloserver kernel: spl: loading out-of-tree module taints kernel.
Nov 04 11:31:39 oloserver systemd[1]: Finished systemd-pstore.service - Platform Persistent Storage Archival.
Nov 04 11:31:39 oloserver systemd-modules-load[691]: Inserted module 'vhost_net'
Nov 04 11:31:39 oloserver systemd[1]: Finished systemd-sysusers.service - Create System Users.
Nov 04 11:31:39 oloserver systemd[1]: Starting systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev...
Nov 04 11:31:40 oloserver systemd[1]: Finished systemd-journal-flush.service - Flush Journal to Persistent Storage.
Nov 04 11:31:40 oloserver kernel: zfs: module license 'CDDL' taints kernel.
Nov 04 11:31:40 oloserver kernel: Disabling lock debugging due to kernel taint
Nov 04 11:31:40 oloserver kernel: zfs: module license taints kernel.
 

Attachments

  • server log files.txt
    100.7 KB
Nov 04 11:31:52 oloserver smartd[1122]: Device: /dev/sda [SAT], 65525 Currently unreadable (pending) sectors
That's not good. Pending sectors mean the drive is very likely about to die. At least it is causing trouble during operation. I would replace it ASAP and see if this resolves the issue for you.
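To see which drives smartd has flagged and how the count develops, the journal can be filtered for exactly this message. A minimal sketch, assuming the message format shown in the quoted log line; `pending_sectors` is a hypothetical helper name.

```shell
# Read journal lines on stdin and print "<device> <count>" for every
# smartd "Currently unreadable (pending) sectors" message.
pending_sectors() {
    sed -n 's/.*Device: \([^ ,]*\).*, \([0-9][0-9]*\) Currently unreadable (pending) sectors.*/\1 \2/p'
}
```

Example: `journalctl -t smartd --no-pager | pending_sectors | sort -u`. On the drive itself, `smartctl -A /dev/sda` lists the raw SMART attributes (pending sectors are normally attribute 197 on ATA drives), and `smartctl -t long /dev/sda` starts an extended self-test. Both come from the smartmontools package and need root.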
 
Huananzhi X99 F8D Plus motherboard
Dual Xeon 2687 v3
128 GB DDR4-2133 ECC
I hope you did not pay much for this combo. V3 Xeons with DDR4 suck. Sorry to rain on your parade, but anywho, to your specific issue:
If PM crashes, then try to rule out a HW problem by installing any Win OS: 10, 11, WS2016+ will do. If Windowze runs stable, then PM is the problem. If Windowze crashes just like PM, then it is the crappy Chicom mobo or RAM that is screwing you up. This is not unheard of. These mobos are hit and miss. Especially if your RAM is Chicom too.
Although, on second thought: what are you installing from? A flaky USB connection may be screwing you up. During heavy traffic, flaky USB controllers and circuitry may sag power to the device and corrupt transfers. Try different media. Heck, even a physical DVD-RW may work better than a USB stick on a flaky port/connection.
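Corrupted install media can be ruled out before swapping hardware. A minimal sketch, assuming GNU coreutils and diffutils and that a SHA-256 value for the ISO is available from wherever it was downloaded; `verify_image` and `verify_written` are hypothetical helper names.

```shell
# Return success if FILE has the expected SHA-256 digest (lowercase hex).
verify_image() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum "$file" | cut -d' ' -f1) || return 1
    [ "$actual" = "$expected" ]
}

# Return success if the first bytes of TARGET (e.g. the USB stick's block
# device) are identical to the whole of IMAGE. Reading a block device
# usually requires root.
verify_written() {
    local image="$1" target="$2" size
    size=$(stat -c %s "$image") || return 1
    cmp -s -n "$size" "$image" "$target"
}
```

Example: `verify_image installer.iso <published sha256> && verify_written installer.iso /dev/sdX`, where the file name and `/dev/sdX` are placeholders. Unplugging and re-inserting the stick before the second check makes it more likely that the data is read from the device rather than from the page cache.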
 
That's not good. Pending sectors mean the drive is very likely about to die. At least it is causing trouble during operation. I would replace it ASAP and see if this resolves the issue for you.
I've swapped out the HDD and still no success. I think the RAM might be bad, but I don't see any error messages that indicate it.
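With ECC memory, corrected and uncorrected error counts may be exposed through the kernel's EDAC interface in sysfs, so that is one place to look before pulling DIMMs. A minimal sketch, assuming an EDAC driver for this platform is loaded (if it is not, the directory is empty and the totals stay at 0, which proves nothing); `edac_totals` is a hypothetical helper name.

```shell
# Sum the corrected (ce_count) and uncorrected (ue_count) ECC error counters
# over all memory controllers below the given EDAC directory.
edac_totals() {
    local base="${1:-/sys/devices/system/edac/mc}" mc ce=0 ue=0
    for mc in "$base"/mc*; do
        if [ -r "$mc/ce_count" ]; then
            ce=$(( ce + $(cat "$mc/ce_count") ))
        fi
        if [ -r "$mc/ue_count" ]; then
            ue=$(( ue + $(cat "$mc/ue_count") ))
        fi
    done
    echo "corrected=$ce uncorrected=$ue"
}
```

Example: run `edac_totals` on the host after a crash-free period under load. Machine check events, if any were logged, can be searched for with `journalctl -k | grep -iE 'mce|machine check|edac'`. A bootable memory tester remains the more thorough check.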
 
I hope you did not pay much for this combo. V3 Xeons with DDR4 suck. Sorry to rain on your parade, but anywho, to your specific issue:
If PM crashes, then try to rule out a HW problem by installing any Win OS: 10, 11, WS2016+ will do. If Windowze runs stable, then PM is the problem. If Windowze crashes just like PM, then it is the crappy Chicom mobo or RAM that is screwing you up. This is not unheard of. These mobos are hit and miss. Especially if your RAM is Chicom too.
Although, on second thought: what are you installing from? A flaky USB connection may be screwing you up. During heavy traffic, flaky USB controllers and circuitry may sag power to the device and corrupt transfers. Try different media. Heck, even a physical DVD-RW may work better than a USB stick on a flaky port/connection.
Money was definitely a consideration. This whole setup was supposed to be a quick homelab where I could run and learn security tools and brush up on AD. It's becoming a nightmare. Until today I was able to load Windows (various) and Linux (various) with no problems. Now I can't even get past the logon screen without a BSOD in Windows. I am troubleshooting the RAM now.
 
Until today I was able to load Windows (various) and Linux (various) with no problems. Now I can't even get past the logon screen without a BSOD in Windows. I am troubleshooting the RAM now.
This is, unfortunately, oh so typical for all of those Huanan, Atermiter, Kllisre, Machinist, Qiyida and other Chicom kits: some of them work OOB, others do not work at all, yet others initially work but then go bananas. Then their victims embark on flashing various BIOSes and often kill their already half-dead mobo. It is a lottery.
If Windowze does not work, then the HW is your culprit. HP Z440 machines are not that expensive today. Get one of those. It will be way better than a Chicom mongrel mobo.
 
