Proxmox random reboots on HP EliteDesk 800 G4 - fixed with Proxmox install on top of Debian 12 - now issues with hardware transcoding in Plex

limit · New Member · Aug 14, 2023
I've been dealing with an issue with the HP EliteDesk systems for a few weeks now. I ordered two identical (but from different sellers) HP EliteDesk 800 G4s from eBay. With the first one I received, I was able to load Proxmox from the official bare-metal installer and everything has worked fine: no random reboots, no noticeable issues. I did add an NVMe drive for ZFS local storage and installed new RAM (32GB).

The second unit arrived. I loaded Proxmox the same way, installed an additional NVMe drive for ZFS local storage and added the new 32GB of RAM. I created a cluster from node 1 (pve1 in this case, the working node that has no issues) and added node 2 (pve2) to the cluster. I also set up an external QDevice for the cluster so quorum is OK. Things seemed fine until the second device randomly rebooted. I thought maybe it was just a blip of some sort and continued on... then more random reboots and no system stability at all on the second node. Searching the syslog and this forum didn't turn up a working solution. I did see someone suggest adding pci=assign-busses apicmaintimer idle=poll reboot=cold,hard to the default GRUB kernel command line. This did seem to help, but it caused another issue: very high CPU temps even when nothing was going on. I suspect it's the idle=poll. If I removed that, the problems came back.
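For anyone experimenting with the same workaround: idle=poll keeps idle CPUs busy-polling instead of entering low-power idle states, which fits the high temperatures described above. Below is a minimal Python sketch, assuming a stock /etc/default/grub where GRUB_CMDLINE_LINUX_DEFAULT is a single double-quoted line, that drops one parameter and keeps the others. The function name remove_kernel_param is hypothetical.

```python
import re

def remove_kernel_param(grub_text: str, param: str) -> str:
    """Drop one exact parameter from the GRUB_CMDLINE_LINUX_DEFAULT line.

    Assumes the value is enclosed in double quotes, as in a stock
    Debian/Proxmox /etc/default/grub. All other lines are left untouched.
    """
    def fix(match: re.Match) -> str:
        # Split the quoted value on whitespace and drop exact matches only.
        params = [p for p in match.group(2).split() if p != param]
        return f'{match.group(1)}"{" ".join(params)}"'

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"',
        fix,
        grub_text,
        flags=re.MULTILINE,  # ^ anchors at the start of each line
    )
```

After writing the result back (as root), the boot configuration must be regenerated with update-grub on a GRUB-booted system. Proxmox hosts that boot via systemd-boot (e.g. ZFS root on UEFI) read /etc/kernel/cmdline instead and use proxmox-boot-tool refresh.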

I tried swapping the power supply between the two devices to see if that was the problem, but pve2 continued to have the issue. I've reinstalled Proxmox several times, trying both BIOS and UEFI boot. I tried a factory reset of the BIOS and tried disabling the TPM. I finally decided there must be some issue with the CPU or motherboard, since everything else was new (NVMe drives, RAM) or already ruled out (a different power supply), so I sent the device back and paid the restocking fee. I then ordered another EliteDesk 800 G4 from a different vendor and, lo and behold, the exact same issues are happening with this device too. Random reboots, no rhyme or reason. It's been a couple of weeks now of messing with settings, reinstalling Proxmox, etc.

So finally, I decided to install Debian 12 on the device just to see how it runs, and that's when I found out I could install Proxmox on top of a Debian install. I followed this guide: https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm

Here's the thing: I've had this device running for 24 hours now with zero stability issues. I got it joined to the cluster and moved several VMs and containers over, and things are working great. So, first question: what could be the difference here? Why would this work when the bare-metal Proxmox install doesn't?

Now a new issue surfaces. I have Plex installed in an Ubuntu LXC container with GPU passthrough working (hardware transcoding works fine) on pve1. I was able to get it working on pve2 before without a problem, but after the Debian -> Proxmox install I cannot get hardware transcoding to work. I can see that the IOMMU is enabled and I can see the render device, so I'm not sure what the issue is. Perhaps some driver that is included in the base Proxmox install is missing? Maybe some module needs to be loaded? I haven't been able to find much by googling this specific issue.
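The two checks mentioned here (IOMMU enabled, render device visible) can be scripted. This is a minimal sketch assuming the usual paths: render nodes appear as /dev/dri/renderD*, and the kernel lists active IOMMU units under /sys/class/iommu (dmar0 and similar on Intel). The function names are hypothetical.

```python
from pathlib import Path

def render_nodes(dri_dir: str = "/dev/dri") -> list[str]:
    """Return the DRM render node names (e.g. renderD128) under dri_dir."""
    d = Path(dri_dir)
    if not d.is_dir():
        return []  # no DRM devices visible at all
    return sorted(p.name for p in d.iterdir() if p.name.startswith("renderD"))

def iommu_units(sysfs_dir: str = "/sys/class/iommu") -> list[str]:
    """Return the IOMMU units the kernel registered (empty if IOMMU is off)."""
    d = Path(sysfs_dir)
    if not d.is_dir():
        return []
    return sorted(p.name for p in d.iterdir())
```

Running render_nodes() inside the Plex container as well as on the host shows whether the render node actually made it into the container. For an LXC, that is a separate question from whether the host sees it.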
 
When you mentioned the GPU passthrough issue, it occurred to me that your original issue might also be GPU-related.

1) Do you have a monitor connected to pve1 or pve2?

2) On pve1 you should have pve-firmware installed by default, is that correct?
Do you have firmware-misc-nonfree installed on pve2 (it has to be added manually on Debian)?
 
Yes, there is currently a monitor connected to pve2. I can test without it after work.
If pve-firmware is installed by default then I assume it's there, but again I'll check tonight when I get home.

Is firmware-misc-nonfree part of the Proxmox repo? I didn't add it specifically; I'll search for it too.
 
I currently have an issue with a ProDesk 400 G5 where it crashes when NO monitor, or not even a DisplayPort cable, is connected.
I found that when you remove kbl_dmc_ver1_04.bin, this problem goes away.

The Proxmox bare-metal installer adds the pve-firmware package, which includes this DMC firmware.
I don't know if installing Proxmox on Debian also installs pve-firmware by default.

On pure Debian (without Proxmox), you need to add the non-free-firmware repository component (non-free on releases before Debian 12) and then install the firmware-misc-nonfree package. If you didn't do this, you don't have the firmware files in /lib/firmware, and there should be an entry in dmesg about the missing DMC file.
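A quick way to see whether the DMC firmware was loaded, or failed to load, on a given host is to filter the kernel log. The exact message wording differs between kernel versions, so this sketch keeps every dmesg line mentioning "dmc" rather than matching one specific message. read_dmesg and dmc_messages are hypothetical helper names.

```python
import subprocess

def read_dmesg() -> str:
    """Return the kernel ring buffer as text.

    Usually needs root, since reading dmesg is often restricted
    (kernel.dmesg_restrict=1).
    """
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout

def dmc_messages(dmesg_text: str, needle: str = "dmc") -> list[str]:
    """Return the lines that mention the DMC firmware (case-insensitive match)."""
    needle = needle.lower()
    return [line for line in dmesg_text.splitlines() if needle in line.lower()]
```

Usage on the host: print("\n".join(dmc_messages(read_dmesg()))). An empty result on a system that has the file in /lib/firmware/i915 would itself be worth noting.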
 
Interesting. OK, so last night pve2 started randomly rebooting again. Same issues as before, it seems.

As mentioned, pve2 is plugged into a monitor, but that monitor was off. I just turned it on, so it's at the "Welcome to the Proxmox Virtual Environment" login screen. It's been there for about an hour now with no reboot, so I'll leave it and see how it goes.

/etc/apt/sources.list looks the same on both pve1 and pve2, and /etc/apt/sources.list.d has the same lists on both as well.

I do see pve-firmware installed on both if I do apt list --installed.
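The same check can be scripted, which makes it easy to compare both nodes. This sketch assumes the usual apt list line format, package/suite,now version arch [installed...]. apt itself warns that its CLI output is not a stable interface, so dpkg-query is the safer choice for anything long-lived. installed_packages is a hypothetical name.

```python
def installed_packages(apt_list_output: str) -> dict[str, str]:
    """Map package name -> version from `apt list --installed` output."""
    pkgs: dict[str, str] = {}
    for line in apt_list_output.splitlines():
        # Skip the "Listing..." header and anything that is not a package line.
        if "/" not in line or "[installed" not in line:
            continue
        name, rest = line.split("/", 1)
        # rest looks like: "stable,now 3.8-3 all [installed]"
        fields = rest.split()
        pkgs[name] = fields[1] if len(fields) > 1 else ""
    return pkgs
```

Feeding it the output of apt list --installed from pve1 and pve2 and comparing the two dicts shows any package differences between the nodes.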

Also, in /lib/firmware/i915 I see the kbl_dmc_ver1_04.bin file. You said you removed the file and the problem went away? Is that a stable, permanent fix, or does the problem come back or cause other issues if you do that?
 
I tried this, because I had a similar issue with an older server (based on i3-7100 and a Supermicro board), even back then with a 4.x kernel. That one never had a monitor connected and took longer to crash, but it had quite a lot of services running.
Whenever I watched something on Plex, it froze and I had to pull power to get it starting again. Then I renamed the firmware to xxx.disabled instead of removing it, and it never froze again.
A pure Debian install always logs a warning about this file being missing, but it does not cause any issues; the file is for power saving of the iGPU.
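The rename workaround from this thread can be expressed as a small Python sketch. disable_firmware is a hypothetical helper name. It has to run as root, and the change only takes effect after a reboot, because i915 loads the DMC blob when the driver initializes. Renaming rather than deleting keeps the file around so it can be restored later.

```python
from pathlib import Path

def disable_firmware(path: str, suffix: str = ".disabled") -> Path:
    """Rename a firmware blob so the kernel no longer finds it by its original name.

    Returns the new path. Safe to call repeatedly.
    """
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    if dst.exists() and not src.exists():
        return dst  # already disabled, nothing to do
    # If a package upgrade restored src while dst still exists, rename()
    # replaces the old dst on Linux. If neither exists, this raises
    # FileNotFoundError.
    src.rename(dst)
    return dst
```

Example: disable_firmware("/lib/firmware/i915/kbl_dmc_ver1_04.bin"). One caveat is that an upgrade of the package that ships the file (pve-firmware on Proxmox) will most likely put kbl_dmc_ver1_04.bin back, so re-check after firmware updates.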
 
To investigate the issue on my machines further, I installed Arch with the latest kernel (6.4.10) on my HP ProDesk 400 G5 (Arch also ships an intel-firmware package that includes kbl_dmc_ver1_04.bin). It runs fine without a monitor and has not crashed yet.
But I noticed that power consumption is higher, at about 5W (measured at the outlet). The Proxmox 8.0 install used only 2.5W with no monitor connected. So some power-saving setting in the Proxmox bare-metal install is freezing the system when combined with kbl_dmc_ver1_04.bin.
Using powertop, I also noticed that it only went down to package C-state C3, not C8 or even C10 as on Proxmox.

I have now installed Ubuntu 23.04 (same 6.2 kernel), which also installs and loads kbl_dmc_ver1_04.bin.
With a monitor connected, it goes down to C8 (3.7W at the outlet), when I unplug the DisplayPort cable, it goes down to C10 (2.3W).
Right now it is idling without a monitor and I will see if it crashes.

-- Edit --
Ubuntu has been fine, no crash in 5 hours.
So, without changing anything on the HP itself, I installed Proxmox 8.0-2 from the ISO, unplugged the monitor, waited 3 minutes, and it crashed.
 
You're on to something with the monitor connection: I haven't had a crash since I turned on the monitor it's plugged into. I'm going to try renaming that kbl_dmc_ver1_04.bin file and see if that keeps it from crashing while the monitor is not connected.

I'm still not able to get HW transcoding to work; I may go back to a bare-metal install of Proxmox, since I was able to get it working there. Do you think using a display emulator would fix or improve stability without having to remove that firmware file? Something like this:
https://www.amazon.com/FUERAN-DP-DisplayPort-Emulator-2560x1600/dp/B07D5G96D8
 
It appears renaming /lib/firmware/i915/kbl_dmc_ver1_04.bin has stopped the random crashes/reboots. The device has now been running for about 18 hours with no crash. Next I'm going to try reinstalling Proxmox from the bare-metal installer and applying the fix to see if it remains stable.

I'd still be curious to test the display emulator, but if this has no repercussions then it's not really worth it.
 
Not loading this firmware file really is not an issue; every vanilla Debian install skips it, because the non-free firmware package is not included by default. The only difference is that the iGPU cannot be put into a lower power state, so maybe 2-3W higher consumption at idle.
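The yearly cost of that extra 2-3W is simple arithmetic: watts × hours per year ÷ 1000. This is a tiny sketch assuming the machine idles 24/7. annual_kwh is a hypothetical name.

```python
def annual_kwh(extra_watts: float, hours_per_day: float = 24.0) -> float:
    """Extra energy per year, in kWh, for a constant additional draw."""
    return extra_watts * hours_per_day * 365 / 1000
```

For 2, 2.5 and 3 W this gives 17.52, 21.9 and 26.28 kWh per year per machine. Multiply by your own electricity price to get the cost.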
 
So I reinstalled Proxmox with the bare-metal installer and renamed the firmware file, and I haven't had a crash yet. The monitor is fully disconnected and it's been running since last night. I was also able to get hardware transcoding working in Plex (LXC container), which I never managed with the Debian install. Probably some driver or something.
 
I have now set up all three of my HP ProDesk 400 G5s; only one of them shows this behavior.
They are connected to a PiKVM with a 4-port HDMI switch. I had the switch set to PC1 and everything had been running fine for 3 days.
Then I switched to PC4 (which has no machine connected, so none of the three has an active monitor or any USB/mouse input connected), and PC1 immediately started rebooting every 3 minutes.
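Since the crashes seem to track whether a display is active, it can help to log what the kernel itself thinks is connected. Each DRM connector exposes a status file under /sys/class/drm (for example card0-DP-1/status) that reads connected, disconnected or unknown. This is a sketch, with connector_status as a hypothetical name. Whether a KVM or HDMI switch set to another input still reports the port as connected depends on the switch, and this is exactly what the check reveals.

```python
from pathlib import Path

def connector_status(drm_dir: str = "/sys/class/drm") -> dict[str, str]:
    """Map each DRM connector (e.g. card0-DP-1) to its status string."""
    result: dict[str, str] = {}
    base = Path(drm_dir)
    if not base.is_dir():
        return result
    # Connector entries are named cardN-<type>-<index>. Plain "card0" and
    # "renderD128" entries have no dash after "card" and are skipped.
    for status_file in sorted(base.glob("card*-*/status")):
        result[status_file.parent.name] = status_file.read_text().strip()
    return result
```

Logging this every minute or so before and after flipping the switch would show whether the crash lines up with a connector changing to disconnected.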

Does anyone have tips for finding the cause of these reboots? Maybe I can then open a bug with Proxmox or with the upstream i915 driver.
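One place to start is the log of the boot that ended in the reset. This is a sketch that assumes systemd-journald is running with its default Storage=auto, under which logs only survive a reboot if /var/log/journal exists. The helper names are hypothetical; journalctl -b -1 -n N --no-pager is the standard way to show the last N entries of the previous boot.

```python
import subprocess
from pathlib import Path

def journal_is_persistent(journal_dir: str = "/var/log/journal") -> bool:
    # With Storage=auto, journald writes persistent logs only if this
    # directory exists. Otherwise logs live in /run and are lost on reboot.
    return Path(journal_dir).is_dir()

def previous_boot_cmd(lines: int = 50) -> list[str]:
    # -b -1 selects the boot before the current one. -n keeps its last entries.
    return ["journalctl", "-b", "-1", "-n", str(lines), "--no-pager"]

def tail_previous_boot(lines: int = 50) -> str:
    # Needs root or membership in the systemd-journal/adm groups
    # to see system messages.
    result = subprocess.run(previous_boot_cmd(lines), capture_output=True, text=True, check=True)
    return result.stdout
```

On a hard reset the journal often just stops after ordinary messages, with nothing about the cause. When that happens, a crash dump (kdump) or a serial or network console is the usual next step.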
 
Hmm, I'm facing the same issue on my NUC i8...
I've had a monitor connected to it for a few weeks, mostly with the monitor off.
This was to watch the logs and find out why my Proxmox server crashed, because the journal files didn't say much about the crash.

Hope you guys find a good fix for this :(
 
Did you try renaming or deleting /lib/firmware/i915/kbl_dmc_ver1_04.bin? That has resolved my issue, been running stable now for a few weeks.
 
I tried Debian 12 with firmware-misc-nonfree installed and it has the same issue.
For Debian I found a guide on how to capture a kernel crash dump, but it produces no kdump at all, so that doesn't help either.

I opened a bug with Debian and then tried Fedora 38, which ships the DMC firmware; it also crashed.
Afterwards I investigated further and decided to check upstream with the i915 developers.
I compiled the latest drm-tip, which is on kernel 6.5.0, and it also crashes.

So I opened an issue over there:

https://gitlab.freedesktop.org/drm/intel/-/issues/9244

Just wanted to let you guys know. I will try the kernel parameters mentioned in the replies later and report back.
 
Gotta love these video card issues. I wonder if plugging a dummy load into the video port would fix it until the driver is fixed? Those are cheap on Amazon.

The dummy load would simulate a connected monitor.
 
Yeah, I considered buying one off Amazon, they're like $6. But renaming that firmware file fixes the issue for me, so it didn't really seem worth it, since it doesn't cause any other issues as far as I can tell.
 
Thank you for this, I have spent a few days trying to work out what mistake I made with the Installation, this solution worked for me straight away.
HP ProDesk 600 G4 Mini PC Intel Core i5-8500T
 
Thanks man! This "fix" has already been running for a day!
Do you know what that firmware file was intended for?
Is it meant for when a physical monitor is attached?
 
I don't really know what it's meant for. Perhaps some kind of power-saving feature that kicks in when a monitor is detached? It's really strange, because 2 out of 3 identical devices had this issue. hanzoh seems to know more about it than I do; he's opened a bug with the developer team.
 
