Freezes with Gigabyte GB-BRR7H-4800 (rev. 1.0)

I also bought a Gigabyte GB-BRR7H-4800 during lockdown to have one more thing to tinker with, and tinkering is what I got. With 64GB installed, both Proxmox and several Ubuntu versions kept freezing during file transfers and virtualization.

The fan was also very loud, so I upgraded to BIOS F09 and set the fans to silent. Since they were still audible, I ended up buying this case (https://www.akasa.co.uk/update.php?...type_sub=Fanless AMD MiniPC&model=A-NUC76-M1B), which got rid of the noise completely. Thermals are also much better now (10-20 degrees lower under moderate load).

I just had another freeze while using rsync. I disabled "NX-Mode" in the BIOS and am running Debian 11 for now. It looks like that solved the rsync problem.

Would be interested in hearing if that also works for others.
 
Hi there,
@whitehatmiddleman: would you mind sharing pics of your mod (heatsink on VRM)?
It's not really clear to me which parts you are cooling, but as I am experiencing freezes as well, I would be interested in applying the mod. Is your BRIX still running stable?
@all: still wondering if this is related to Proxmox or the hardware… did anyone try to verify that the freezes do not happen with, e.g., Windows as the OS?

TIA
Hello @CaptainDork, I can tell you it is not a Proxmox issue. I have two of these units and this seems to come down to bad CPU binning. On both units I noticed that a core, when overloaded, will cause the system to crash, whether in Windows or Linux.

I ran multiple stress tests to determine which cores were faulty and decided these units are not stable enough for lab stuff. I'm now using them as media servers, with a startup script that sets the ProcessorAffinity to ignore the bad core or cores.
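
For anyone who wants to try the same affinity idea on Linux: the startup script itself is not shown above, so this is only a minimal sketch of the general technique, not the script from the post. It builds a hex bitmask with the bad CPUs cleared, which `taskset` accepts as an affinity mask. The function name `affinity_mask` and the numbers in the example are made up for illustration.

```shell
#!/bin/sh
# Build a hex CPU affinity mask covering CPUs 0..(TOTAL-1) except the listed ones.
# Usage: affinity_mask TOTAL_CPUS [BAD_CPU ...]
affinity_mask() {
    total=$1
    shift
    mask=$(( (1 << total) - 1 ))        # start with every CPU enabled
    for bad in "$@"; do
        mask=$(( mask & ~(1 << bad) ))  # clear the bit of each bad CPU
    done
    printf '0x%x\n' "$mask"
}

# Example (hypothetical numbers): 16 logical CPUs, CPU 12 considered faulty.
# taskset "$(affinity_mask 16 12)" some_command
```

This only keeps the pinned process off the bad core; anything else on the system can still be scheduled there.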

BTW the cooling mods made no difference.
 
@wolke Might be on to something. When I disabled NX-Mode along with SVM the system became more stable. My original intention for these units was to run a small virtualized lab, but it seems they weren't built for that.
 
Crashes don't happen with Windows. I've been using Windows Server for months and the system is stable. Still, I hate Windows.
Even stressing the CPU under Windows is completely stable.
 
Hi!

My Proxmox kernel crashed within 15 minutes when using virtualization, and within 2 hours at most without it. I had this problem for nearly a year and finally found a solution that works for me.

Sometimes I saw on the kernel crash screen, that CPU:12 crashed.
IMG_20220908_081745.jpg
After testing a lot of things, getting more and more frustrated, I found last week in another forum the possibility to disable single CPUs.
So I simply deactivated this CPU:
Bash:
echo 0 > /sys/devices/system/cpu/cpu12/online
and automated this with sysfsutils, so that this CPU gets disabled every boot.
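
The sysfsutils configuration is not shown in the post, so here is a minimal sketch of what it might look like. It assumes the usual `/etc/sysfs.conf` format (`path relative to /sys = value`) and the standard `/sys/devices/system/cpu/cpuN/online` path. The helper name `sysfs_offline_entry` is made up for illustration.

```shell
#!/bin/sh
# Print the /etc/sysfs.conf entry that takes one logical CPU offline at boot.
# Usage: sysfs_offline_entry CPU_NUMBER
sysfs_offline_entry() {
    printf 'devices/system/cpu/cpu%s/online = 0\n' "$1"
}

# Example, run as root (CPU 12 as in the post):
# sysfs_offline_entry 12 >> /etc/sysfs.conf
```

After a reboot, `cat /sys/devices/system/cpu/cpu12/online` should print 0 if the entry was applied.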

Proxmox runs now for 7 days without any problems.

Regards

Marco
 
For future folks who come across this post with the same issue but, unlike @mgoeben, don't see a specific CPU core crashing: a working solution for me was to put all CPU cores into power-save mode.

Reference howto
 
Hi everyone,

got a bit nervous with these beasts. I run a cluster trio of Proxmoxies. 2x GB-BRR7-4800 (slim version) and 1x GB-BRR7H-4800.
Of course it's funny to see 48 Cores in the cluster Summary.
Of course it crashes sometimes.
Of course I will need to change the fan (did not have time to check this yet, maybe noodling with a Noctua Fan 120mm 5V USB with a dust protection grill - see picture).

A quick fix for this when your server crashes.

1. Restart the Proxmox server and go to shell:

> shell-root# dmesg | grep clocksource

2. Check the information provided like:
clocksource: timekeeping watchdog on CPUXXXX (example CPU1 in the picture): Marking clocksource 'tsc-early' as unstable because the skew is too large

3. Now let's take an example: If the CPU shown faulty is CPU3 for example, then you can disable it:

> shell-root# echo 0 > /sys/devices/system/cpu/cpu3/online

I did this 2 weeks ago and have had no freeze at all since. Note that a while after you disable the CPU, the cluster summary shows only 15 CPU threads.
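
Steps 1 to 3 can be sketched as a small helper. This is a rough sketch only, assuming the kernel message has the form quoted in step 2; the function name `watchdog_cpus` is made up for illustration.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print the CPU numbers named in
# "timekeeping watchdog on CPU<n>" messages, one per line, without duplicates.
watchdog_cpus() {
    sed -n 's/.*timekeeping watchdog on CPU\([0-9][0-9]*\).*/\1/p' | sort -n -u
}

# Example, run as root:
# dmesg | watchdog_cpus
# for n in $(dmesg | watchdog_cpus); do
#     echo 0 > "/sys/devices/system/cpu/cpu$n/online"
# done
```

Like the manual `echo 0`, this does not survive a reboot, so it would have to run again at every boot.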

Hope this helps.
 
I have tried the power-save mode solution suggested above and it has been working for weeks without any problem! You are a boss!



I use systemd to set power-saving mode after every reboot:


Code:
[Unit]
Description=Set Power Saving Mode to avoid crash in this bullshit system

[Service]
Type=oneshot
ExecStart=/root/setPowersavingMode.sh

[Install]
WantedBy=multi-user.target

And the /root/setPowersavingMode.sh:

Bash:
#!/bin/bash

echo "powersave" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
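
To check that the governor actually stuck after boot, something like the following could be used. It is only a sketch, assuming the same `scaling_governor` files as the script above; the function name `governor_summary` is made up for illustration.

```shell
#!/bin/sh
# Print how many CPUs use each scaling governor, as "governor: count" lines.
# The base directory defaults to /sys/devices/system/cpu; another one can be passed.
governor_summary() {
    base=${1:-/sys/devices/system/cpu}
    cat "$base"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null \
        | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example:
# governor_summary
```

If every core is in power-save mode, the output should be a single `powersave: N` line.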
 
Hi guys! I am writing again in this thread to inform all readers that tonight the machine burned down. Yes, with smoke and a burning smell.

I recommend that everyone using this equipment sell it before it breaks. In my case I am filing a warranty claim, since it broke in less than two years and in Spain the warranty period is two years.

This is without a doubt the worst design I have ever seen, and probably the worst brand. I have had unbranded Chinese equipment running for more than 5 years, and this one, which is from a well-known brand, died in a year... even though I ran it in low-consumption, low-performance mode...

IMG_0268.jpg
 
Thanks for the heads up. I've only used this device as a sandbox and only power it on when needed. Even WOL can be inconsistent if the device is not on a UPS.

I've been working with many AMD mini PCs and I feel these devices are not meant for small lab or inexpensive remote branch office use. I still recommend Intel NUCs or mini PCs with Intel CPUs and Ethernet chipsets due to their mature engineering.
 
Hi CaptainDork & all,

Same here (not Proxmox, but Debian with KVM/QEMU). Random crashes/freezes, and the system only becomes available again after a hard shutdown/reboot.

I noticed that the freeze never, really never, happens when I have boost disabled.
Not sure how you do this in Proxmox, but on my Debian I just issue the command:
sudo bash -c "echo 0 > /sys/devices/system/cpu/cpufreq/boost"

If I re-enable the boost (sudo bash -c "echo 1 > /sys/devices/system/cpu/cpufreq/boost") then it doesn't take long for another freeze to happen.

I also tried different power governors, but that didn't make any difference (except for those that disable boost and keep the CPU at the lowest possible clock speed). I'm currently using the default "schedutil" governor with CPU boost disabled. No freezes at all.
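
The boost setting above is lost on reboot, so it has to be reapplied at every boot (for example from a boot-time script). A minimal sketch of a helper that writes the value and reads it back, assuming the boost file lives at the path used above (it may be missing or located elsewhere depending on the cpufreq driver); the function name `set_boost` is made up for illustration.

```shell
#!/bin/sh
# Write 0 (off) or 1 (on) to a cpufreq boost control file and print the value read back.
# Usage: set_boost 0|1 [FILE]   (FILE defaults to the path used in this post)
set_boost() {
    value=$1
    file=${2:-/sys/devices/system/cpu/cpufreq/boost}
    case $value in
        0|1) ;;
        *) echo "usage: set_boost 0|1 [file]" >&2; return 2 ;;
    esac
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    echo "$value" > "$file" && cat "$file"
}

# Example, run as root:
# set_boost 0
```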

Best regards, Patrick
 
I applied new thermal paste in the hope of lowering the core temps... it didn't make much difference.

But when I checked the temps, I saw that my CPUTIN value is constantly 125.5 °C. What does this value read on yours?
 
Same issues on a Ryzen 9 7950X. I solved it by not using 'host' as the CPU type for the VMs. Unfortunately I tried so many other things first that didn't help.
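
The post doesn't say which CPU type was used instead of 'host', so the following is only a sketch: it wraps Proxmox's `qm set <vmid> --cpu <type>` and refuses 'host'. The VM ID 100 and the type `kvm64` in the example are placeholders, and the function name `set_vm_cpu_type` and the `DRY_RUN` switch are made up for illustration.

```shell
#!/bin/sh
# Change the CPU type of a Proxmox VM to something other than 'host'.
# Usage: set_vm_cpu_type VMID CPUTYPE
# With DRY_RUN set to a non-empty value, the command is printed instead of executed.
set_vm_cpu_type() {
    vmid=$1
    cputype=$2
    if [ "$cputype" = "host" ]; then
        echo "refusing: 'host' is the CPU type this workaround moves away from" >&2
        return 1
    fi
    ${DRY_RUN:+echo} qm set "$vmid" --cpu "$cputype"
}

# Example on the Proxmox host (placeholder VM ID and CPU type):
# set_vm_cpu_type 100 kvm64
```

The VM needs a full stop and start for the new CPU type to take effect.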
 
