Fresh installation, restore and full server freeze over night

GUI? After the server reboots, it shows a plain-text terminal screen with a login prompt on HDMI (if I connect a monitor to it). No GUI or anything custom has been added to the Proxmox host.
I am a little lost here, so maybe let us first clarify. You said above:
When it is working I can see the login and can work over browser in the Proxmox console (even if network is down).
I initially understood this to mean, you have a browser installed on the host.

I now believe, what you meant is that through a browser (on another client) you access the NanoKVM to see the Proxmox CLI? However how do you do that with the network down?

Also as I said above - that kernel-panic screenshot I have NEVER seen (searched the Web too) on a Proxmox host during runtime. Maybe it is from the NanoKVM?
What I meant was that even if the network is down (router upgrades, for example), I can still connect remotely over the KVM to the server to work in its terminal.
Again, how do you access the NanoKVM without a NW? Maybe you have 2 NWs?


Stock NVMe drive was ASRock 512GB. I replaced it with a 1TB WD Blue NVMe.
ASRock 512GB - what is that? ASRock makes MBs (& various other components) - I've never encountered an NVMe of theirs. I agree with Kingneutron, that you need an enterprise-grade disk for Proxmox.

On the LXC servers I have, I do run apt update
It is alright to run apt upgrade on LXCs (not sure about privileged ones!) - just not on the Proxmox host.

On USB I have my external storage, a Terramaster D2-320, on the server's USB-C port, and on the other USB ports I have an APC UPS (to monitor), a Google Coral (for Frigate) and a Zigbee controller (passed through to the Home Assistant VM). None of these devices pull much USB power from the server. Maybe only the Google Coral on the USB3 port.
This can be another point-of-failure. Running that high-speed (3.2?) USB connection on a Mini PC can often crash it. You do have a lot of (active) USB connections for one tiny PC. Even on enterprise servers, USB-connected storage devices are definitely not recommended for stability. I would not be surprised if these USB connection(s) is/are crashing the server. The power supply & peripheral chip controller on those Mini PCs are usually flaky as is.

That has nothing to do with GPU HDMI output.
This is not entirely correct. The hardware decoder and the display output ports are components of the same iGPU. VA-API can potentially affect HDMI output because it is part of the same VPP (video processing pipeline). You may also have to consider temps/cooling on that CPU/GPU combo. Come to think of it - that Mini PC in general probably needs temp-checking. At what temp is the NVMe running?

I just have to enable unprivileged container and set nesting to 1
Do you possibly mean "privileged"?

Which VM OS would you suggest?
Why not Debian Stable (server). If you want real lightweight use an Alpine Linux VM - but there is a little more work involved. Start with the current alpine-virt-3.22.2-x86_64.iso - which is a whopping 65MB! (Based on the Frigate docs quotation I have included further - it looks like they prefer Debian).

You must do your own research - as only you know your actual setup requirements. Why not run Frigate inside of Home Assistant? I guess you have probably looked at this option before. (I use HA - but not Frigate).

official Frigate documentation
Well I just skimmed the official documentation & I see:
Frigate runs best with Docker installed on bare metal Debian-based distributions. For ideal performance, Frigate needs low overhead access to underlying hardware for the Coral and GPU devices. Running Frigate in a VM on top of Proxmox, ESXi, Virtualbox, etc. is not recommended though some users have had success with Proxmox.

Then I see:
Proxmox
According to Proxmox documentation it is recommended that you run application containers like Frigate inside a Proxmox QEMU VM. This will give you all the advantages of application containerization, while also providing the benefits that VMs offer, such as strong isolation from the host and the ability to live-migrate, which otherwise isn’t possible with containers.

WARNING
If you choose to run Frigate via LXC in Proxmox the setup can be complex so be prepared to read the Proxmox and LXC documentation, Frigate does not officially support running inside of an LXC.

So I believe I've said enough on this subject.

That config was from official Frigate documentation and other boards
Please don't take this badly, but following all these types of configs/scripts without fully understanding them - is a recipe for disaster. You are not alone here, most (home) Linux users blindly copy/paste till something breaks.

For the attacks from outside?
Not only. Any code/program etc. running in the LXC can manipulate the host in any way. So unless you know exactly what that latest shiny script/upgrade/download is doing, the sky is the limit.

If you are familiar with this custom LXC config
As I've said, I'm not.

just possible security issues?
Dealt with above.

maybe I can try to downgrade Kernel to version that was used in Proxmox v8. Would it even work on Proxmox v9?
IDK. I researched this issue some time ago & couldn't reach a conclusion. I'm sure it will require tinkering & probably be unstable. So unadvised.

Yet again, good luck.
 
I now believe, what you meant is that through a browser (on another client) you access the NanoKVM to see the Proxmox CLI? However how do you do that with the network down?

The KVM has an Ethernet port. I connect via a browser to its IP address and it shows me the HDMI output. For example, this is what I currently see:

1762353698904.png

Also as I said above - that kernel-panic screenshot I have NEVER seen (searched the Web too) on a Proxmox host during runtime. Maybe it is from the NanoKVM?


ASRock 512GB - what is that? ASRock makes MBs (& various other components) - I've never encountered an NVMe of theirs. I agree with Kingneutron, that you need an enterprise-grade disk for Proxmox.

Sorry, not ASRock. I checked it now. It's an Asint AS806 512GB, some Chinese brand that came with the servers. It has worked great so far, but it showed 7% wear, so I decided to replace it.

This can be another point-of-failure. Running that high-speed (3.2?) USB connection on a Mini PC can often crash it. You do have a lot of (active) USB connections for one tiny PC. Even on enterprise servers, USB-connected storage devices are definitely not recommended for stability. I would not be surprised if these USB connection(s) is/are crashing the server. The power supply & peripheral chip controller on those Mini PCs are usually flaky as is.

I had the old data storage connected to the same USB port before. And the data storage has its own power supply. It is only connected over USB for data communications. The transfer speed is great and it serves me well (WD Red). But I also replaced these HDDs with new ones (same WD Red model).

This is not entirely correct. The hardware decoder and the display output ports are components of the same iGPU. VA-API can potentially affect HDMI output because it is part of the same VPP (video processing pipeline). You may also have to consider temps/cooling on that CPU/GPU combo. Come to think of it - that Mini PC in general probably needs temp-checking. At what temp is the NVMe running?

Current NVMe temperature is around 40 degrees:

Code:
root@proxmox:~# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            253038803908         WD Blue SN5000 1TB                       0x1          1.00  TB /   1.00  TB    512   B +  0 B   291000WD
root@proxmox:~# nvme smart-log /dev/nvme0n1 | grep temperature | awk '{print ($3-32)*5/9 " °C"}'
40.5556 °C

Why not Debian Stable (server). If you want real lightweight use an Alpine Linux VM - but there is a little more work involved. Start with the current alpine-virt-3.22.2-x86_64.iso - which is a whopping 65MB! (Based on the Frigate docs quotation I have included further - it looks like they prefer Debian).

I have downgraded the kernel to 6.8.12-16-pve. Will see if this fixes the stability. I am sure I also used this one in PVE 8. I checked dmesg for lines containing amdgpu on 6.17 and got this:

https://pastebin.com/LSYvYitA

Here on 6.8 I get this:

https://pastebin.com/zV7em4k8

Far fewer warnings and errors.

You must do your own research - as only you know your actual setup requirements. Why not run Frigate inside of Home Assistant? I guess you have probably looked at this option before. (I use HA - but not Frigate).

I could do that and run Frigate inside HA, but HA is already slow and takes lots of RAM. I would still need to pass through the GPU and the Google Coral over USB.

Will see if with kernel 6.8 it is more stable for now. The next move will probably be cloning the old NVMe to the new one and seeing if I still get stability issues. Wife is already pissed off since nothing is working in the house when it crashes. I can deal with her for a few more days ;).
 
The KVM has an Ethernet port. I connect via a browser to its IP address
I asked how you access that IP address if the NW is down. What is up with you? Translation maybe?

Asint AS806 512GB
This "appears" to be a PCIe 3.0 x4 if this site is to be trusted. However your WD Blue SN5000 NVMe is a Gen. 4.
IDK if your MB fully supports PCIe Gen 4, but it is possible that the PC/controller/bus chip cannot keep up with the 16 GT/s rate of transfer & crashes.
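
One way to check which link the new drive actually negotiated is to read the PCIe attributes the kernel exposes in sysfs. A minimal sketch, assuming the drive shows up as nvme0 and that /sys/class/nvme/nvme0/device points at its PCI device (the function name pcie_link is made up for illustration):

```shell
#!/bin/sh
# Print negotiated vs. maximum PCIe link speed and width for one device.
# $1 is the device's sysfs directory; attributes that are missing are skipped.
pcie_link() {
    dev_dir="$1"
    for attr in current_link_speed max_link_speed current_link_width max_link_width; do
        if [ -r "$dev_dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$dev_dir/$attr")"
        fi
    done
}

# Example call (the path is an assumption about this particular host):
# pcie_link /sys/class/nvme/nvme0/device
```

If current_link_speed is lower than max_link_speed, the slot or controller negotiated a slower link than the drive supports. That would not prove anything about the crashes by itself, but it is a cheap data point.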

And the data storage has its own power supply. It is only connected over USB for data communications.
Does not make a difference. It can still potentially crash the bus, especially a high-transfer rate USB. It sometimes depends on what else is going on at that time.

Current NVMe temperature is around 40 degrees
Maybe monitor it & other PC temps - so you can check it at the time of a server crash.
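
A rough sketch of how that monitoring could look, so there is a temperature trail to read after the next freeze. It makes the same assumption as the one-liner posted earlier in the thread: the third field of the "temperature" line of nvme smart-log is a Fahrenheit value. The function names and the log path in the example are hypothetical.

```shell
#!/bin/sh
# Convert a Fahrenheit reading to Celsius, rounded to one decimal place.
f_to_c() {
    awk -v f="$1" 'BEGIN { printf "%.1f\n", (f - 32) * 5 / 9 }'
}

# Read `nvme smart-log` text on stdin and print the third field of the
# "temperature" line (assumed to be Fahrenheit on this nvme-cli version).
parse_temp_f() {
    awk '/^temperature/ { print $3; exit }'
}

# Append one timestamped reading for device $1 to log file $2.
log_nvme_temp() {
    dev="$1"
    log="$2"
    f=$(nvme smart-log "$dev" | parse_temp_f)
    [ -n "$f" ] || return 1   # no reading, write nothing
    printf '%s %s %s C\n' "$(date '+%F %T')" "$dev" "$(f_to_c "$f")" >> "$log"
}

# Example call (device and path are assumptions):
# log_nvme_temp /dev/nvme0n1 /var/log/nvme-temp.log
```

Run from cron once a minute or from a simple loop, the last lines of the log should show roughly how hot the drive was shortly before a crash. Other sensors could be appended to the same file in the same way.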

but HA is already slow and takes lots of RAM.
Care to elaborate? I've never noticed that.

I run a Proxmox instance at home on a mini PC (12th gen i7) & gave HA 4 cores & 8GB ram. This is total overkill - I've yet to see it use more than 1G (inside HA), CPU usage is also negligible (~0.5% measured in Proxmox). But I've run it with far less also (& on much older HW). However each setup of HA is different. I have 20 integrations, 7 add-ons & roughly 50 devices. It runs along snappy as ever. In a nutshell; HA should be neither slow nor ram-hungry, if it is - something is wrong with your setup/HW.


Will see if with kernel 6.8 it is more stable for now.
How did it go with the NanoKVM disconnected, did it still crash?

Wife is already pissed off since nothing is working in the house when it crashes.
For this exact reason, in my book, any router, switch etc. that the general NW depends on should not be virtualized, but run on dedicated bare metal.

I can deal with her for a few more days
Nothing new there!