Proxmox server freezes randomly

jp22

New Member
Aug 17, 2023
Good afternoon, I'm having issues with a Proxmox server that randomly freezes and requires a manual reboot. Any ideas? Thanks


Proxmox version: 8.0.3
Power supply: new
Clean install on a new WD Green SSD
Only one VM: Windows Server 2019
Motherboard: GA-B250M-D3H

*Dmesg Attached

Before the hang, journalctl shows:


Aug 16 19:17:01 prox1 CRON[945589]: pam_unix(cron:session): session closed for user root
Aug 16 19:25:00 prox1 smartd[1580]: Device: /dev/sda [SAT], is back in ACTIVE or IDLE mode, resuming checks (1 check skipped)
Aug 16 19:25:00 prox1 smartd[1580]: Device: /dev/sdc [SAT], is back in ACTIVE or IDLE mode, resuming checks (4 checks skipped)
Aug 16 19:55:05 prox1 smartd[1580]: Device: /dev/sda [SAT], is in STANDBY mode, suspending checks
Aug 16 19:55:11 prox1 smartd[1580]: Device: /dev/sdc [SAT], is in STANDBY mode, suspending checks

At 20:00 it stopped working, and a physical reboot of the server was required: no SSH, no GUI, nothing.


VM config:

perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
agent: 1
boot: order=scsi0;net0
cores: 6
cpu: x86-64-v2-AES
machine: pc-i440fx-8.0
memory: 14000
meta: creation-qemu=8.0.2,ctime=1691876077
name: WINSRV2019
net0: virtio=5E:6A:2F:14:71:2F,bridge=vmbr0
numa: 0
onboot: 1
ostype: win10
scsi0: VolumeZFS:vm-100-disk-0,cache=writeback,discard=on,iothread=1,size=150G
scsi1: VolumeZFS:vm-100-disk-1,cache=writeback,discard=on,iothread=1,size=150G
scsihw: virtio-scsi-single
smbios1: uuid=eeb479e4-7bef-4dcc-ab5a-e87f7932692a
sockets: 1
usb0: host=4971:1020,usb3=1
vmgenid: 0a4dc3a9-4654-433e-bfd4-18c2e2f87c7f
 

Hmmm, I don't see anything particularly useful in the logs. Kernel seems up-to-date. Have you checked for any available BIOS upgrades?
Could you also send me the output of:
pveversion -v
 
It seems I have the same problem.

Last week I completed an in-place upgrade from 7 to 8 following the instructions here.

Attached syslog and output of:
last -x

and output of
pveversion -v

After the upgrade I had an issue with the system time, which I managed to correct.
Then I had an issue with "martian source" messages in the Frigate CT, which vanished after a few days.
Since the freezes continued, and after I read this post, I checked the memory with Memtest but found nothing.
Then I suspected OPNsense, which runs in one of my VMs, so I disabled it and switched to a standalone DHCP server, but the freezes continued.
Finally I removed the Ethernet card that I had installed just before the upgrade, guessing it might be causing PCI compatibility issues. Again the freezes continued.

I am posting to get some more ideas about what to try next.
I am thinking about downgrading back to Proxmox 7 but can't find instructions on how to do that.
I am also thinking about a clean install, but I am not sure how to do that since I have TrueNAS in one of the VMs with its own ZFS volume (passed-through disks).

After reading this post I updated the BIOS which was outdated. Will report if the freezes stop.
 

Any idea why the drives are going into standby? I see nothing like that in my logs.
 
Three days after the BIOS upgrade the system froze again. I have now disabled KSM. Will report.
 
I had two freezes today after disabling KSM. Any more ideas?
I am now running PVE with all VMs and CTs shut down to test whether the freeze repeats.
 
It had to do with Frigate and MQTT running on the same PVE. I found a thread that talks about that, so I moved Frigate to another node and that did it.
 
So you just created a new PVE node and put Frigate on that one? And then no more Proxmox freezes?

I have the same issue: within 12-24 hours Proxmox becomes unresponsive and the only thing that works is to cut the power. Nothing gets written to the logs.

Now that I have found this, I have tried disabling MQTT in the Frigate config to see if that helps.

I have Proxmox with a HAOS VM running MQTT, Frigate, zigbee2mqtt, wmbusmeters and a lot more.

Before I added Frigate it had been running for months.

I started with Frigate in an LXC, but it's the same with Frigate as a HAOS add-on: within 12-24 hours it becomes unresponsive.
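
When nothing gets written to the logs, one way to at least pin down the time of a freeze is a small heartbeat writer on the host. The following is a minimal sketch, not a fix: the file path and the 10-second interval are assumptions chosen for illustration, and the last timestamp only shows when the host stopped, not why.

```python
import os
import time
from datetime import datetime, timezone
from typing import Optional


def write_heartbeat(path: str) -> str:
    """Append one UTC timestamp line and force it to disk with fsync."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as f:
        f.write(stamp + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the line survives a hard freeze
    return stamp


def last_heartbeat(path: str) -> Optional[str]:
    """Return the last timestamp written, or None if there is none yet."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None


def run_forever(path: str = "/var/log/heartbeat.log", interval_s: float = 10.0) -> None:
    """Write a heartbeat every interval_s seconds (path is a hypothetical example)."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

After the next forced reboot, `last_heartbeat()` gives the approximate freeze time, which can then be compared against camera activity, scheduled jobs, or drive standby events.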
 
I am seeing the same thing! I have a Frigate LXC that does connect to an MQTT server, but that MQTT server runs in an LXC on another Proxmox server. So the combination of an MQTT server LXC and a Frigate LXC on one host does not seem to be the problem.

I have passed a USB port with a Coral TPU device through to the Frigate LXC. Do you have that? If so, it might be the cause.
 
So Frigate on one PC and MQTT on another PC still crashes the PC running Frigate?

Have you tried with
mqtt: enabled: false? I have that set now, but I am not yet past the 12-hour mark where it usually happens :)

I have an M.2 Coral passed through, but before that I tried OpenVINO and the standard CPU detector, and it's the same: it crashes after about 12 hours.
 
It just crashed again... so disabling MQTT in the Frigate config doesn't change anything.

I am about to give up.
I'm more or less happy that it isn't the MQTT part, as I rely heavily on the Frigate MQTT messages in my HA config.

Nevertheless, it is pretty discouraging that the bug is still there.

I tried to build an automation that pings the Proxmox server and, once the pings stop, has a smart switch power the server off and on. But this hardware has a soft on/off button, so once the machine has power again you still need to start it by manually holding the microswitch on the front. So much for my automation solution. Bummer.
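
A ping-based watchdog of this kind could look roughly like the sketch below. It assumes a Linux `ping` binary and a `power_cycle` callback that you supply yourself (for example, whatever toggles your smart switch); that callback, the failure threshold, and the interval are placeholders, not part of any particular smart-plug API. As noted above, it only helps on hardware that powers on by itself when power returns.

```python
import subprocess
import time
from typing import Callable, Optional


def host_is_up(host: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo request using the Linux `ping` binary."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(
    host: str,
    power_cycle: Callable[[], None],
    is_up: Callable[[str], bool] = host_is_up,
    max_failures: int = 5,
    interval_s: float = 30.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `host`; call `power_cycle` after `max_failures` misses in a row.

    Runs forever unless `max_checks` is given. Returns the number of
    power cycles triggered.
    """
    failures = 0
    cycles = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if is_up(host):
            failures = 0  # any successful ping resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                power_cycle()  # hypothetical: toggle the smart switch here
                cycles += 1
                failures = 0
        sleep(interval_s)
    return cycles
```

Requiring several consecutive failures avoids power-cycling the server over a single dropped packet.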
Restarting the server manually has a caveat of its own: the USB device numbering is not consistent, so I may need to alter the passthrough port in the LXC config file first. I haven't figured out yet how to automate that in Proxmox, though in theory it should be possible...
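
One possible starting point for automating this is to look the device up by its stable vendor:product ID instead of its bus/device number. The sketch below only parses `lsusb` output and returns the current `/dev/bus/usb/...` node; rewriting the LXC config with that path is left out, and the ID in the usage line is a made-up placeholder rather than the ID of any real device.

```python
import re
import subprocess
from typing import Optional

# Matches lines such as: "Bus 001 Device 004: ID 1234:abcd Some Device"
LSUSB_LINE = re.compile(
    r"^Bus (?P<bus>\d{3}) Device (?P<dev>\d{3}): "
    r"ID (?P<vid>[0-9a-fA-F]{4}):(?P<pid>[0-9a-fA-F]{4})"
)


def find_usb_device_node(lsusb_output: str, vendor_product: str) -> Optional[str]:
    """Return /dev/bus/usb/BBB/DDD for the first device matching 'vvvv:pppp'."""
    wanted = vendor_product.lower()
    for line in lsusb_output.splitlines():
        m = LSUSB_LINE.match(line.strip())
        if m and f"{m['vid']}:{m['pid']}".lower() == wanted:
            return f"/dev/bus/usb/{m['bus']}/{m['dev']}"
    return None


def current_lsusb() -> str:
    """Run `lsusb` on the host and return its output as text."""
    return subprocess.run(
        ["lsusb"], capture_output=True, text=True, check=True
    ).stdout


# Usage on the host (the ID is a placeholder):
#   node = find_usb_device_node(current_lsusb(), "1234:abcd")
```

A script run at boot could compare the returned path with the one in the container config and update it before the container starts.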

Sigh... I think we will end up with the most reliable solution: hanging ourselves up in place of the camera to watch things happen live for ourselves.

Development of Frigate itself seems to have stalled lately as well. I hope the new release will be issued soon.
 
Check your RAM. Reseat it. Make sure it's running at the right speed. Choose Memtest from the menu and let it run for a few hours. Check your temps; hot NVMe drives will crash you.
 
I also run Frigate in an LXC and get these freezes/crashes every week, sometimes after days, sometimes after hours. Could it be related to high NIC activity (cameras)? I have tried disabling checksum offload on the NIC, disabling all C-state-related features in the motherboard BIOS, and an HDMI dummy plug; unfortunately nothing works. 2 LXCs and 1 VM. It mostly crashes overnight, around 2 or 3 AM.