[SOLVED] Proxmox crashes after about 18 hours

MrGSee

New Member
Feb 18, 2021
Hi

I have been struggling with this for weeks now. My Proxmox host crashes, usually about 18 to 24 hours after a start, sometimes sooner.

I am not finding anything specific in the kern or messages logs, at least as far as I can tell by googling the errors.
The most likely candidate is this line (but I am not convinced it's the problem):
Mar 12 07:30:37 proxmox kernel: [137561.874336] fwbr107i0: port 2(tap107i0) entered forwarding state
This line appears a few times before a crash and is the last entry in the log, but sometimes it is logged hours before the server restarts.
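For anyone wanting to pin down exactly which entries precede each crash, here is a minimal sketch that scans a kernel log for reboots. It relies on the bracketed value in each line being the kernel's seconds-since-boot counter, so a drop in that value means the host restarted. The function name `find_reboots` and the log path in the usage note are my own assumptions, not anything Proxmox-specific, and the pattern only covers lines in the format quoted above.

```python
import re

# Matches syslog-style kernel lines in the format quoted in this post, e.g.
# "Mar 12 07:30:37 proxmox kernel: [137561.874336] fwbr107i0: ..."
KERNEL_LINE = re.compile(
    r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ kernel: \[\s*(?P<uptime>\d+\.\d+)\] "
)


def find_reboots(lines):
    """Return (last_line_before, first_line_after) for every detected reboot.

    A reboot is assumed whenever the bracketed uptime counter goes
    backwards between two consecutive kernel lines.
    """
    reboots = []
    prev_uptime = None
    prev_line = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = KERNEL_LINE.match(line)
        if not match:
            continue  # not a kernel line in the expected format
        uptime = float(match.group("uptime"))
        if prev_uptime is not None and uptime < prev_uptime:
            reboots.append((prev_line, line))
        prev_uptime, prev_line = uptime, line
    return reboots


def report(path):
    """Print the entries on either side of each reboot found in `path`."""
    with open(path, errors="replace") as handle:
        for before, after in find_reboots(handle):
            print("last entry before restart:", before)
            print("first entry after restart:", after)
            print()
```

Calling `report("/var/log/kern.log")` would be one way to use it, assuming that is where the kernel log lives on the host. If the last entry before every restart is routine bridge or tap traffic like the line above, that points away from a logged kernel fault and towards something that cuts the machine off without warning.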

I had added a SAS HBA controller, which works well. I originally thought it was the cause, but I removed it and the system crashed again.

I have run two cycles of memtest86+ with no errors.

SMART checks show no errors.

I really don't know where to go from here.

Hope someone can help sort this issue out.

System details:
Mobo: Gigabyte Z370M D3H
Mem: Corsair Vengeance LPX 64 GB (4 x 16 GB) DDR4 2666 MHz
CPU: Intel Core i7-8700 CPU @ 3.20GHz
Boot drive: M Series NVMe SSD 128G
2 SSDs for virtual machine boot disks
2 WD Red for data etc

VMs - the main VMs I run are
Windows for BlueIris
Ubuntu with Docker hosting a number of docker containers
Ubuntu with nginx
A few others that don't run all the time

What are the next steps here, where else can I look, what else can I test?

Thanks
G
 
The server just crashed again, a couple of hours after starting.
It went into a continuous cycle of restarts every couple of seconds.

It has done this on a few occasions before. Even after removing power it would keep doing this unless I left the power unplugged for a while.
 
18 hours sounds like a software issue or a slow heat problem; a reboot loop sounds like a hardware problem: motherboard, power supply, or corrupt BIOS settings (clear CMOS?). Maybe disconnect everything and reconnect it to make sure you did not create a bad contact (something half unplugged) when working on it?
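One way to test the slow heat theory is to record temperatures to disk at intervals, so the last readings before a crash survive the restart. A minimal sketch, assuming the host exposes the standard Linux thermal zones under /sys/class/thermal (values there are in millidegrees Celsius); the function names and the log file path are just placeholders:

```python
import glob
import os
import time


def read_temps(base="/sys/class/thermal"):
    """Return {"thermal_zoneN:type": degrees_celsius} for each readable zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as handle:
                name = handle.read().strip()
            with open(os.path.join(zone, "temp")) as handle:
                millidegrees = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # zone missing, unreadable, or not a number
        temps[f"{os.path.basename(zone)}:{name}"] = millidegrees / 1000.0
    return temps


def log_temps(log_path, base="/sys/class/thermal"):
    """Append one timestamped line of readings and force it to disk."""
    readings = read_temps(base)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    fields = " ".join(f"{key}={value:.1f}" for key, value in readings.items())
    with open(log_path, "a") as handle:
        handle.write(f"{stamp} {fields}\n")
        handle.flush()
        os.fsync(handle.fileno())  # make sure the line survives a sudden power cut
```

Running `log_temps("/root/temps.log")` from cron every minute (the path is only an example) leaves a trail to check after the next crash. Flat temperatures right up to the restart would make heat less likely and the power supply or board more likely.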
 
Thanks for the reply

BIOS is the latest.

I put a new power supply in today, really hope that helps but I am not holding my breath.

I recently refitted the RAM. There is not much else on the board other than the CPU, but I will recheck everything when / if it happens again.

CPU temps are OK.

Will try clearing the CMOS if / when it happens again.
 
It looks like a power supply problem.

I swapped the power supply with another machine and Proxmox has been running fine since.

The other computer, a Windows gaming machine, now cuts power not long after starting with the power supply taken from the Proxmox server, so it is fairly evident that was the cause.

I will leave Proxmox running for a couple more days to confirm things are stable and report back here.
 