[SOLVED] Proxmox crashes after about 18 hours

MrGSee

New Member
Feb 18, 2021
Hi

I have been struggling with this for weeks now: my Proxmox host crashes, usually about 18 to 24 hours after a start, sometimes sooner.

I am not finding anything specific in the kern or messages logs, at least as far as I can tell from googling the errors.
The most likely candidate is this line (though I'm not convinced it's the problem):
Mar 12 07:30:37 proxmox kernel: [137561.874336] fwbr107i0: port 2(tap107i0) entered forwarding state
The above has been the last line logged before a crash a few times, but sometimes it appears hours before the server restarted.
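
For anyone wanting to check the same thing, here is a rough Python sketch that picks out the last kernel line logged before each reboot. It assumes syslog-style lines like the one above and relies on the bracketed uptime value dropping back towards zero after a boot. The function name is made up for illustration, and the second sample line is an invented boot message, not taken from my logs.

```python
import re

# Matches the kernel uptime stamp, e.g. "kernel: [137561.874336] ..."
# Early boot stamps are space-padded, e.g. "[    0.000000]".
UPTIME_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def last_lines_before_reboots(lines):
    """Return the last kernel line seen before each uptime reset.

    Hypothetical helper: lines without a kernel uptime stamp are skipped.
    """
    results = []
    prev_uptime = None
    prev_line = None
    for line in lines:
        match = UPTIME_RE.search(line)
        if not match:
            continue
        uptime = float(match.group(1))
        # The uptime counter restarts on boot, so a drop means the host
        # rebooted somewhere between the previous line and this one.
        if prev_uptime is not None and uptime < prev_uptime:
            results.append(prev_line.rstrip("\n"))
        prev_uptime, prev_line = uptime, line
    return results


sample = [
    "Mar 12 07:30:37 proxmox kernel: [137561.874336] fwbr107i0: port 2(tap107i0) entered forwarding state",
    # Invented boot line, for illustration only.
    "Mar 12 09:02:11 proxmox kernel: [    0.000000] example first line after boot",
]
for entry in last_lines_before_reboots(sample):
    print(entry)
```

To run it against a real log, pass an open file instead of the sample list, for example `open("/var/log/kern.log", errors="replace")`. That path is an assumption and may differ on your system. It only reports the last line written before each reboot, so it will not explain a crash that leaves nothing in the log.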

I had added a SAS HBA controller, which works well. I originally thought it was the cause, but I removed it and the system crashed again.

I have run two cycles of memtest86+ with no errors.

SMART checks show no errors.

I really don't know where to go from here.

Hope someone can help sort this issue out.

System details:
Mobo: Gigabyte Z370M D3H
Mem: Corsair Vengeance LPX 64 GB (4 x 16 GB) DDR4 2666 MHz
CPU: Intel Core i7-8700 CPU @ 3.20GHz
Boot drive: M Series NVMe SSD 128G
2 SSDs for virtual machine boot disks
2 WD Red for data etc

VMs - the main VMs I run are
Windows for BlueIris
Ubuntu with Docker hosting a number of docker containers
Ubuntu with nginx
A few others that don't run all the time

What are the next steps here, where else can I look, what else can I test?

Thanks
G
 
The server just crashed again a couple of hours after starting.
It went into a continuous cycle of restarts every couple of seconds.

It has done this on a few occasions before. Even after removing power it would keep doing this unless I left the power unplugged for a while.
 
18 hours sounds like a software issue or a slow heat problem; a reboot loop sounds like a hardware problem. It could be the motherboard, the power supply, or corrupt BIOS settings (clear CMOS?). Maybe disconnect everything and reconnect it to make sure you did not create a bad contact (something half unplugged) when working on it?
 
Thanks for the reply

BIOS is the latest.

I put a new power supply in today, really hope that helps but I am not holding my breath.

I recently refitted the RAM. There is not much else on the board other than the CPU, but I will recheck everything when / if it happens again.

CPU temps are OK.

Will try clearing CMOS if / when it happens again.
 
It looks like a power supply problem.

I swapped the power supply with one from another machine and Proxmox has been running fine since.

The other computer, a Windows gaming machine, now loses power not long after starting with the power supply taken from the Proxmox server, so it is fairly evident that was the cause.

I will leave Proxmox running for a couple more days to confirm things are stable and report back here.
 