Host randomly crashing

MrSoupman

Active Member
Aug 31, 2019
Hi there. My setup has had issues since the beginning: the server runs stably for at most 1-2 weeks before it hard crashes, and sometimes it crashes after only a day or two. If I hook up my monitor (with the GPU plugged in), there is no output on the screen. In my router, the host and all of my VMs/CTs no longer show up. This has been happening with both Proxmox 5.4 and 6.0.

System Specifications:
  • CPU: AMD Ryzen 7 1700
  • Motherboard: ASRock B450M Pro4
  • Memory: 4x 8GB DDR4-3000 RAM
  • GPU: GeForce GTX 650 (Removed)
  • Storage:
    • 1x 250 GB SSD
    • 1x 10 TB HDD
  • pveversion -v output:
    proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
    pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
    pve-kernel-5.0: 6.0-5
    pve-kernel-helper: 6.0-5
    pve-kernel-5.0.15-1-pve: 5.0.15-1
    ceph-fuse: 12.2.11+dfsg1-2.1
    corosync: 3.0.2-pve2
    criu: 3.11-3
    glusterfs-client: 5.5-3
    ksm-control-daemon: 1.3-1
    libjs-extjs: 6.0.1-10
    libknet1: 1.10-pve1
    libpve-access-control: 6.0-2
    libpve-apiclient-perl: 3.0-2
    libpve-common-perl: 6.0-2
    libpve-guest-common-perl: 3.0-1
    libpve-http-server-perl: 3.0-2
    libpve-storage-perl: 6.0-5
    libqb0: 1.0.5-1
    lvm2: 2.03.02-pve3
    lxc-pve: 3.1.0-61
    lxcfs: 3.0.3-pve60
    novnc-pve: 1.0.0-60
    proxmox-mini-journalreader: 1.1-1
    proxmox-widget-toolkit: 2.0-5
    pve-cluster: 6.0-4
    pve-container: 3.0-3
    pve-docs: 6.0-4
    pve-edk2-firmware: 2.20190614-1
    pve-firewall: 4.0-5
    pve-firmware: 3.0-2
    pve-ha-manager: 3.0-2
    pve-i18n: 2.0-2
    pve-qemu-kvm: 4.0.0-3
    pve-xtermjs: 3.13.2-1
    qemu-server: 6.0-5
    smartmontools: 7.0-pve2
    spiceterm: 3.1-1
    vncterm: 1.6-1
    zfsutils-linux: 0.8.1-pve1



What I've tried so far:
  • Checking the syslog and the kernel log shows only NUL characters (^@^@^@...) at the time of the crash, with no meaningful explanation.
    • The other logs don't mention the crash at all.
  • I've tested all four modules of my RAM, all in different spots using memtest86+ for a full day. Memtest reported no errors.
  • I've blacklisted all nouveau drivers as I have already taken out my graphics card anyway. (This did actually help make the system more stable than it used to be, but it still crashes often.)
  • I've checked for any settings in BIOS that could be causing any power issues and have disabled them.
  • I've tried running the RAM at the BIOS default speed of 2133 MHz.
  • I've reset the BIOS settings.
  • I've updated the BIOS to version 2.00.
  • I've done a full reinstall of Proxmox.
All of this was to no avail. I admit that I am quite new to Proxmox and to Linux as a whole. I searched around and found a thread with similar specs and problems, but there were no real answers there. Some advice would be much appreciated. I'd even appreciate a way to make the server restart automatically when this happens (though I'm not sure that exists for a crash like this).
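
For anyone trying to confirm the same symptom, the NUL characters mentioned above can be counted rather than eyeballed. The following is a minimal sketch, not part of the original post: the helper name `count_nul_bytes` is made up for illustration, and the log path in the example is an assumption that may differ on your system. A non-zero count usually just means the machine went down before pending log data reached the disk, so it confirms a hard crash but does not explain it.

```shell
#!/bin/sh
# Hypothetical helper: count NUL bytes (displayed as ^@ by many editors)
# in the file given as the first argument.
count_nul_bytes() {
    # tr -cd '\000' deletes every byte except NUL; wc -c counts what remains.
    # The final tr strips padding that some wc implementations add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Example call (the path is an assumption; adjust it to your system):
# count_nul_bytes /var/log/syslog
```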
 
A shot in the dark: Have you tried disabling the C6 power state? E.g. run this script:

Code:
./zenstates.py --c6-disable

(or check if your BIOS has an option for this) and see if the system still crashes. This is a known bug that I've personally encountered, and it manifested itself exactly like you describe.
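
Since the script changes CPU registers at runtime, the setting should not be expected to survive a reboot, so it may be worth reapplying it at every boot. Below is a rough sketch of one way to do that with a systemd unit. The function name `make_c6_unit`, the default script location `/usr/local/sbin/zenstates.py`, and the `/sbin/modprobe` path are all assumptions for illustration; the unit also assumes the script is executable and that the `msr` kernel module is needed for it to work.

```shell
#!/bin/sh
# Hypothetical helper: print a systemd unit that runs zenstates.py at boot.
# $1 is the absolute path to zenstates.py (the default is an assumed location).
make_c6_unit() {
    zenstates_path="${1:-/usr/local/sbin/zenstates.py}"
    cat <<EOF
[Unit]
Description=Disable the C6 power state on Ryzen CPUs

[Service]
Type=oneshot
ExecStartPre=/sbin/modprobe msr
ExecStart=${zenstates_path} --c6-disable

[Install]
WantedBy=multi-user.target
EOF
}

# Example (the unit name disable-c6.service is made up; run as root):
# make_c6_unit > /etc/systemd/system/disable-c6.service
# systemctl daemon-reload
# systemctl enable --now disable-c6.service
```

If your BIOS offers an option to disable C6, that is the simpler route and makes a unit like this unnecessary.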
 
I haven't tried doing that because I didn't notice any of my logs reporting a core hangup, so I didn't think it was related. Although, at this point I'm kind of desperate. I'll try it out and see if the server remains stable. Thank you!
 
Alright, it's been a little over a month now, and this is by far the most stable I've seen the system (hopefully I didn't just jinx myself...)! No hard crashes in the past month, so I think this might be the winner. Thank you so much!!
 