Host randomly crashing

MrSoupman

Aug 31, 2019
Hi there, I've been having issues with my setup since the beginning. My server runs stable for at most 1-2 weeks before it hard crashes; sometimes it crashes after only a day or two. If I hook up my monitor (while my GPU is plugged in), there is no output on the screen. In my router's device list, the host and all of my VMs/CTs no longer show up. This has been happening with both Proxmox 5.4 and 6.0.

System Specifications:
  • CPU: AMD Ryzen 7 1700
  • Motherboard: ASRock B450M Pro4
  • Memory: 4x 8GB DDR4-3000 RAM
  • GPU: GeForce GTX 650 (Removed)
  • Storage:
    • 1x 250 GB SSD
    • 1x 10 TB HDD
  • pveversion -v output:
    proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
    pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
    pve-kernel-5.0: 6.0-5
    pve-kernel-helper: 6.0-5
    pve-kernel-5.0.15-1-pve: 5.0.15-1
    ceph-fuse: 12.2.11+dfsg1-2.1
    corosync: 3.0.2-pve2
    criu: 3.11-3
    glusterfs-client: 5.5-3
    ksm-control-daemon: 1.3-1
    libjs-extjs: 6.0.1-10
    libknet1: 1.10-pve1
    libpve-access-control: 6.0-2
    libpve-apiclient-perl: 3.0-2
    libpve-common-perl: 6.0-2
    libpve-guest-common-perl: 3.0-1
    libpve-http-server-perl: 3.0-2
    libpve-storage-perl: 6.0-5
    libqb0: 1.0.5-1
    lvm2: 2.03.02-pve3
    lxc-pve: 3.1.0-61
    lxcfs: 3.0.3-pve60
    novnc-pve: 1.0.0-60
    proxmox-mini-journalreader: 1.1-1
    proxmox-widget-toolkit: 2.0-5
    pve-cluster: 6.0-4
    pve-container: 3.0-3
    pve-docs: 6.0-4
    pve-edk2-firmware: 2.20190614-1
    pve-firewall: 4.0-5
    pve-firmware: 3.0-2
    pve-ha-manager: 3.0-2
    pve-i18n: 2.0-2
    pve-qemu-kvm: 4.0.0-3
    pve-xtermjs: 3.13.2-1
    qemu-server: 6.0-5
    smartmontools: 7.0-pve2
    spiceterm: 3.1-1
    vncterm: 1.6-1
    zfsutils-linux: 0.8.1-pve1



What I've tried so far:
  • Every time it crashes, the syslog and the kernel log show only a run of NUL characters (^@^@^@...) at that point, with no meaningful explanation.
    • The other logs don't even mention the crash.
  • I've tested all four of my RAM modules, moving them between slots, using memtest86+ for a full day. Memtest reported no errors.
  • I've blacklisted all nouveau drivers as I have already taken out my graphics card anyway. (This did actually help make the system more stable than it used to be, but it still crashes often.)
  • I've checked for any settings in BIOS that could be causing any power issues and have disabled them.
  • I've tried running the RAM at the BIOS default speed of 2133 MHz.
  • I've reset the BIOS settings.
  • I've updated the BIOS to version 2.00.
  • I've done a full reinstall of Proxmox.
All of this to no avail. I do admit that I am quite new to working with Proxmox and Linux as a whole. I've tried searching around and did find a thread with specs and problems similar to mine, but no real answers there. Some advice would be quite useful. I'd even appreciate something I could use to make the server restart automatically when this happens (but I'm not sure that exists for a crash like this).
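On the NUL characters in the logs: a run of ^@ in a log file is usually what is left behind when the machine hangs or loses power before the filesystem has flushed the file, so the readable lines around it tend to be the last things that were actually written. A minimal sketch for checking this, where the helper names count_nul and readable_tail are made up for this example:

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a log that contains NUL padding.

# Print how many NUL bytes the file contains.
count_nul() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Print the last N lines of the file (default 20) with NUL bytes stripped,
# so the text on either side of the padding is readable in a terminal.
readable_tail() {
    tr -d '\000' < "$1" | tail -n "${2:-20}"
}
```

For example, count_nul /var/log/syslog followed by readable_tail /var/log/syslog 50. A non-zero count only confirms an unclean shutdown; it does not say what caused it.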
 
A shot in the dark: Have you tried disabling the C6 power state? E.g. run this script:

Code:
./zenstates.py --c6-disable

(or check if your BIOS has an option for this) and see if the system still crashes. This is a known bug that I've personally encountered, and it manifested itself exactly like you describe.
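A minimal sketch of running it, assuming the script is the zenstates.py from the ZenStates-Linux project and sits in the current directory (the path and the wrapper name disable_c6 are assumptions for this example). The upstream README states that the script needs root and the msr kernel module, and it documents a --list option for showing the current state:

```shell
#!/bin/sh
# Hypothetical wrapper: disable C6 via zenstates.py, then list the state.
disable_c6() {
    zs="${1:-./zenstates.py}"   # assumed script location
    if [ ! -x "$zs" ]; then
        echo "zenstates.py not found or not executable: $zs" >&2
        return 1
    fi
    # zenstates.py accesses the CPU through /dev/cpu/*/msr, which the
    # msr kernel module provides; load it if the device node is missing.
    if [ ! -e /dev/cpu/0/msr ]; then
        modprobe msr || return 1
    fi
    "$zs" --c6-disable && "$zs" --list
}
```

Run disable_c6 as root. The change is written to CPU registers at runtime, so it does not survive a reboot; it has to be reapplied at each boot unless the BIOS option is used instead.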
 
I haven't tried doing that because I didn't notice any of my logs reporting a core hangup, so I didn't think it was related. Although, at this point I'm kind of desperate. I'll try it out and see if the server remains stable. Thank you!
 
Alright, it's been a little over a month now, and this is by far the most stable I've seen the system (hopefully I didn't just jinx myself...)! No hard crashes in the past month, so I think this might be the winner. Thank you so much!!
 