Server randomly crashing

YeapGuy

Dec 23, 2020
Hello,
I recently installed Proxmox VE on a Hetzner server following their tutorial. After setting everything up and running just one Windows VM, I'm experiencing random hard crashes (faillog and lastlog are full of NUL (^@) characters). I thought this might be a hardware fault, so I let Hetzner know and they ran extensive hardware tests on the server. They found no issues and said the server didn't crash a single time during their 12-hour test. Meanwhile, I sometimes struggle to keep it up for 2 minutes.
I've tried installing kdump-tools, as suggested here on the forums, but it captures nothing (no dated folders appear in /var/crash).

Does anyone have an idea what this could be and how to fix it?

Thanks.
 
hi,

faillog and lastlog full of NUL (^@) characters
these are binary-format files, not human-readable, so I'd say that's unrelated.

can you take a look at /var/log/syslog, journalctl and dmesg?
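To illustrate why lastlog looks like a wall of ^@ in a text viewer: it is an array of fixed-size records indexed by UID, and every UID that has never logged in is an all-zero record. The Python sketch below decodes it, assuming the glibc x86_64 layout from <lastlog.h> (a 32-bit ll_time, a 32-byte ll_line and a 256-byte ll_host, 292 bytes per record). parse_lastlog is a hypothetical helper name, and other architectures may use a different layout.

```python
import struct

# Assumed record layout on glibc x86_64 (<lastlog.h>):
#   int32_t ll_time; char ll_line[32]; char ll_host[256];  -> 292 bytes
LASTLOG_FMT = "<i32s256s"
RECORD_SIZE = struct.calcsize(LASTLOG_FMT)


def parse_lastlog(data):
    """Yield (uid, ll_time, line, host) for every non-empty record.

    Record N belongs to UID N. UIDs that never logged in are stored as
    all-zero records, which a text viewer shows as runs of ^@ (NUL).
    """
    for uid in range(len(data) // RECORD_SIZE):
        chunk = data[uid * RECORD_SIZE:(uid + 1) * RECORD_SIZE]
        ll_time, ll_line, ll_host = struct.unpack(LASTLOG_FMT, chunk)
        if ll_time == 0:
            continue  # never logged in: the record is nothing but NUL bytes
        yield (
            uid,
            ll_time,
            ll_line.rstrip(b"\0").decode(errors="replace"),
            ll_host.rstrip(b"\0").decode(errors="replace"),
        )
```

For example, list(parse_lastlog(open("/var/log/lastlog", "rb").read())) should return only the accounts that have ever logged in; in practice the lastlog command prints the same information in readable form.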
 
Hey, thanks for your swift response.
I had looked at the syslog before, and it seemed to me that it just cuts off at the point of the crash. Now that I've paid more attention to it, though, it looks like nothing happens at the time of the supposed crash: everything continues normally, with Proxmox doing its thing every minute:
Code:
proxmox systemd[1]: Starting Proxmox VE replication runner...
proxmox systemd[1]: pvesr.service: Succeeded.
proxmox systemd[1]: Started Proxmox VE replication runner.
Then, when I notice "it's down", a graceful shutdown follows (that's me rebooting the server from Hetzner's panel because I think it crashed) and the server reboots.
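Since pvesr logs an entry every minute, a real hard crash should show up in the syslog as a gap in those entries followed by boot messages, while a network outage leaves the sequence unbroken. A minimal Python sketch of that check, assuming the classic rsyslog prefix "Mon DD HH:MM:SS host ..." (which carries no year, so one is passed in and year rollover is ignored); find_gaps and the two-minute threshold are hypothetical choices:

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, max_gap=timedelta(minutes=2)):
    """Return (before, after) timestamp pairs where consecutive syslog
    entries are more than max_gap apart.

    Assumes each line starts with the 15-character rsyslog timestamp,
    e.g. "Dec 23 10:00:01"; lines that don't parse are skipped.
    """
    gaps = []
    prev = None
    for line in lines:
        try:
            # Prepend the year so strptime gets a complete date.
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line
        if prev is not None and ts - prev > max_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

Feeding it open("/var/log/syslog").read().splitlines() and comparing any reported gaps with the times the host seemed to crash would show whether it was actually down or merely unreachable.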

This now looks much more like a network issue that made the server appear to crash (that's the disadvantage of having your hardware in a datacenter somewhere rather than in front of you).

Thanks for pointing me in the right direction, oguz; I should have paid more attention to the syslog the first time.
Anyway, is there a good way to debug this sort of issue without physical access to the server?
 
you could ask Hetzner for IPMI access, but it might not be necessary.

if you can post some logs here (see my previous post), it'll be easier to tell what's really going on.
 
Hey there, I've now asked them for a KVM console, because the network sometimes drops after as little as a minute. There's no useful info in the logs at all (nothing related to networking). All dmesg shows is the boot messages and the newly attached USB devices (i.e. the KVM console). journalctl is very similar, apart from the every-minute "Starting Proxmox VE replication runner..." spam I mentioned in my last post. Same for the syslog.
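Since the logs say nothing about the outages, one option is to have the host record them itself. The sketch below, assuming Python 3 on the PVE host, probes a TCP port at a fixed interval and keeps only state changes with timestamps; tcp_reachable, monitor, the 1.1.1.1:443 target and the 10-second interval are illustrative assumptions. A TCP connect is used instead of ping so the sketch needs no raw sockets.

```python
import socket
import time
from datetime import datetime


def tcp_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(check, rounds, interval=10.0, sleep=time.sleep, now=datetime.now):
    """Call check() every interval seconds and record state changes.

    Returns a list of (timestamp, reachable) transitions, so an outage
    shows up as a False entry followed later by a True entry.
    sleep and now are injectable to make the loop easy to test.
    """
    transitions = []
    last = None
    for i in range(rounds):
        state = check()
        if state != last:
            transitions.append((now(), state))
            last = state
        if i < rounds - 1:
            sleep(interval)
    return transitions
```

For example, monitor(lambda: tcp_reachable("1.1.1.1"), rounds=360) covers roughly an hour at the default interval; writing the result to a local file means it can be read later from the KVM console even while the network is down.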
 

what happens exactly when the network drops?

* can you ping 1.1.1.1 from PVE?

* /etc/network/interfaces can be useful to see (mask your public IP)

* is firewall enabled?

* check the name of your interface with ip a (let's assume enp35s0), then run dmesg | grep enp35s0
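The last check can also be scripted. A small Python filter, assuming the dmesg output is passed in as a string; link_events and its regular expression are illustrative, since the exact wording of link messages varies between NIC drivers:

```python
import re

# Messages that commonly accompany a link or bridge-port change, e.g.
# "e1000e 0000:00:1f.6 enp2s0: NIC Link is Down" or
# "vmbr0: port 1(enp2s0) entered disabled state". Wording varies by driver.
LINK_PATTERN = re.compile(r"Link is (Up|Down)|entered \w+ state")


def link_events(dmesg_text, iface):
    """Return dmesg lines that mention iface and look like link/port state changes.

    Uses a plain substring match on iface, so "enp2s0" also matches "enp2s0f1".
    """
    return [
        line
        for line in dmesg_text.splitlines()
        if iface in line and LINK_PATTERN.search(line)
    ]
```

For example, link_events(subprocess.run(["dmesg"], capture_output=True, text=True).stdout, "enp2s0") lists only the link and bridge-port transitions for that interface, which are easier to line up against outage times than the full dmesg.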
 
  1. Nope, can't ping anything
  2. Sure. I use a bridge config:
    Code:
    ### Hetzner Online GmbH installimage
    source /etc/network/interfaces.d/*
    
    auto lo
    iface lo inet loopback
    iface lo inet6 loopback
    
    auto enp2s0
    iface enp2s0 inet static
        address <host public IP>
        netmask 255.255.255.224
        gateway <gateway>
        # not sure what the line below is for, but Hetzner put it there
        up route add -net <different IP, on the same subnet though> netmask 255.255.255.224 gw <gateway> dev enp2s0
    
    iface enp2s0 inet6 static
        address <ipv6>
        netmask 64
        gateway <gateway>
    
    auto vmbr0
    iface vmbr0 inet static
        address <host public IP>
        netmask 255.255.255.255
        bridge_ports enp2s0
        bridge_stp off
        bridge_fd 1
        bridge_hello 2
        bridge_maxage 12
        # the line below is repeated 5 times, once per additional IP
        up ip route add <additional IP>/32 dev vmbr0
  3. "pve-firewall status" returns "Status: disabled/running", which confuses me a bit, but I guess that means it's off.
  4. Nothing interesting: only messages from boot and from restarting networking with ifdown and ifup (which I tried just now). Those two commands printed some cryptic output, but I can't retrieve it now because the KVM console doesn't let me scroll back far enough. Now, "ifdown enp2s0" gives me "interface enp2s0 not configured", and "ifup enp2s0" gives me "RTNETLINK answers: File exists, ifup: failed to bring up enp2s0".
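One thing that can be checked mechanically in a config like the one above is whether a NIC carries an IPv4 address while also being a bridge port. In the layouts described in the Proxmox documentation, a bridged NIC is normally "inet manual" with the address on the bridge, and in a routed setup the NIC keeps the address while the bridge uses bridge_ports none, so a port that is both is worth a second look. This is a hedged Python sketch, not a full interfaces(5) parser: parse_interfaces and bridged_ports_with_address are hypothetical helpers, and only inet static/dhcp stanzas and the bridge_ports / bridge-ports options are considered.

```python
def parse_interfaces(text):
    """Parse 'iface' stanzas into dicts with name, family, method and options.

    Minimal sketch: full-line comments are skipped (interfaces(5) has no
    inline comments); auto/source/mapping lines just end the current stanza.
    """
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "iface" and len(words) >= 4:
            current = {"name": words[1], "family": words[2],
                       "method": words[3], "options": {}}
            stanzas.append(current)
        elif words[0] in ("auto", "allow-hotplug", "source",
                          "source-directory", "mapping"):
            current = None
        elif current is not None:
            current["options"].setdefault(words[0], []).append(" ".join(words[1:]))
    return stanzas


def bridged_ports_with_address(text):
    """Return (port, bridge) pairs where a bridge port also has an inet address."""
    stanzas = parse_interfaces(text)
    addressed = {s["name"] for s in stanzas
                 if s["family"] == "inet" and s["method"] in ("static", "dhcp")}
    hits = []
    for s in stanzas:
        opts = s["options"]
        for value in opts.get("bridge_ports", []) + opts.get("bridge-ports", []):
            for port in value.split():
                if port in addressed:  # "none" never matches an iface name
                    hits.append((port, s["name"]))
    return hits
```

Run against /etc/network/interfaces, an empty result means no bridge port also carries an IPv4 address; a non-empty one flags the stanzas worth comparing against the Proxmox and Hetzner examples.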
 
