Proxmox random hangs/crashes

pottsi

Jan 18, 2016
Hi All,

For months I've had an issue where my server becomes unreachable. The time between incidents can be anywhere from a couple of days to a month. I'm stuck on how to debug it, because nothing is written to the log files and there is nothing out of the ordinary in the last entry before the hang/crash.
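Since there are months of logs, one way to check whether the hangs really are random is to list every long silence in the syslog, with the last line before it and the first line after it (normally the next boot). This is a minimal sketch under several assumptions: Debian's default `/var/log/syslog` format, whose timestamps carry no year (the `year` argument only matters for 29 February, and a December to January rollover is not handled); a hypothetical 10 minute threshold; and plain-text files, so compressed rotations such as `syslog.2.gz` must be decompressed first. The script name `syslog_gaps.py` is hypothetical.

```python
"""Hypothetical helper: list the last syslog line before each long silence.

Assumes Debian's classic syslog format ("Mar 12 10:15:01 host prog: msg"),
which has no year, so every line is parsed with one assumed year.
"""
import sys
from datetime import datetime, timedelta


def parse_ts(line, year=2019):
    # The first 15 characters hold the timestamp, e.g. "Mar 12 10:15:01".
    return datetime.strptime("%d %s" % (year, line[:15]), "%Y %b %d %H:%M:%S")


def find_gaps(lines, min_gap=timedelta(minutes=10), year=2019):
    """Yield (last_line_before_gap, first_line_after_gap, gap_length)."""
    prev_line, prev_ts = None, None
    for line in lines:
        try:
            ts = parse_ts(line, year)
        except ValueError:
            continue  # skip lines that don't start with a syslog timestamp
        if prev_ts is not None and ts - prev_ts >= min_gap:
            yield prev_line.rstrip("\n"), line.rstrip("\n"), ts - prev_ts
        prev_line, prev_ts = line, ts


if __name__ == "__main__":
    # Usage: python3 syslog_gaps.py /var/log/syslog.1 /var/log/syslog
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for before, after, gap in find_gaps(f):
                print("silent for %s" % gap)
                print("  last: " + before)
                print("  next: " + after)
```

Comparing the printed "last" lines across incidents may show a pattern (for example, the same job running each time) that a single log entry does not.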

I currently have one KVM virtual machine and 3 containers running. RAM usage is around 30% and CPU usage around 3%.

Server specs:
Debian 9
6 TB hard drive
32 GB RAM
Intel Core i7-3770

What I know:

1) Provider has done multiple hardware tests (all passed)
2) Provider has replaced server
3) Provider ran another hardware test (all passed)
4) Nothing is written to the log files (I have logs going back months)
5) Server becomes unreachable (unknown whether it's a hang or a loss of network); the only way I can get access again is by rebooting the server
6) I'm stuck :)
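Item 5 leaves open whether the host hangs or only drops off the network. One way to tell, sketched here as an assumption rather than anything tried in this thread, is a heartbeat file: a small script appends a timestamp every few seconds and calls `os.fsync` so each line really reaches the disk. After the next forced reboot, a last heartbeat close to the reboot time means the host was still running (pointing at the network); heartbeats that stop well before it mean the host itself froze. The path `/var/tmp/heartbeat.log`, the 10 second interval, the script name `heartbeat.py` and the `--check` flag are all hypothetical choices.

```python
"""Hypothetical heartbeat writer: appends a timestamp to a local file every
few seconds and forces it to disk, so after a forced reboot the last line
shows roughly when the host stopped running.
"""
import os
import sys
import time

HEARTBEAT_PATH = "/var/tmp/heartbeat.log"  # assumed location on a local disk
INTERVAL_S = 10  # assumed interval; each beat adds about 30 bytes to the file


def write_heartbeat(path, now=None):
    """Append one "<epoch> <local time>" line and flush it all the way to disk."""
    now = time.time() if now is None else now
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a") as f:
        f.write("%d %s\n" % (int(now), stamp))
        f.flush()
        os.fsync(f.fileno())  # don't leave the line sitting in the page cache


def last_heartbeat(path):
    """Return the epoch seconds of the last complete heartbeat line, or None."""
    last = None
    try:
        with open(path) as f:
            for line in f:
                parts = line.split()
                # Ignore a torn final line that a crash may have left behind.
                if line.endswith("\n") and parts and parts[0].isdigit():
                    last = int(parts[0])
    except FileNotFoundError:
        return None
    return last


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--check":
        ts = last_heartbeat(HEARTBEAT_PATH)
        print("no heartbeat recorded" if ts is None else "last heartbeat: " + time.ctime(ts))
    else:
        while True:
            write_heartbeat(HEARTBEAT_PATH)
            time.sleep(INTERVAL_S)
```

Run the loop on the Proxmox host and, after the reboot, run `python3 heartbeat.py --check` before the new beats overwrite the picture; the file grows without limit, so truncate it now and then.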

Proxmox version:
Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-10-pve)
pve-manager: 5.3-11 (running version: 5.3-11/d4907f84)
pve-kernel-4.15: 5.3-3
pve-kernel-4.15.18-12-pve: 4.15.18-35
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-47
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-12
libpve-storage-perl: 5.0-39
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-23
pve-cluster: 5.0-33
pve-container: 2.0-35
pve-docs: 5.3-3
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-18
pve-firmware: 2.0-6
pve-ha-manager: 2.0-8
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-2
pve-xtermjs: 3.10.1-2
qemu-server: 5.0-47
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

If you need anything else, let me know.

Thanks

- How often does the server hang? Is it at a regular interval?
- When the server hangs, can you see the monitor? The monitor usually displays the last error.
- In my experience, this kind of error can only be fixed by replacing the server, because a hardware test only checks that the hardware functions, not that it stays stable over a longer period.
 
How often does the server hang? It's completely random.
When the server hangs, can you see the monitor? I can ask for a KVM session to be attached, but the server would have to be rebooted first, so I wouldn't see anything from the hang (it's a remote server).
Such errors can only be fixed by replacing the server: the provider has already replaced it.
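Because the console can't be watched remotely, one option (not something suggested in this thread) is the Linux `netconsole` module, which sends kernel log messages over UDP to another machine and is configured through its `netconsole=` module parameter (see the kernel's netconsole documentation for the syntax). It only helps if the kernel manages to log something, such as a panic or lockup warning, before the hang, and if the network card still works at that point. The minimal receiver here is meant to run on a second machine reachable from the Proxmox host; it assumes netconsole's default target port 6666 and writes to a hypothetical file `netconsole.log`.

```python
"""Minimal UDP listener for kernel messages sent by netconsole.

Appends every received datagram, prefixed with the receive time and the
sender's address, to a local file. Port 6666 is netconsole's default
target port; the output file name is a hypothetical choice.
"""
import socket
import sys
import time

LISTEN_PORT = 6666
OUTPUT_PATH = "netconsole.log"  # hypothetical output file


def format_record(data, addr, ts):
    """Turn one datagram into a single timestamped log line."""
    text = data.decode("utf-8", errors="replace").rstrip("\n").replace("\n", " | ")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts))
    return "%s %s %s\n" % (stamp, addr[0], text)


def serve(port=LISTEN_PORT, path=OUTPUT_PATH):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    with open(path, "a") as out:
        while True:
            data, addr = sock.recvfrom(4096)
            out.write(format_record(data, addr, time.time()))
            out.flush()  # keep the file current even if this listener is killed


if __name__ == "__main__":
    serve(int(sys.argv[1]) if len(sys.argv) > 1 else LISTEN_PORT)
```

If the log ends with a kernel warning just before an incident, that message is a much better starting point than the empty local logs; if nothing arrives at all, that points towards a hard hang rather than a logged kernel error.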