Proxmox crashing

webservio

May 13, 2009
A couple of Proxmox machines are showing the same symptoms: both crash and hang at random times, and I believe the common cause is KVM. I upgraded both machines; below is the version report for one of them. I am puzzled and not sure how to diagnose this issue.


pveversion -v
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-47
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-47
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-2pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
 
When the systems crash, they display a whole lot of information on the last screen. Will I be able to retrieve these messages from the system log? If so, which log?

Thanks
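A minimal sketch for pulling out whatever the kernel managed to log right before each reboot. It assumes, based on the syslog excerpt later in this thread, that messages land in /var/log/syslog and that rsyslog writes an "imklog ..., log source = /proc/kmsg started." line at every boot, so that line is treated as a restart marker. The function name `lines_before_restarts` and the `context` parameter are hypothetical. Note that if the host hangs hard, the text shown on the console may never reach disk at all, so an empty result does not rule anything out.

```python
import re
from typing import List

# Assumption: rsyslog logs this line each time it starts (i.e. at boot),
# e.g. "kernel: imklog 3.18.6, log source = /proc/kmsg started."
RESTART_MARKER = re.compile(r"imklog .* log source = /proc/kmsg started")


def lines_before_restarts(lines: List[str], context: int = 20) -> List[List[str]]:
    """For every restart marker, return up to `context` lines logged just before it."""
    chunks: List[List[str]] = []
    for i, line in enumerate(lines):
        # Skip a marker on the very first line: nothing precedes it.
        if RESTART_MARKER.search(line) and i > 0:
            chunks.append(lines[max(0, i - context):i])
    return chunks
```

Feed it the lines of /var/log/syslog (read with `errors="replace"` to survive binary garbage left by a crash) and inspect the last chunk for the most recent crash.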
 
Dietmar,

I did a little bit of digging and found the following entry in syslog:

Nov 1 05:55:22 vpshost32 ntpd[2413]: kernel time sync status change 0001
Nov 1 06:00:01 vpshost32 /USR/SBIN/CRON[9062]: (root) CMD (test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
Nov 1 06:00:01 vpshost32 /USR/SBIN/CRON[9063]: (root) CMD (/usr/share/vzctl/scripts/vpsreboot)
Nov 1 06:00:01 vpshost32 /USR/SBIN/CRON[9064]: (root) CMD (/usr/share/vzctl/scripts/vpsnetclean)
Nov 1 06:00:01 vpshost32 /USR/SBIN/CRON[9065]: (root) CMD (/usr/local/bin/rotate-vzdump-nas2.sh)
Nov 1 06:00:08 vpshost32 kernel: svc: failed to register lockdv1 RPC service (errno 97).
Nov 1 06:01:58 vpshost32 kernel: ct0 nfs: server 192.168.90.63 not responding, still trying
Nov 1 06:02:00 vpshost32 kernel: ct0 nfs: server 192.168.90.63 OK
Nov 1 06:02:14 vpshost32 kernel: ct0 nfs: server 192.168.90.63 not responding, still trying
Nov 1 06:02:14 vpshost32 kernel: ct0 nfs: server 192.168.90.63 OK
Nov 1 06:03:13 vpshost32 kernel: ct0 nfs: server 192.168.90.63 not responding, still trying
Nov 1 06:03:13 vpshost32 kernel: ct0 nfs: server 192.168.90.63 not responding, still trying
Nov 1 06:03:13 vpshost32 kernel: ct0 nfs: server 192.168.90.63 not responding, still trying
Nov 1 06:03:15 vpshost32 kernel: ct0 nfs: server 192.168.90.63 OK
Nov 1 06:03:15 vpshost32 kernel: ct0 nfs: server 192.168.90.63 OK
Nov 1 06:03:15 vpshost32 kernel: ct0 nfs: server 192.168.90.63 OK
Nov 1 06:03:24 vpshost32 kernel: ct0 nfs: server 192.168.90.63 not responding, still trying
Nov 1 06:03:24 vpshost32 kernel: ct0 nfs: server 192.168.90.63 not responding, still trying
Nov 1 06:03:24 vpshost32 kernel: ct0 nfs: server 192.168.90.63 not responding, still trying
Nov 1 06:03:27 vpshost32 kernel: ct0 nfs: server 192.168.90.63 OK
Nov 1 06:03:27 vpshost32 kernel: ct0 nfs: server 192.168.90.63 OK
Nov 1 06:03:27 vpshost32 kernel: ct0 nfs: server 192.168.90.63 OK
Nov 1 07:01:04 vpshost32 kernel: imklog 3.18.6, log source = /proc/kmsg started.
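To see how often and for how long the NFS server flapped before the host went silent, the "not responding" / "OK" pairs above can be grouped into outage windows. This is a sketch only: the regex is written against the exact message format in this excerpt, and because syslog lines carry no year, `year` is a placeholder used only to build datetimes. The name `nfs_outages` is hypothetical. An outage with no following "OK" (for example, when the host hung mid-outage) is not reported by this version.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Matches lines such as:
# "Nov 1 06:01:58 vpshost32 kernel: ct0 nfs: server 192.168.90.63 not responding, still trying"
NFS_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*nfs: server (?P<server>\S+) "
    r"(?P<state>not responding|OK)"
)


def nfs_outages(lines: List[str], year: int = 2000) -> Dict[str, List[Tuple[datetime, datetime]]]:
    """Group 'not responding' ... 'OK' messages into (start, end) windows per NFS server."""
    outages: Dict[str, List[Tuple[datetime, datetime]]] = {}
    down_since: Dict[str, Optional[datetime]] = {}
    for line in lines:
        m = NFS_LINE.search(line)
        if not m:
            continue
        stamp = " ".join(m.group("stamp").split())  # collapse syslog's padded day field
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        server = m.group("server")
        start = down_since.get(server)
        if m.group("state") == "not responding":
            # Several mounts report the same outage; keep the earliest start time.
            if start is None:
                down_since[server] = ts
        elif start is not None:
            outages.setdefault(server, []).append((start, ts))
            down_since[server] = None
    return outages
```

Run over this excerpt, it yields four short windows (two to three seconds each) for 192.168.90.63, which suggests the storage link was unstable in the minutes before the host stopped logging.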
 
Hi Dietmar,

So sorry if this is a repeat / stupid question:
Is there documentation on how to install the newest kernel once I download it? Also, is there a way to revert?

Regards,

Mishi
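The version report earlier in the thread shows that kernels on these hosts are Debian packages (`pve-kernel-2.6.32-4-pve` and `pve-kernel-2.6.32-6-pve`), so a newer kernel is normally installed through apt-get rather than from a manual download, and reverting usually means booting the older package that is still installed, for example by choosing it in the boot loader menu. The sketch below, with hypothetical function names, parses `pveversion -v` output to show which kernel is running and which installed ones could serve as a fallback; it assumes the "key: value" layout shown in that report.

```python
from typing import List, Tuple


def kernels_from_pveversion(text: str) -> Tuple[str, List[str]]:
    """Return (running kernel version, installed pve-kernel package names)."""
    running = ""
    installed: List[str] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "running kernel":
            running = value
        elif key.startswith("pve-kernel-"):
            installed.append(key)
    return running, installed


def fallback_kernels(text: str) -> List[str]:
    """Installed kernel packages other than the one currently running."""
    running, installed = kernels_from_pveversion(text)
    return [pkg for pkg in installed if pkg != f"pve-kernel-{running}"]
```

With the report posted above, the running kernel is 2.6.32-6-pve and `pve-kernel-2.6.32-4-pve` is the one left to fall back to.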
 
This looks like the NFS storage node dying. For me, that makes the PVE web interface inaccessible, but I can still reach the VM host via SSH. Maybe that's what you are experiencing?
 
Jejader,

I suspected NFS was the cause. However, in my case I could not even ping or SSH to the server.
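Since the host itself becomes unreachable, one way to tell whether the NFS server or the Proxmox host drops first is to probe both from a third machine and log the timestamps. A minimal sketch, assuming the NFS server listens on the standard TCP port 2049; the function name `nfs_port_reachable` is hypothetical, and 192.168.90.63 (from the syslog excerpt) is only an example target. A successful TCP connect shows the port accepts connections, not that NFS itself is healthy.

```python
import socket

NFS_PORT = 2049  # standard NFS port; assumes the server is reachable over TCP


def nfs_port_reachable(host: str, port: int = NFS_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Called from a cron job or a loop, e.g. `nfs_port_reachable("192.168.90.63")`, alongside a similar check against the hypervisor's SSH port, the logged results give a timeline to line up against the host's own syslog.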