I just encountered a system crash. How do I find the exact reason for it?

prolab

Aug 11, 2020
I encountered a system crash a while ago. It's the first time I've had a crash while using Proxmox VE. The machine runs as a headless server, and it suddenly stopped responding. I then plugged in a keyboard and monitor, but the keyboard's lights did not come on and the monitor showed nothing, so I had to press the reset button to reboot.

After the reboot I checked /var/log/syslog. This is the log from before and after the crash (the crash happened at around Oct 5 18:00). But what are these ^@^@^@^@^@^@^@^@^@^@? How can I find out the exact reason for the crash?

Code:
Oct  5 17:46:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  5 17:46:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  5 17:47:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  5 17:47:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  5 17:47:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  5 17:48:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  5 17:48:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  5 17:48:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  5 17:49:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  5 17:49:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  5 17:49:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  5 17:50:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  5 17:50:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  5 17:50:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  5 17:51:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  5 17:51:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  5 17:51:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  5 17:52:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  5 17:52:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  5 17:52:01 pve5 systemd[1]: Started Proxmox VE replication runner.
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Oct  5 18:$
Oct  5 18:04:26 pve5 dmeventd[630]: dmeventd ready for processing.
Oct  5 18:04:26 pve5 systemd-modules-load[617]: Inserted module 'ib_iser'
Oct  5 18:04:26 pve5 lvm[630]: Monitoring thin pool pve-data.
Oct  5 18:04:26 pve5 systemd-modules-load[617]: Inserted module 'vhost_net'
Oct  5 18:04:26 pve5 systemd[1]: Starting Flush Journal to Persistent Storage...
Oct  5 18:04:26 pve5 systemd[1]: Started udev Kernel Device Manager.
Oct  5 18:04:26 pve5 systemd[1]: Started Flush Journal to Persistent Storage.
Oct  5 18:04:26 pve5 lvm[616]:   5 logical volume(s) in volume group "pve" monitored
Oct  5 18:04:26 pve5 systemd[1]: Started udev Coldplug all Devices.
Oct  5 18:04:26 pve5 systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Oct  5 18:04:26 pve5 systemd[1]: Starting udev Wait for Complete Device Initialization...
Oct  5 18:04:26 pve5 systemd-udevd[649]: Using default interface naming scheme 'v240'.
Oct  5 18:04:26 pve5 systemd-udevd[652]: Using default interface naming scheme 'v240'.
Oct  5 18:04:26 pve5 systemd-udevd[709]: Using default interface naming scheme 'v240'.
Oct  5 18:04:26 pve5 systemd-udevd[709]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Oct  5 18:04:26 pve5 systemd-udevd[649]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Oct  5 18:04:26 pve5 systemd-udevd[652]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Oct  5 18:04:26 pve5 systemd-udevd[669]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Oct  5 18:04:26 pve5 systemd[1]: Found device /dev/pve/swap.
Oct  5 18:04:26 pve5 systemd[1]: Reached target Sound Card.
Oct  5 18:04:26 pve5 systemd[1]: Activating swap /dev/pve/swap...
Oct  5 18:04:26 pve5 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Oct  5 18:04:26 pve5 systemd[1]: Starting LVM event activation on device 8:51...
Oct  5 18:04:26 pve5 kernel: [    0.000000] Linux version 5.4.34-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) ()

I also noticed that the persistent journald log is not enabled. I have just enabled it using mkdir /var/log/journal, following the guide https://forum.proxmox.com/threads/how-to-see-logs-proxmox-crash-alternate.52772/ Also, I want to keep the most detailed level of logging. What is journald's default log level?
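For reference, a minimal sketch of checking whether journald has somewhere persistent to write. It assumes the default Storage=auto setting, under which journald keeps logs across reboots only when /var/log/journal exists. The helper name journal_storage_mode and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# journal_storage_mode is a hypothetical helper, not part of systemd.
# It prints "persistent" if the given directory exists, else "volatile".
# The directory defaults to /var/log/journal (assumes Storage=auto).
journal_storage_mode() {
    if [ -d "${1:-/var/log/journal}" ]; then
        echo persistent
    else
        echo volatile
    fi
}

journal_storage_mode
```

If this prints volatile, creating the directory with mkdir -p /var/log/journal and then running systemctl restart systemd-journald (or rebooting) should switch journald to persistent storage.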
 
hi,

But what are these ^@^@^@^@^@^@^@^@^@^@?
there's a chance that these are because of the cluster filesystem crashing (pmxcfs)

* is this a standalone server or part of a cluster?

* what does pveversion -v say?

* what action were you performing when the crash happened?

* if you have the journals please check them to see if anything seems abnormal
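On the ^@ question itself: ^@ is how editors such as nano or vi display a NUL (zero) byte, so the file contains a run of zero bytes at that point. A small sketch for counting them and for reading the log without them follows. The function names are illustrative only.

```shell
#!/bin/sh
# Count the NUL bytes in a file: delete everything that is not NUL,
# then count what is left.
count_nul_bytes() {
    tr -cd '\000' < "$1" | wc -c
}

# Print the file with all NUL bytes removed, so that grep and less
# treat it as ordinary text again.
strip_nul_bytes() {
    tr -d '\000' < "$1"
}
```

For example, count_nul_bytes /var/log/syslog shows how large the damaged region is, and strip_nul_bytes /var/log/syslog | less gives a readable view without modifying the original file.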

Also, I want to keep the most detailed level of logging. What is journald's default log level?

by default it keeps all levels (from 0 emergency to 7 debug)

if you encounter the crash again you can use journalctl -b <bootno> to get the journals from that boot, i.e. -b 0 will show the current boot and -b -1 will show the previous boot. the available boots can be listed with journalctl --list-boots
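a minimal sketch of how that could look in practice, assuming a persistent journal so that the previous boot is still available. the wrapper names last_words and last_kernel_warnings are made up; the journalctl options (-b, -n, -k, -p, --no-pager) are standard ones.

```shell
#!/bin/sh
# Print the tail of the previous boot's journal, i.e. the last
# messages that were logged before the crash. $1 = number of lines.
last_words() {
    journalctl -b -1 -n "${1:-50}" --no-pager
}

# Same boot, kernel messages only, priority warning and worse.
last_kernel_warnings() {
    journalctl -b -1 -k -p warning --no-pager
}
```

running last_words 200 right after the reboot shows the final 200 journal lines of the boot that crashed.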



[0]: https://wiki.archlinux.org/index.php/Systemd/Journal
 
Thank you.
It's a standalone server, not part of a cluster. I have a zpool zfstest5, and inside it I have created an encrypted dataset zfstest5/enc. I added it as a directory in the Proxmox VE web UI, and after every boot I have to load the key manually using zfs load-key zfstest5/enc and mount it manually with zfs mount -a. So I set mkdir to 0 and is_mountpoint to yes, following this guide: https://forum.proxmox.com/threads/mount-zfs-when-directory-not-empty.29657/
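The manual unlock steps can be sketched as a small script. This assumes the dataset name from this post and the standard zfs get, zfs load-key and zfs mount subcommands. The name unlock_and_mount is made up, and zfs load-key will still prompt for the passphrase if the dataset's keylocation is prompt.

```shell
#!/bin/sh
# Hypothetical helper: load the encryption key for a dataset if it is
# not loaded yet, then mount all ZFS filesystems.
unlock_and_mount() {
    dataset="$1"
    # keystatus reads "available" once the key has been loaded
    status=$(zfs get -H -o value keystatus "$dataset") || return 1
    if [ "$status" != "available" ]; then
        zfs load-key "$dataset" || return 1
    fi
    zfs mount -a
}
```

It would be called as unlock_and_mount zfstest5/enc after each boot.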

I also used this script https://github.com/ivanhao/pvetools to install and set up Samba, and set /zfstest5 as the Samba share folder. When the crash happened, I was browsing the Samba share's files from a local Windows computer.

BTW, the Proxmox VE server has a Windows 10 virtual machine up and running, with a USB hard disk connected and added to it. I had not enabled the persistent journald log, so it seems there is no journal for this crash.

The pveversion -v shows:
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-2~bpo10+1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
The pveversion -v shows:
just to be sure, i'd suggest updating to the latest version: apt update && apt dist-upgrade should do it. if you get any errors about the repositories please see [0]

after doing the updates and setting up the persistent journal be on the lookout for the crash again.

if you see the crash happening again you can use a serial console or equivalent to capture the kernel's last messages. 'kdump' [1] can also be useful for getting kernel crash dumps (however this requires you to set it up first)
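as a rough starting point for the kdump setup, here is a sketch that checks one prerequisite: whether the running kernel was booted with a crashkernel= parameter, which reserves memory for the capture kernel. the helper name has_crashkernel and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line contains a
# crashkernel= parameter. $1 defaults to /proc/cmdline; passing another
# file is only meant for trying the function out.
has_crashkernel() {
    grep -Eq '(^|[[:space:]])crashkernel=' "${1:-/proc/cmdline}" 2>/dev/null
}

if has_crashkernel; then
    echo "crashkernel= is set, memory is reserved for a crash kernel"
else
    echo "crashkernel= is not set, kdump cannot capture a dump yet"
fi
```

on a GRUB-booted Debian-based system the parameter is usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot. the size to reserve depends on the machine, so treat any value as an assumption to verify.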

[0]: https://pve.proxmox.com/wiki/Package_Repositories
[1]: https://wiki.archlinux.org/index.php/Kdump
 

Thank you again. I just came back home today and hadn't done the update yet. I noticed it had crashed again.

Here is /var/log/syslog from before and after the crash (it appears the crash happened after Oct 6 17:09:01). This time there were no ^@^@^@^@ characters:
Code:
Oct  6 17:04:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  6 17:04:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  6 17:04:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  6 17:05:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  6 17:05:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  6 17:05:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  6 17:05:37 pve5 smartd[1312]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 132 to 135
Oct  6 17:05:37 pve5 smartd[1312]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 135 to 138
Oct  6 17:06:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  6 17:06:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  6 17:06:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  6 17:07:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  6 17:07:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  6 17:07:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  6 17:08:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  6 17:08:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  6 17:08:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  6 17:08:15 pve5 pvedaemon[22710]: <root@pam> successful auth for user 'root@pam'
Oct  6 17:09:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct  6 17:09:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct  6 17:09:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct  6 20:27:19 pve5 systemd-modules-load[615]: Inserted module 'iscsi_tcp'
Oct  6 20:27:19 pve5 dmeventd[631]: dmeventd ready for processing.
Oct  6 20:27:19 pve5 systemd-modules-load[615]: Inserted module 'ib_iser'
Oct  6 20:27:19 pve5 systemd-modules-load[615]: Inserted module 'vhost_net'
Oct  6 20:27:19 pve5 systemd[1]: Starting Flush Journal to Persistent Storage...
Oct  6 20:27:19 pve5 systemd[1]: Started udev Coldplug all Devices.
Oct  6 20:27:19 pve5 systemd[1]: Starting udev Wait for Complete Device Initialization...
Oct  6 20:27:19 pve5 systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Oct  6 20:27:19 pve5 lvm[631]: Monitoring thin pool pve-data.
Oct  6 20:27:19 pve5 systemd[1]: Started udev Kernel Device Manager.
Oct  6 20:27:19 pve5 lvm[618]:   5 logical volume(s) in volume group "pve" monitored
Oct  6 20:27:19 pve5 systemd[1]: Started Flush Journal to Persistent Storage.
Oct  6 20:27:19 pve5 systemd-udevd[716]: Using default interface naming scheme 'v240'.
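
To pin down when logging stopped, the gap can also be located mechanically. This is a rough sketch that assumes the traditional syslog timestamp format shown above (month, day, HH:MM:SS) and only compares lines from the same day. The name find_log_gaps and the default threshold of 300 seconds are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: report places where consecutive log lines from
# the same day are more than a minimum number of seconds apart.
# $1 = log file, $2 = minimum gap in seconds (default 300).
find_log_gaps() {
    awk -v min="${2:-300}" '
        # skip lines whose third field is not an HH:MM:SS timestamp
        $3 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
        {
            split($3, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            day = $1 " " $2
            if (seen && day == prev_day && now - prev > min)
                printf "gap of %d s after: %s\n", now - prev, prev_line
            seen = 1; prev = now; prev_day = day; prev_line = $0
        }' "$1"
}
```

Running find_log_gaps /var/log/syslog on the excerpt above would point at the 17:09:01 line as the last entry before the crash.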

Here is the output of journalctl -b -1:

Code:
Oct 06 16:59:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:00:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:00:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:00:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:01:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:01:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:01:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:02:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:02:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:02:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:03:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:03:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:03:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:04:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:04:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:04:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:05:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:05:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:05:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:05:37 pve5 smartd[1312]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 132 to 135
Oct 06 17:05:37 pve5 smartd[1312]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 135 to 138
Oct 06 17:06:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:06:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:06:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:07:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:07:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:07:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:08:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:08:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:08:01 pve5 systemd[1]: Started Proxmox VE replication runner.
Oct 06 17:08:15 pve5 pvedaemon[22710]: <root@pam> successful auth for user 'root@pam'
Oct 06 17:09:00 pve5 systemd[1]: Starting Proxmox VE replication runner...
Oct 06 17:09:01 pve5 systemd[1]: pvesr.service: Succeeded.
Oct 06 17:09:01 pve5 systemd[1]: Started Proxmox VE replication runner.

Now I have run apt update && apt dist-upgrade, and I noticed that a number of packages have been updated to the latest versions. I will try to set up a serial console or kdump, which seems quite complicated...