[SOLVED] Proxmox keeps crashing

zappa

Member
Oct 31, 2022
My Proxmox keeps crashing randomly, and I am completely new to this, so I don't even know how to begin to diagnose the problem.
Any help would be appreciated.

Edit: The solution for me was to disable C-states in BIOS.
 
Welcome to the club! Same here, for no reason at all. Today I had one LXC container open and it shut down for no reason at all. The only thing I have is a crontab to update my PVE, nothing more!
 
Yeah, for me it can crash without any VMs running at all; the only way to get it back is to hit the reset button on the Proxmox machine itself.
Very strange.
 
Well..., welcome!

This is a forum specifically for Proxmox VE. To get any help you need to supply relevant information about your system, e.g.: What hardware do you use? CPU? RAM? Disks? Which PVE version is running? Which filesystems are used? How many VMs/containers are running? Do you over-commit CPU/RAM? What happens? What error message did you get? Is it repeatable after reboot?

And so on, depending on the situation. Yes, unfortunately, knowing which information is helpful is a challenge in itself!
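
A minimal sketch of how that information could be gathered into one file for pasting into a post. The function names `section` and `collect_pve_info` and the default output path are made up for illustration; `pveversion`, `uname`, `lscpu`, `free` and `lsblk` are the standard tools, and a missing tool is noted in the file rather than aborting the run.

```shell
#!/bin/sh
# Print a titled section followed by the output of a command.
# If the command fails or is not installed, say so instead of stopping.
section() {
    title=$1; shift
    echo "== $title =="
    "$@" 2>&1 || echo "(failed or not installed: $*)"
    echo
}

# Collect the basics a helper usually asks for into a single text file.
# The default path is an arbitrary choice for this sketch.
collect_pve_info() {
    out=${1:-/tmp/pve-info.txt}
    {
        section "PVE packages" pveversion -v
        section "Kernel"       uname -r
        section "CPU"          lscpu
        section "Memory"       free -h
        section "Disks"        lsblk
    } > "$out"
    echo "Wrote $out"
}
```

Running `collect_pve_info` as root and pasting the resulting file covers the hardware and version questions; the number of guests and any over-commitment still have to be described by hand.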

This post's title ("Proxmox keeps crashing") and the first post contain neither facts nor a question. What kind of answer could I give?

You may take a look into "/var/log/*", searching for error messages with a timestamp shortly before the crash. If you have a terminal (in an ssh session) you may run "journalctl -p err -f" continuously to watch errors the moment they occur.


Sorry for being blunt. This is a nice forum with friendly and competent people. As long as you ask PVE-related and answerable questions you will probably find help here.

Best regards
 
Alright, in that case let me start answering some of the questions you asked, and maybe we can figure something out? :)
Proxmox VE, version: 7.2-3
My system is an AMD Ryzen 9 3950X, 16GB DDR4 RAM, RTX3060 OC Edition with 12GB GDDR6, 256GB Samsung NVMe, 1 TB Kingston SSD.

There are no error messages whatsoever; suddenly the screen just goes black, there is no terminal window or anything, and I can't log into the Proxmox web interface. The only thing to do at that point is to hit reset on the machine and start over.

Once it restarts it can run anywhere from a few hours to a few days, and then it crashes again.

I hope that's a start in the right direction at least? :)
 
Please provide the complete output of `pveversion -v`.

Do you see any errors in the syslog after resetting the host? You can find it under `/var/log/syslog`.
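
One possible way to find the relevant part of that file, sketched under the assumption that kernel messages are logged with their boot timestamps: every boot then starts with a line containing `0.000000] Linux version`, so the lines just before that marker are the last ones written before the reset. The function name `lines_before_boot` is made up for this example.

```shell
#!/bin/sh
# Print the N lines logged immediately before each boot marker in a
# syslog-style file, followed by the marker line itself.
# Assumption: each boot begins with a kernel line "... 0.000000] Linux version ...".
lines_before_boot() {
    file=${1:-/var/log/syslog}
    n=${2:-20}
    grep -B "$n" -F '0.000000] Linux version' "$file"
}
```

For example, `lines_before_boot /var/log/syslog 30` prints the 30 lines preceding every boot found in the file. If the file only starts at the latest boot, the older entries may have been rotated into `/var/log/syslog.1`, which can be passed as the first argument instead.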
 
proxmox-ve: 7.2-1 (running kernel: 5.15.30-2-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-helper: 7.2-2
pve-kernel-5.15: 7.2-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-1
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Is there a command so I can get you the syslog? I am not very familiar with linux unfortunately.
 
ok, I managed to get this:

Nov 14 16:09:37 proxmox systemd[1]: rsyslog.service: Sent signal SIGHUP to main process 895 (rsyslogd) on client request.
Nov 14 16:09:37 proxmox systemd[1]: logrotate.service: Succeeded.
Nov 14 16:09:37 proxmox systemd[1]: Finished Rotate log files.
Nov 14 16:09:38 proxmox iscsid: iSCSI daemon with pid=1037 started!
Nov 14 16:09:38 proxmox systemd[1]: Started The Proxmox VE cluster filesystem.
Nov 14 16:09:38 proxmox systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
Nov 14 16:09:38 proxmox systemd[1]: Started Regular background program processing daemon.
Nov 14 16:09:38 proxmox systemd[1]: Starting Proxmox VE firewall...
Nov 14 16:09:38 proxmox cron[1267]: (CRON) INFO (pidfile fd = 3)
Nov 14 16:09:38 proxmox systemd[1]: Starting PVE API Daemon...
Nov 14 16:09:38 proxmox systemd[1]: Starting PVE Status Daemon...
Nov 14 16:09:38 proxmox cron[1267]: (CRON) INFO (Running @reboot jobs)
Nov 14 16:09:38 proxmox pvestatd[1274]: starting server
Nov 14 16:09:38 proxmox pve-firewall[1275]: starting server
Nov 14 16:09:38 proxmox systemd[1]: Started Proxmox VE firewall.
Nov 14 16:09:38 proxmox systemd[1]: Started PVE Status Daemon.
Nov 14 16:09:38 proxmox kernel: [ 9.716570] bpfilter: Loaded bpfilter_umh pid 1279
Nov 14 16:09:38 proxmox pvedaemon[1302]: starting server
Nov 14 16:09:38 proxmox pvedaemon[1302]: starting 3 worker(s)

But I can't see anything further back.
 
Those files in /var/log/* store (mostly) human-readable text, searchable/readable by any "classic" tool like "grep" and so on. That doesn't work well if one is not familiar with those commands.

Everything stored in those classic logfiles is handled now by "journald" as part of "systemd".

For this reason you may ignore all the plain files in /var/log/* and still be able to get all relevant information by using "journalctl":

To look into the complete system log and see only errors you can run journalctl -p err. In a text console this command lets you page forward/backward and move line by line with cursor up/down; you can also search for a specific string interactively by pressing "/" and entering the search term.

As I wrote already: journalctl -p err -f shows the latest few errors and then new ones continuously as they occur (without the ability to scroll up/down).

See man journalctl for the manual.
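
For a host that was reset after a crash, the interesting entries belong to the previous boot. A small sketch, assuming the journal is kept across reboots: `-b -1` selects the boot before the current one, `-n` limits the output to the last N entries, and `--no-pager` prints full lines instead of cutting them off at the screen width. The helper names `run`, `prev_boot_errors` and `prev_boot_tail` are made up; setting `DRY_RUN=1` only prints the command that would be executed.

```shell
#!/bin/sh
# With DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Errors from the boot before the current one (the one that ended in the crash).
prev_boot_errors() {
    run journalctl -b -1 -p err --no-pager
}

# The last N messages of any priority from the previous boot (default 50).
prev_boot_tail() {
    run journalctl -b -1 -n "${1:-50}" --no-pager
}
```

Redirecting the output, e.g. `prev_boot_tail 100 > /tmp/last-boot.txt`, gives a file that can be pasted into a post without truncated lines.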
 
Alright so that command gave me this:

-- Journal begins at Sun 2022-10-30 06:04:01 CET, ends at Mon 2022-11-14 19:48:53 CET. --
Oct 30 06:04:07 proxmox pvecm[1317]: got inotify poll request in wrong process - disabling inotify
Oct 30 06:41:53 proxmox pvedaemon[7612]: KVM virtualisation configured, but not available. Either disable in VM con>
Oct 30 06:41:53 proxmox pvedaemon[1313]: <root@pam> end task UPID:proxmox:00001DBC:000379BC:635E0EA1:qmstart:100:ro>
-- Boot 9778a22c64f4409a86ff1c8a53856aff --
Oct 30 06:47:53 proxmox smartd[945]: Device: /dev/nvme0, number of Error Log entries increased from 0 to 1434
Oct 30 06:53:25 proxmox pvedaemon[2339]: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Oct 30 06:53:25 proxmox pvedaemon[1320]: <root@pam> end task UPID:proxmox:00000923:00008089:635E114B:qmstop:100:roo>
Oct 30 06:53:40 proxmox pvedaemon[2383]: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Oct 30 06:53:40 proxmox pvedaemon[1322]: <root@pam> end task UPID:proxmox:0000094F:00008679:635E115A:qmshutdown:100>
Oct 30 06:54:04 proxmox pvedaemon[2317]: VM quit/powerdown failed - got timeout
Oct 30 06:54:04 proxmox pvedaemon[1321]: <root@pam> end task UPID:proxmox:0000090D:00007C8B:635E1140:qmshutdown:100>
Oct 30 06:54:47 proxmox pvedaemon[1321]: VM 100 qmp command failed - unable to open monitor socket
Oct 30 07:15:24 proxmox QEMU[2650]: kvm: terminating on signal 15 from pid 948 (/usr/sbin/qmeventd)
Oct 30 07:18:51 proxmox pvedaemon[6502]: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Oct 30 07:18:51 proxmox pvedaemon[1321]: <root@pam> end task UPID:proxmox:00001966:0002D49E:635E1741:qmstop:100:roo>
Oct 30 07:19:28 proxmox pvedaemon[6479]: VM quit/powerdown failed - got timeout
Oct 30 07:19:28 proxmox pvedaemon[1321]: <root@pam> end task UPID:proxmox:0000194F:0002CF9C:635E1734:qmreboot:100:r>
Oct 30 07:19:57 proxmox pvedaemon[6662]: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Oct 30 07:19:57 proxmox pvedaemon[1322]: <root@pam> end task UPID:proxmox:00001A06:0002EEB2:635E1783:qmstop:100:roo>
Oct 30 07:20:07 proxmox pvedaemon[6689]: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Oct 30 07:20:07 proxmox pvedaemon[1321]: <root@pam> end task UPID:proxmox:00001A21:0002F2A7:635E178D:qmreboot:100:r>
Oct 30 07:20:32 proxmox pvedaemon[6618]: VM quit/powerdown failed - got timeout
Oct 30 07:20:32 proxmox pvedaemon[1320]: <root@pam> end task UPID:proxmox:000019DA:0002E8A0:635E1774:qmshutdown:100>
Oct 30 07:20:46 proxmox pvedaemon[6780]: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Oct 30 07:20:46 proxmox pvedaemon[1321]: <root@pam> end task UPID:proxmox:00001A7C:000301D1:635E17B4:qmreset:100:ro>
Oct 30 07:20:55 proxmox pvedaemon[6798]: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Oct 30 07:20:55 proxmox pvedaemon[1322]: <root@pam> end task UPID:proxmox:00001A8E:00030546:635E17BD:qmreboot:100:r>
Oct 30 07:21:19 proxmox pvedaemon[6852]: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Oct 30 07:21:19 proxmox pvedaemon[1321]: <root@pam> end task UPID:proxmox:00001AC4:00030EB1:635E17D5:qmshutdown:100>
Oct 30 07:21:32 proxmox pvedaemon[6758]: VM quit/powerdown failed - got timeout
 
First I'd suggest you update to the latest 7.2 version. Especially the newer kernel versions would be important.
Then I'd suggest to enable `SVM` in the BIOS so that your VMs use hardware virtualization.
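
As a rough sketch of both steps on the command line, assuming a package repository is configured that the host can actually download from: `apt update` followed by `apt full-upgrade` pulls in the current packages including a newer kernel, which only takes effect after a reboot. The `/dev/kvm` check relies on that device node being present when KVM hardware virtualization is usable. The helper names `run`, `pve_upgrade` and `kvm_available` are illustrative, and `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/bin/sh
# With DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Refresh the package lists, then upgrade all packages (kernel included).
# A reboot is needed afterwards to actually boot the new kernel.
pve_upgrade() {
    run apt update && run apt full-upgrade
}

# Succeeds if the KVM device node exists as a character device.
# The path argument exists only so the check can be tried on other files.
kvm_available() {
    [ -c "${1:-/dev/kvm}" ]
}
```

Usage: `DRY_RUN=1 pve_upgrade` previews the two commands; `kvm_available && echo "KVM usable"` gives a quick indication of whether SVM is effectively enabled.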
 
Hardware virtualization is already enabled in BIOS, but how do I update everything?
 
Just wanted to say that I have exactly the same problem. Running a handful of VMs, everything was fine for the last 2 months. Didn't change anything, but since last week, the whole proxmox server freezes at least 3 times a day. Running on a Ryzen 7 with the same Proxmox and kernel versions as OP. Also nothing noteworthy in the syslog.
 
Seems to be the same for my system. I have two nodes with identical hardware:
CPU(s) 64 x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz (2 Sockets), 192 GB RAM, ZFS
PVE01: Kernel Version Linux 5.15.53-1-pve #1 SMP PVE 5.15.53-1 (Fri, 26 Aug 2022 16:53:52 +0200)
PVE02: Kernel Version Linux 5.15.64-1-pve #1 SMP PVE 5.15.64-1 (Thu, 13 Oct 2022 10:30:34 +0200)

PVE01 is running without any problem, PVE02 crashes if one VM is running.
A RAM test on PVE02 completed without any errors.
 
Could you try kernel 5.15.60-2-pve instead of the latest one and see if it has the same issues?
Do you see anything in the syslogs, e.g. kernel panics or similar? Anything that indicates an issue rather than just silent freezes?
 
Thanks for getting back. Just installed kernel 5.15.60-2-pve. Will get back to you on the next freeze.
EDIT: after an hour it froze again

There is literally nothing in the syslog in between the moments leading up to the freeze, and the next boot (after manual reset). Last entry in the syslog was from 10 minutes before the freeze happened and is completely unrelated.

Any idea how the Proxmox freeze could also freeze my NAS? Come to think of it, the only thing that makes me think the system is frozen is that I cannot access it via LAN. Is there any way I can confirm the system is actually frozen, as opposed to the LAN port just ceasing to work for some reason?
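
One idea for telling the two cases apart, as a sketch rather than an established procedure: let the host write a timestamp to disk at a fixed interval. If the timestamps continue while the machine is unreachable, only the network dropped out; if they stop, the whole system hung. The function names and the log path are made up for this example.

```shell
#!/bin/sh
# Append one timestamped line and flush it to disk so it survives a hard reset.
heartbeat_once() {
    file=${1:-/var/log/heartbeat.log}
    printf '%s alive\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$file"
    sync
}

# Show the most recent heartbeat; after a reset this approximates
# the moment the system stopped responding.
last_heartbeat() {
    tail -n 1 "${1:-/var/log/heartbeat.log}"
}
```

A loop such as `while true; do heartbeat_once; sleep 60; done &` keeps it running in the background; after the next reset, `last_heartbeat` shows when the last line was written.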
 
I installed the latest BIOS and I'm currently running 5.15.60-1-pve, which was available in the boot menu. My server used to run for longer periods between crashes, so I have to wait a bit longer.
Can you tell me how I can install an older kernel version (permanent downgrade)?
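
A possible approach, sketched under two assumptions: that the older kernel is still available as a package named `pve-kernel-<version>`, and that `proxmox-boot-tool` on the host already offers the `kernel pin` subcommand (older installs may lack it until they are updated). The helper names `run` and `pin_kernel` are made up; `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/bin/sh
# With DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Install a specific kernel package and make it the default for every boot.
# Example version string: 5.15.60-2-pve (the one suggested in this thread).
pin_kernel() {
    version=$1
    run apt install "pve-kernel-$version" &&
    run proxmox-boot-tool kernel pin "$version"
}
```

`DRY_RUN=1 pin_kernel 5.15.60-2-pve` previews the two commands. `proxmox-boot-tool kernel list` shows which kernels are installed and which one is pinned, and `proxmox-boot-tool kernel unpin` reverts to booting the newest kernel.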
 
