[SOLVED] Proxmox "freezing" randomly

Let's try to analyze your problem.

The machine is still online and running, but no one is home.
What do you mean by "online"? Is it still pingable? Can you SSH in? When you say "no one is home", you mean the GUI?

I would start testing by rebooting without running any VMs to see if it still crashes. If it doesn't, start one VM & test, then add another, and so on.

I see your HW is pretty old (the i7-3770K CPU, at least, is 12 years old). Is this the first time you are using this specific HW? Have you tested the RAM? Is the PSU adequate, etc.?
 
What I mean by "no one is home" is that the screen/display still looks the same and the power light on the machine is on.
But the blinking "Thinking" light on the machine is dead and no inputs seem to do anything.
I need to hold the power button to shut down and then reboot.

I don't recall trying to SSH in (I'm also a novice when it comes to SSH in general).
But I do recall having it plugged into a screen and seeing the normal "enter username" or something.
Will need to wait for another crash and report back.


I've been dreading testing without any VMs running.
This server now runs my entire HomeAssistant and TrueNAS VMs, as well as a few other projects.

The issue also happens anywhere between 1 hour and 1 week, which is not the best basis for a one-VM-at-a-time test.
If it crashes on day 2, it could still be unrelated to whichever VM I booted up :/
A bit of a Catch-22 situation.

The hardware is indeed old, but it was my main work machine for many many years.
I recently replaced the Motherboard and CPU with a known working pair as the previous Mobo or CPU died (Not sure which one).
Would there not be any warnings/logs if certain components are causing it to crash/freeze?
I mean when a computer dies for strange reasons there's an event viewer of sorts to at least see what went wrong \o/

Let me keep an eye out.
When it freezes/crashes again I will take note of the screen and see if it responds to keyboard inputs.
I'll give SSH a go and see how far I get.

Thanks for your input and patience this far.
 
So it's the next morning, and once again Proxmox is down.
Image of the screen:
Keyboard does not work, lights don't even change when hitting caps lock or num lock.
1718863377251.png
Image of the machine:
Power LED is on, but the HDD LED is dead.
1718863400892.png

When attempting SSH:
Code:
C:\Users\calvi>ssh root@192.168.10.200
ssh: connect to host 192.168.10.200 port 22: Connection timed out

When attempting to ping Proxmox or a VM:
Code:
C:\Users\calvi>ping 192.168.10.200

Pinging 192.168.10.200 with 32 bytes of data:
Reply from 192.168.10.137: Destination host unreachable.
Reply from 192.168.10.137: Destination host unreachable.
Reply from 192.168.10.137: Destination host unreachable.
Reply from 192.168.10.137: Destination host unreachable.

Ping statistics for 192.168.10.200:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

C:\Users\calvi>ping 192.168.10.101

Pinging 192.168.10.101 with 32 bytes of data:
Reply from 192.168.10.137: Destination host unreachable.
Reply from 192.168.10.137: Destination host unreachable.
Reply from 192.168.10.137: Destination host unreachable.
Reply from 192.168.10.137: Destination host unreachable.

Ping statistics for 192.168.10.101:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Logs at the time of the Crash/Freeze:
Code:
Jun 20 06:55:18 homelab systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Jun 20 06:55:18 homelab systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Jun 20 06:55:18 homelab systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Jun 20 06:55:18 homelab systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Jun 20 06:59:43 homelab chronyd[985]: Selected source 196.10.55.57 (2.debian.pool.ntp.org)
Jun 20 07:11:34 homelab chronyd[985]: Selected source 196.10.54.58 (2.debian.pool.ntp.org)
Jun 20 07:17:01 homelab CRON[8317]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 20 07:17:01 homelab CRON[8318]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 20 07:17:01 homelab CRON[8317]: pam_unix(cron:session): session closed for user root
-- Boot 03fb9b9c032b484daba3431c44b47257 --
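
The `-- Boot ... --` marker is where the journal switches to the next boot, so the lines just above it are the last ones written before the freeze. A minimal sketch for pulling those lines out of a longer dump: `last_before_boot` is a made-up helper name, and it assumes the plain-text format shown in the excerpt above.

```shell
# Print the last N lines (default 3) that precede each "-- Boot <id> --"
# marker in a journal dump read from stdin, followed by the marker itself.
last_before_boot() {
    awk -v n="${1:-3}" '
        BEGIN { n = n + 0 }                      # force numeric comparison
        /^-- Boot [0-9a-f]+ --$/ {
            start = (count > n) ? count - n + 1 : 1
            for (i = start; i <= count; i++) print buf[i]
            print $0
            count = 0                            # start buffering the next boot
            next
        }
        { buf[++count] = $0 }
    '
}

# Usage on the PVE host (not run here):
#   journalctl --no-pager | last_before_boot 5
```

If the final lines before every marker are as routine as the cron and chrony entries above, that points away from software and towards a hard hang where the kernel never gets to log anything.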
 
Have you already updated PVE?

1. In the GUI, go to Updates, Repositories, & make sure the pve-no-subscription repository is active. While here you can/should also disable the pve-enterprise one, since you don't have a subscription (yet).
2. Go back to Updates & press Refresh.
3. Finally (when the above has finished) press the >_ Upgrade button.
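
The same steps can be roughly sketched from a root shell. Assumptions here: a default PVE 8 install on Debian bookworm, where the enterprise repository lives in `/etc/apt/sources.list.d/pve-enterprise.list`; `repo_active` is only an illustrative helper, not a Proxmox tool.

```shell
# Succeeds if FILE contains an uncommented "deb" line that names COMPONENT.
repo_active() {  # usage: repo_active FILE COMPONENT
    grep -Eq "^[[:space:]]*deb[[:space:]].*[[:space:]]$2([[:space:]]|\$)" "$1" 2>/dev/null
}

# On the host, as root (not run here; path is the assumed default):
#   repo_active /etc/apt/sources.list.d/pve-enterprise.list pve-enterprise \
#       && echo "enterprise repo is still enabled"
#   apt update          # same as pressing Refresh
#   apt dist-upgrade    # same as pressing >_ Upgrade
```

Use `apt dist-upgrade` (or `apt full-upgrade`) rather than plain `apt upgrade`, as the latter can hold back packages that PVE needs to update together.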

Feel free to post the output here.

I'm going to badger you again to make sure you have backups. You must realize you may have a failing system - and without backups you'll be on your own!
 
I have indeed updated PVE.
I see there are 2 more updates since I updated a few days ago:
1718870916600.png

Busy implementing backups as we speak; once the backups are done I will re-run the Upgrade and report back.

The freezing/crashing has been happening since day 1.
It only got more frequent after the upgrade process a few days ago.
 
When all updates/upgrades are over & you have rebooted (maybe shutdown, power off the plug / remove cable & restart), then post the output here in code tags of:

Code:
pveversion -v
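
If only the running kernel is of interest, it can be picked out of that output. A small sketch, assuming the first line has the form `proxmox-ve: X (running kernel: Y)`; `running_kernel` is a hypothetical helper name.

```shell
# Extract the running kernel version from `pveversion -v` output on stdin.
running_kernel() {
    sed -n 's/^proxmox-ve:.*(running kernel: \(.*\))$/\1/p'
}

# Usage on the PVE host (not run here):
#   pveversion -v | running_kernel
```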
 
As requested:
Code:
root@homelab:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-1-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-1
proxmox-kernel-6.8.8-1-pve-signed: 6.8.8-1
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.5-1
proxmox-backup-file-restore: 3.2.4-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
 
Update looks good.

Maybe you have a spare PC/Raspberry Pi around to run HA on while conducting crash tests on server?

If problems persist - I would look into the HW.

This:
I recently replaced the Motherboard and CPU with a known working pair as the previous Mobo or CPU died (Not sure which one).
leads me to suspect a bad PSU - unstable/incorrect power supply will kill electronics eventually. So I'd test that PSU first.

General HW list to focus on (in order):

1. PSU (as above / replace?)
2. Connections. Check all cables/connectors are seated correctly, all RAM is perfectly slotted. All expansion cards etc.
3. RAM (memtest/replace?)
4. Thermals (maybe the CPU cooler is not effective etc.).
5. USB sockets (check every one for solid data + power connection)
6. OS drive/others (integrity)
7. NW integrity starting from server to switches/routers.
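
For the thermals item, one option is to log temperatures to disk every minute so the last reading before a freeze survives the hard reset. A rough sketch, assuming the board exposes its sensors under `/sys/class/thermal` (not every board does); `log_temps` and the log path are made-up names.

```shell
# Append one timestamped line with every readable thermal zone (in degrees C)
# to LOGFILE. BASE defaults to the kernel's sysfs thermal directory.
log_temps() {  # usage: log_temps LOGFILE [BASE]
    logfile=$1
    base=${2:-/sys/class/thermal}
    line=$(date '+%Y-%m-%d %H:%M:%S')
    for f in "$base"/thermal_zone*/temp; do
        [ -r "$f" ] || continue                  # skip if no zones exist
        milli=$(cat "$f")                        # sysfs reports millidegrees
        line="$line $(basename "$(dirname "$f")")=$((milli / 1000))C"
    done
    echo "$line" >> "$logfile"
}

# Usage on the PVE host (not run here; log path is only an example):
#   while true; do log_temps /var/log/temps.log; sync; sleep 60; done
```

For the drive check, `smartctl -H /dev/sdX` from smartmontools gives a quick health verdict (the device name is a placeholder).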

& finally good luck.
 
Thank you once again for your advice and assistance.
Although it is not resolved, you have narrowed down the possibilities.

Have a great weekend coming up.
 
Good evening.

Thought I would do the courtesy of updating those who helped.

It must have been a hardware issue.
I took the SSD and HDDs from the old machine and moved them into completely different hardware.
Since the move, I have not had a single freeze/crash.

Not sure which piece of hardware was the culprit, but the issue has been resolved by means of "out with the old and in with the new".

Marking post as solved.
 
If the pieces of the old box can somehow be booted up to run a test on, a useful one to start with would be memtest86+:

https://github.com/memtest86plus/memtest86plus/releases

The problem you're describing sounds like what can happen with RAM that's flaky in some way. The memtest86+ program is pretty good at diagnosing that particular thing, though it's the kind of test you just need to leave running for like a full day + night. :)
 
