Proxmox server freezes...

chindajiu

New Member
Jul 31, 2024
Hello,
I have a problem with my Proxmox home server. I'm a beginner in this field and have already searched for solutions to my issue, but unfortunately without success. I installed Proxmox VE (latest version) on a Lenovo ThinkCentre with an AMD Ryzen 5 2400GE and 32GB of RAM. The RAM module is brand new. Originally, the mini PC had 2x4GB installed.
The server crashes at irregular intervals. It basically freezes. I've already shut down all the VMs I had installed on it, but the server still crashes. Sometimes after a day, sometimes after just a few hours—it varies. I also ran a memtest, which I let run for about 9 hours, and no errors were found.
What options do I have to identify the error? Do you have any tips for me?

Additional information about the hardware:
I am using an NVMe SSD from Lexar, which is also brand new. For TrueNAS, which runs as a VM on the server, I have a 2.5" HDD installed. But as I mentioned, the server still crashes even when all VMs are shut down, so I assume the issue is not related to the HDD.
 
Welcome to the forum, chindajiu!

If a "crash"/"freeze" means that the server shuts down or becomes completely unresponsive, it sounds like a CPU stability issue. In that case, it would be interesting to know whether you've changed any CPU-related BIOS settings and whether you've installed the most recent BIOS firmware and AMD CPU microcode.

In any case, it would be helpful to have a boot log (e.g. dmesg) and a system log (e.g. journalctl) just before the crash happens.
 
Yes, it becomes completely unresponsive.

I haven't made any changes to the BIOS. I still have to check whether the firmware and microcode are the latest versions.

How can I get a boot log or system log from just before the crash? Won't everything be lost when it crashes?
 
You can retrieve the boot log with dmesg > sysboot.log (for information about any warnings/errors when setting up the system). It will be saved into the file sysboot.log.

You can retrieve the system log from the previous boot with journalctl -b -1 -e. The -b -1 argument tells journalctl to output the log from the boot before the current one, and -e jumps to the end, so you should see the last entries written before the system crashed. You can scroll through the log with the up/down arrow keys and include every entry that seems relevant to you.
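A minimal sketch of the log collection described above, written as a small shell function. Two points are assumptions rather than something stated in this thread: journalctl -b -1 can only show the previous boot if the journal is stored persistently (with systemd's default settings that is the case when /var/log/journal exists), and dmesg only shows kernel messages of the currently running boot, so the sketch uses journalctl -k -b -1 to get the kernel messages of the boot that froze. The function name and the output file names are hypothetical.

```shell
# Hypothetical helper: save the logs of the previous (crashed) boot into a directory.
# Run as root so that no entries are hidden.
save_previous_boot_logs() {
    outdir="${1:-.}"

    # Overview of all boots the journal knows about (0 = current, -1 = previous).
    journalctl --list-boots --no-pager > "$outdir/boots.txt"

    # Complete system log of the previous boot.
    journalctl -b -1 --no-pager > "$outdir/journal-prev.log"

    # Kernel messages only (the dmesg equivalent) of the previous boot.
    journalctl -k -b -1 --no-pager > "$outdir/kernel-prev.log"

    # Only entries with priority "warning" or more severe.
    journalctl -b -1 -p warning --no-pager > "$outdir/warnings-prev.log"
}

# Example call after the machine has been rebooted:
#   mkdir -p /root/crashlogs && save_previous_boot_logs /root/crashlogs
```

If boots.txt lists only the current boot, the journal is probably not persistent, and the entries from before the freeze are not available.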
 
Hello dekralex,

thx for the reply and the useful information.

I'll query the logs after the next crash and post them here to get your support based on your experience. Thanks in advance!
 
Hi!

After a freeze, I retrieved the log with

journalctl -b -1 -e

But I don't see any message that points to the cause of the problem. The messages I see are mostly the same:

Sep 11 19:23:52 pve pvestatd[1194]: zfs error: cannot open 'truenas-pool': no such pool
Sep 11 19:23:52 pve pvestatd[1194]: zfs error: cannot open 'truenas-pool': no such pool
Sep 11 19:23:52 pve pvestatd[1194]: could not activate storage 'truenas-pool', zfs error: cannot import 'truenas-pool': no such pool available

This error message appears even when the VM running TrueNAS is offline.

What could the problem be?
 
The system log excerpt is very short and not enough to solve the problem. Could you post the boot log and a bit more of the system log? Please make sure to run both commands as a privileged user (root), as the output will then contain more information that can help to solve the problem.

FYI, you should disable storages that are currently offline, so that your system log won't become unnecessarily long with messages from pvestatd as you have posted above.
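Both points can be sketched in the shell, assuming the storage ID is 'truenas-pool' as in the log excerpt above. pvesm set with --disable 1 marks a storage as disabled without deleting its definition. The two function names are hypothetical, and the grep pattern only matches the message format shown in this thread.

```shell
# Hypothetical helper: disable a storage definition so pvestatd stops polling it.
disable_storage() {
    pvesm set "$1" --disable 1
}

# Hypothetical helper: drop the repeated pvestatd storage errors from a log
# that is passed in on standard input.
filter_pvestatd_noise() {
    grep -v -E 'pvestatd\[[0-9]+\]: (zfs error: cannot open|could not activate storage)'
}

# Example calls on the host, as root:
#   disable_storage truenas-pool
#   journalctl -b -1 --no-pager | filter_pvestatd_noise | tail -n 200
```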

Even though the RAM modules are brand new, they could still be faulty. I would still suggest running a memtest on them for a couple of hours (e.g. overnight) to rule out a hardware issue. You could also safely install the newest CPU microcode (which can include stability fixes, security patches, and functional improvements) as described in the guide at [1].

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_firmware_cpu
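To see whether a microcode update actually took effect, the revision can be compared before and after the update and a reboot. A minimal sketch, assuming an x86 system where /proc/cpuinfo contains a microcode field; the function name is hypothetical.

```shell
# Hypothetical helper: print the microcode revision reported for the first CPU core.
# The optional argument only exists so the function can be tried on a copy of the file.
current_microcode() {
    grep -m1 '^microcode' "${1:-/proc/cpuinfo}"
}

# Example calls on the host:
#   current_microcode
#   journalctl -k -b 0 --no-pager | grep -i microcode   # kernel messages about microcode loading
```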
 
Hey,

it seems to me that you are trying to use a resource from the TrueNAS VM in the Proxmox hypervisor. If the VM hangs or is offline, you will then have problems.
At first glance, it looks more like a configuration mismatch than a hardware error.
 
Thank you for your responses.

As a precaution, I updated the microcode following the mentioned instructions. Additionally, I removed the missing storage 'truenas-pool' from the configuration file '/etc/pve/storage.cfg'. It no longer appears in the Proxmox web interface (it was previously visible and marked with a question mark).

What surprises me, however, is that during boot the system still attempts to load a truenas-pool. Question: Do I need this for my TrueNAS VM, or is the service completely unnecessary?

Here is the boot log (just search for 'truenas' in it):
 

Attachment: bootlog.txt (138.9 KB)

Hey chindajiu!

Thank you for the boot log. I've noticed that you're running Linux kernel version 6.8.4-2, which is unfortunately notorious for instability on many systems, as can be seen across the forum, e.g. at [1]. I'd suggest upgrading your PVE installation, either through the web GUI or the shell, as described at [2].

As for your TrueNAS situation, your log shows that the system is still trying to import the TrueNAS zpool. How have you set up your zpool, and what are you using it for?

[1] https://forum.proxmox.com/threads/r...-ssh-and-all-running-vms-unresponsive.145981/
[2] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#system_software_updates
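To answer the question about the pool setup, the host's view of ZFS can be collected with a few read-only commands. This is only a sketch: the function name is hypothetical, and the 'zfs-import*' pattern is an assumption about how the leftover import is named, since the boot log itself is not reproduced here.

```shell
# Hypothetical helper: show what the Proxmox host itself knows about ZFS pools.
# All three commands only read state; none of them changes anything.
show_zfs_import_state() {
    # Pools that are currently imported on the host.
    zpool list

    # Pools that are visible on attached disks but not imported.
    zpool import

    # Import-related units known to systemd, including inactive ones.
    systemctl list-units --all --no-pager 'zfs-import*'
}

# Example call on the host, as root:
#   show_zfs_import_state
```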
 
Hi!

This is what appears when I run apt-get update in the shell:

Code:
root@pve:~# apt-get update
Hit:1 http://security.debian.org bookworm-security InRelease
Hit:2 http://ftp.de.debian.org/debian bookworm InRelease                       
Hit:3 http://ftp.de.debian.org/debian bookworm-updates InRelease               
Err:4 https://enterprise.proxmox.com/debian/ceph-quincy bookworm InRelease
  401  Unauthorized [IP: 185.219.221.167 443]
Err:5 https://enterprise.proxmox.com/debian/pve bookworm InRelease
  401  Unauthorized [IP: 185.219.221.167 443]
Reading package lists... Done
E: Failed to fetch https://enterprise.proxmox.com/debian/ceph-quincy/dists/bookworm/InRelease  401  Unauthorized [IP: 185.219.221.167 443]
E: The repository 'https://enterprise.proxmox.com/debian/ceph-quincy bookworm InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Failed to fetch https://enterprise.proxmox.com/debian/pve/dists/bookworm/InRelease  401  Unauthorized [IP: 185.219.221.167 443]
E: The repository 'https://enterprise.proxmox.com/debian/pve bookworm InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

When I run apt-get dist-upgrade, nothing happens.
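The 401 Unauthorized responses above come from the enterprise repositories, which require a valid subscription key. Without a subscription, the Proxmox documentation describes using the pve-no-subscription repository instead. A minimal sketch, assuming the default file names under /etc/apt/sources.list.d/ (check them on your system first); the function names are hypothetical, and a separate list file is just one possible place for the new entry.

```shell
# Hypothetical helper: comment out every active "deb" line in an APT list file.
disable_repo_file() {
    if [ -f "$1" ]; then
        sed 's/^deb /# deb /' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    fi
}

# Hypothetical helper: write the no-subscription entry for PVE on Debian bookworm.
write_no_subscription_repo() {
    echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > "$1"
}

# Example calls on the host, as root (file names are assumed defaults):
#   disable_repo_file /etc/apt/sources.list.d/pve-enterprise.list
#   disable_repo_file /etc/apt/sources.list.d/ceph.list
#   write_no_subscription_repo /etc/apt/sources.list.d/pve-no-subscription.list
#   apt-get update && apt-get dist-upgrade
```

As long as apt-get update fails for the Proxmox repositories, apt-get dist-upgrade has no new Proxmox package lists to work with, which would explain why it appears to do nothing.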

Actually, at first, I had misconfigured something in TrueNAS, and the pool didn't quite work. But I deleted the pool, created a new one, and everything worked afterward. In TrueNAS, I don't see any errors anymore, and access to my shares works.

I don't know why there are still misconfigurations in the background. How can I eliminate them?
 
