Proxmox Crashes overnight.

Mpro111

New Member
Mar 5, 2024
Hello everyone,

This is my first time posting on this forum; if I shouldn't post this here, just let me know.

I have had a Proxmox server in my homelab for about 2 weeks. I installed the newest Proxmox VE 8.1 and changed the update repository to No-Subscription. I didn't change anything else in the Proxmox config. At the moment I only have about 5 VMs running (TrueNAS, Docker with NGINX Proxy Manager, Tailscale VPN, Home Assistant, and a Debian VM for some testing).

After a week I had my first crash. I forgot to save the logs from then, but the symptoms were the same as today. I can't connect to my Tailscale VPN from outside my network. Inside my network, Home Assistant is down, and I can't reach the Proxmox server via the web interface or via SSH. My NGINX Proxy Manager instance, for example, is still up. I tried powering the server off by pressing the power button, but it wouldn't power off even after 10 minutes. After those 10 minutes I forced a shutdown and turned it back on. It has been working again for the past hour, but I would love it to be available 24/7.

Hardware:
Intel Core i7-4770
16 GB RAM
4 x 6TB WD RED HDD in TrueNAS
2x 256 GB SSD Boot Drives in ZFS Raid
1 x 2 TB SSD for VM

Journalctl Log before Crash:
Mar 05 05:48:14 Hostname systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Mar 05 05:48:16 Hostname pveupdate[612016]: <root@pam> starting task UPID:Hostname:000956C6:017BED87:65E6A410:aptupdate::root@pam:
Mar 05 05:48:19 Hostname pveupdate[612038]: update new package list: /var/lib/pve-manager/pkgupdates
Mar 05 05:48:20 Hostname pveupdate[612016]: <root@pam> end task UPID:Hostname:000956C6:017BED87:65E6A410:aptupdate::root@pam: OK
Mar 05 05:48:20 Hostname systemd[1]: pve-daily-update.service: Deactivated successfully.
Mar 05 05:48:20 Hostname systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Mar 05 05:48:20 Hostname systemd[1]: pve-daily-update.service: Consumed 3.223s CPU time.
Mar 05 06:17:01 Hostname CRON[616620]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 05 06:17:01 Hostname CRON[616621]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Mar 05 06:17:01 Hostname CRON[616620]: pam_unix(cron:session): session closed for user root
-- Boot f54a55294d86443c8848982767356b38 --


Thank you for your help in advance!
If you need a different Log or some more Information just let me know!
 
Hey,

thanks for posting your problem.

Can you paste the dmesg output from just before and just after this happens?

Best
 
Hello Hqu

Thank you for your reply. I checked the dmesg output, but I couldn't find out how to navigate to that date.

It has crashed 2 more times since then. I connected a monitor to it and saw some "errors". I attached photos of them to this comment. Can you help me? I really don't know what to do and can't find anything similar on the Internet.

And sorry this is so late. I really had a rough week.

Have a nice day!
 

Attachments

  • Proxmox Crash 1.jpg (642 KB)
  • Proxmox Crash 2.jpg (359.4 KB)
Journalctl Log before Crash:
At what time did you force shut it down?

Mar 05 06:17:01 Hostname CRON[616620]: pam_unix(cron:session): session closed for user root
-- Boot f54a55294d86443c8848982767356b38 --

At the time shown above (just after 06:17 am), or much later? Until when was the NGINX Proxy Manager instance still up?
I'm trying to establish how late the logs go.

Are you doing any GPU passthrough of any sort?
 
Thank you for your quick reply.

No, I think the website was cached on my computer. I really don't remember.

No, I forced the shutdown at about 17:00, after I came home from work.

I don't do any passthrough. I also killed my TrueNAS VM to see if it is the problem.
 
OK, so we can assume it hard-crashes every time and becomes completely unresponsive.
I think we can rule out CPU overheating/shutdown, because you do show errors on the console (assuming they come from the same incident).
It's unlikely to be the cause, but what's your network stability like?
Have you used this hardware successfully before under a different OS?
Your RAM appears extremely tight for your setup, especially with all that large ZFS storage. How have you configured/allocated the RAM?
I would check for memory errors anyway.
 
It was running Windows before as a Desktop PC without issues. My Network was fine before with a Raspberry Pi running HomeAssistant.

I will run Memtest when I'm back home.

What do you mean by RAM allocation? My RAM usage is around 50% without TrueNAS.

Also, what are these kills of systemd?

Thank you for your help and patience!
 
What do you mean by RAM allocation?
How much have you allocated to each VM, and how much remains for the PVE host?

ZFS memory: As per the official docs:
ZFS uses 50 % of the host memory for the Adaptive Replacement Cache (ARC) by default. Allocating enough memory for the ARC is crucial for IO performance, so reduce it with caution. As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage.

You have a total storage space of 26.5 TB.
I don't know your storage.cfg, but I assume the 24 TB of WD drives is ZFS, plus the 0.5 TB ZFS boot RAID. (I don't know about the 2 TB SSD.)
Do the math yourself. It doesn't look good.
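
To make that arithmetic concrete, here is a rough Python sketch of the rule of thumb from the docs quoted above (at least 2 GiB base + 1 GiB per TiB of storage). It rests on several assumptions: that all 26.5 TB of raw capacity is under ZFS (the 2 TB SSD may not be), that the 16 GB of RAM can be treated as 16 GiB, and that everything shares one ARC, which ignores the fact that a pool inside the TrueNAS VM would use that VM's memory instead. The function and variable names are made up for illustration.

```python
# Rule of thumb from the docs quote: 2 GiB base + 1 GiB per TiB of storage.
BASE_GIB = 2.0
GIB_PER_TIB = 1.0

# From the hardware list in the first post; treated as GiB (assumption).
HOST_RAM_GIB = 16.0


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (drive-label units) to binary tebibytes."""
    return tb * 1000**4 / 1024**4


def recommended_arc_gib(storage_tb: float) -> float:
    """Minimum ARC size in GiB suggested by the rule of thumb."""
    return BASE_GIB + GIB_PER_TIB * tb_to_tib(storage_tb)


if __name__ == "__main__":
    # Raw capacities as assumed in this post, not usable space after redundancy.
    cases = {
        "all storage (26.5 TB)": 26.5,
        "boot pool only (0.5 TB)": 0.5,
    }
    for label, tb in cases.items():
        need = recommended_arc_gib(tb)
        verdict = "exceeds" if need > HOST_RAM_GIB else "fits within"
        print(f"{label}: ~{need:.1f} GiB for ARC, {verdict} {HOST_RAM_GIB:.0f} GiB of host RAM")
```

Under those assumptions the full 26.5 TB comes out at roughly 26 GiB for ARC alone, before any VM gets memory, while the 0.5 TB boot pool on its own needs only about 2.5 GiB.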
 
@Mpro111 do you have a backup running at night that writes to your TrueNAS? If yes, this may be related. It could be that your backup is putting so much load on your TrueNAS VM that it cannot keep up and eventually crashes. If that's the case, try limiting the bandwidth of your backup.
 
I haven't configured that yet. I also stopped my TrueNAS VM and it still crashed again.

But thank you for your input!
 
I removed the big 24 TB ZFS pool in TrueNAS and only had the 0.5 TB ZFS boot RAID, but I still had crashes.
 
You still haven't disclosed how much RAM you have allocated for each VM and how much remains for the PVE host.

I removed the big 24 TB ZFS Pool in TrueNAS
Not too sure what you mean by that.

What does cat /etc/pve/storage.cfg show?
 
@Mpro111 looking at the photos you attached, it looks like your rpool is having some errors. Have you checked your rpool, specifically the disks in that pool? I hope you have a good backup before s*** happens. Please show us the output of:
Code:
zpool status
 
What's the output of zpool status rpool? Did you change the disk before you turned the machine into a PVE host? Since dmesg shows a lot of uncorrectable I/O failures, the crashes sound closely related to that.

Is this rpool your only datastore?

Best
 
