Proxmox Crashes overnight.

Mpro111 · Mar 5, 2024

Hello everyone,

First time posting on this forum, if I shouldn't post this here just let me know.

I now have a Proxmox Server in my HomeLab for about 2 weeks. I made a install of the newest Proxmox VE 8.1. I changed the update Server to No-Subscription. I didn't change anything else in the Proxmox Config. In the Moment I only have about 5 VM's running (TrueNAS, Docker with NGNIX Proxy Manager, Tailscale VPN, HomeAssistant, Debian VM for some testing).

But after a week I had my first crash. I forgot to save the Logs from then but it's the same as today. I can't connect to my Tailscale VPN from outside of my Network. But inside my Network HomeAssistant is down, and I can't connect to the Proxmox Server via the Web-Interface and via SSH. But for Example my NGNIX Proxy Manager Instance is still up. I tried powering it off by pressing the Power Button but it wouldn't Power Off even after 10 Minutes. After the 10 Minutes I force shut it down and turned it back on. It now worked again for the past hour, but I would love it to be available 24/7.

Hardware:
Intel Core i7-4770
16 GB Ram
4 x 6TB WD RED HDD in TrueNAS
2x 256 GB SSD Boot Drives in ZFS Raid
1 x 2 TB SSD for VM

Journalctl Log before Crash:
Mar 05 05:48:14 Hostname systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Mar 05 05:48:16 Hostname pveupdate[612016]: <root@pam> starting task UPID:Hostname:000956C6:017BED87:65E6A410:aptupdate::root@pam:
Mar 05 05:48:19 Hostname pveupdate[612038]: update new package list: /var/lib/pve-manager/pkgupdates
Mar 05 05:48:20 Hostname pveupdate[612016]: <root@pam> end task UPID:Hostname:000956C6:017BED87:65E6A410:aptupdate::root@pam: OK
Mar 05 05:48:20 Hostname systemd[1]: pve-daily-update.service: Deactivated successfully.
Mar 05 05:48:20 Hostname systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Mar 05 05:48:20 Hostname systemd[1]: pve-daily-update.service: Consumed 3.223s CPU time.
Mar 05 06:17:01 Hostname CRON[616620]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 05 06:17:01 Hostname CRON[616621]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Mar 05 06:17:01 Hostname CRON[616620]: pam_unix(cron:session): session closed for user root
-- Boot f54a55294d86443c8848982767356b38 --

Thank you for your help in advance!
If you need a different Log or some more Information just let me know!

Hqu · Mar 6, 2024

Hey,

thanks for posting your problem.

Can you paste the output from dmesg from the time and date before and after this happens?

Best

Mpro111 · Mar 13, 2024

Hello Hqu

Thank you for your reply. I checked the dmesg out but i couldn't find out how to navigate to this date.

But it crashed 2 times since then. I connected a Monitor to it and saw some "errors". I Attached them in this comment. Can you help me? I really don't know what to do and can't find similar stuff on the Internet.

And sorry this is so late. I really had a rough week.

Have a nice day!

gfngfn256 · Mar 13, 2024

Mpro111 said:
Journalctl Log before Crash:

At what time did you force shut it down?

Mpro111 said:
Mar 05 06:17:01 Hostname CRON[616620]: pam_unix(cron:session): session closed for user root
-- Boot f54a55294d86443c8848982767356b38 --

At this time above? (Just after 06.17 am) Or much later? Until when was NGNIX Proxy Manager Instance still up?
I'm trying to establish how late we can see logs?

Are you doing any GPU passthrough of any sort?

Mpro111 · Mar 13, 2024

Thank you for your quick reply.

No I think the Website was cached on my Computer. I really don't remember.

No I force shut it down about 17:00:00 after I came home from work.

I don't do any passtrough. I also killed my TrueNas to see if this is the Problem.

Mpro111 · Mar 13, 2024

It worked fine for a few days but after that it crashed again. But this time I connected a Monitor and saw this:
https://imgur.com/a/Tve7Jpv

After a reboot it worked fine again until today:

https://imgur.com/a/7e4zmR0

gfngfn256 · Mar 13, 2024

Ok so we can assume it dead crashes every time and becomes completely unresponsive.
I think we can rule out CPU overheating/shutdown - because you do show errors on the console. (Assuming they are caused by the same instance).
Unlikely; but what's your network stability like?
Have you used this HW before in a different OS setup successfully?
Your RAM appears extremely tight for your setup - especially with all that large ZFS going on. (How have you configured/allocated the RAM).
I would anyway check for a memory error.

Mpro111 · Mar 13, 2024

It was running Windows before as a Desktop PC without issues. My Network was fine before with a Raspberry Pi running HomeAssistant.

I will run Memtest when I'm back home.

What do you mean with Ram allocation? My RAM usage is around 50% without TrueNAS.

Also what are these kills of systemd?

Thank you for your help and patience!

gfngfn256 · Mar 13, 2024

Mpro111 said:
What do you mean with Ram allocation?

How much have you allocated for each VM, how much remains for the PVE host.

ZFS memory: As per the official docs:

ZFS uses 50 % of the host memory for the Adaptive Replacement Cache (ARC) by default. Allocating enough memory for the ARC is crucial for IO performance, so reduce it with caution. As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage.

You have a total Storage space of 26.5 TB.
I don't know your storage.cfg, but I assume 24TB WD is ZFS + 0.5 TM ZFS boot raid. (Don't know about the 2TB ssd?).
Do the math's yourself. It doesn't look good.

homelabenthusiast · Mar 13, 2024

@Mpro111 do you have backup running at night writing to your truenas? If yes, this may be related. It could be that you backup is putting too much load on your Truenas vm that it cannot keep up until it crashes. If that#s the case, try to limit that bandwidth of your backup

Mpro111 · Mar 14, 2024

homelabenthusiast said:
@Mpro111 do you have backup running at night writing to your truenas? If yes, this may be related. It could be that you backup is putting too much load on your Truenas vm that it cannot keep up until it crashes. If that#s the case, try to limit that bandwidth of your backup

I didn't configure that yet. But i also stopped my TrueNas VM and it still crashed again.

But thank you for your input!

Mpro111 · Mar 14, 2024

gfngfn256 said:
How much have you allocated for each VM, how much remains for the PVE host.

ZFS memory: As per the official docs:

You have a total Storage space of 26.5 TB.
I don't know your storage.cfg, but I assume 24TB WD is ZFS + 0.5 TM ZFS boot raid. (Don't know about the 2TB ssd?).
Do the math's yourself. It doesn't look good.

I removed the big 24 TB ZFS Pool in TrueNAS and only had the 0.5 TB ZFS Boot Raid but I still has crashes.

gfngfn256 · Mar 14, 2024

You still haven't disclosed how much RAM you have allocated for each VM and how much remains for the PVE host.

Mpro111 said:
I removed the big 24 TB ZFS Pool in TrueNAS

Not too sure what you mean by that.

What does cat /etc/pve/storage.cfg show.

homelabenthusiast · Mar 14, 2024

@Mpro111 looking at your fotos you have attached it looks like your rpool is having some errors. Have you checked your rpool? Specifically your disks on that pool? I hope you have a good backup before s*** happens. Please show us the output of

Code:

zpool status

Hqu · Mar 14, 2024

Mpro111 said:
Hello Hqu

Thank you for your reply. I checked the dmesg out but i couldn't find out how to navigate to this date.

But it crashed 2 times since then. I connected a Monitor to it and saw some "errors". I Attached them in this comment. Can you help me? I really don't know what to do and can't find similar stuff on the Internet.

And sorry this is so late. I really had a rough week.

Have a nice day!

What's the output of zpool status rpool ? Did you changed the disk before you turned it into a PVE? As there are heavily uncorreractable I/O failure dmesg it sounds very related to that one.

Is this rpool your only datastore?

Best

showiproute · Mar 14, 2024

And can you also share which HDDs/SSDs you are using for this pool?

Search

Search

Proxmox Crashes overnight.

Mpro111

New Member

Hqu

Member

Mpro111

New Member

Attachments

gfngfn256

Well-Known Member

Mpro111

New Member

Mpro111

New Member

gfngfn256

Well-Known Member

Mpro111

New Member

gfngfn256

Well-Known Member

homelabenthusiast

Member

Mpro111

New Member

Mpro111

New Member

gfngfn256

Well-Known Member

homelabenthusiast

Member

Hqu

Member

showiproute

Well-Known Member