Crash on idle

otoman

Member
Mar 25, 2022
Hello,

I'm completely new to PVE and started using/learning it two weeks ago. Yesterday I was able to configure an LXC appliance the way I wanted and left it to download something overnight. When I came home an hour ago, I noticed that my Samba share (installed directly on the PVE host) had disconnected, so, after I couldn't SSH into the box from my laptop, I connected an HDMI cable to see something in the terminal. The computer kept running (at least the fans were spinning), but I couldn't get an image to show or do anything with it. I pressed the reboot button, which did nothing, after which I shut it down with the power button. When it turned back on, it seemed to work fine. I have no idea what could have happened, as it was sitting idle pretty much the whole time after the download ended. I've copied the syslog ranging from the end of my activity yesterday to today. I notice lots of hardware and firmware errors, but I can't make sense of them.

This is my configuration:

Latest PVE downloaded 2 weeks ago
GIGABYTE Z690 UD DDR4 motherboard
Intel Core i7 12700
2x32 GB of DDR4 Kingston Fury Renegade RAM
Samsung EVO 980 240 GB SSD
4x8 TB WD GOLD
750W Seasonic power supply

Any help would be greatly appreciated.

Thanks.
 


What I can gather from the logs is that the Proxmox host is trying to send lots of e-mails to you but failing. I guess they could be an indication that something is wrong (besides not being able to send you e-mail). Also, storage usb-sdg1 appears to be missing; did you unplug some kind of USB storage device?

I don't know what to make of this:
Mar 25 20:46:33 PVE1 kernel: BERT: Error records from previous boot:
Mar 25 20:46:33 PVE1 kernel: [Hardware Error]: event severity: fatal
Mar 25 20:46:33 PVE1 kernel: [Hardware Error]:  Error 0, type: fatal
Mar 25 20:46:33 PVE1 kernel: [Hardware Error]:   section_type: Firmware Error Record Reference
Mar 25 20:46:33 PVE1 kernel: [Hardware Error]:   Firmware Error Record Type: SOC Firmware Error Record Type2
Mar 25 20:46:33 PVE1 kernel: [Hardware Error]:   Revision: 2
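For context, BERT is the ACPI Boot Error Record Table, which the firmware uses to report an error that happened during the previous boot. To pull just these records out of a long syslog, a small filter along the following lines may help. This is only a sketch: the function name hw_errors is made up for illustration, and it assumes the log has been saved to a plain text file.

```shell
#!/bin/sh
# hw_errors FILE  (hypothetical helper name, not part of Proxmox)
# Prints the BERT header line and every "[Hardware Error]" line found in FILE,
# where FILE is a saved syslog or kernel log excerpt.
hw_errors() {
    grep -E 'BERT:|\[Hardware Error\]' -- "$1"
}
```

On a system that keeps a systemd journal, piping the output of `journalctl -k` into the same grep pattern should show the equivalent lines for the current boot. The filter only collects the records in one place; it does not interpret them.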
 
I had added a card reader and some USB ports to the box and was testing whether they worked, so I was plugging and unplugging some devices. I hadn't learned the umount command at that point, but I thought it wouldn't do any harm. I unmounted everything last night and deleted the leftover directories in the /media/ folder. I haven't done any special configuration relating to e-mail besides entering my e-mail address during install. I don't know if I need to do anything else for Proxmox to stop complaining about e-mail.

Also, since I was still just learning everything, I had created and somewhat incorrectly destroyed 2 ZFS pools, which I saw Proxmox trying and failing to mount at every boot. I found and deleted the service for one of them in /etc/systemd/system/zfs-import.target.wants/zfs-importPOOLNAME.service.
I recognized that one because one of the pools was named differently. However, there's still a .service file with the name of the current/previous pool, so I don't know whether I can delete that as well.

Note: I just corrected the path to the .service here.
 
Thank you. I might give that a try. What about cleaning up the remains of previously created zpools that Proxmox is trying to mount on startup? Is there a simple way to check which pools (only 1 now) are mountable and then remove the others? Will I do any damage if I delete the remaining service file in /etc/zfs/zfs.import.target.wants/zfs-importPOOLNAME.service?

Thanks.
 
Thanks a lot for the help. It seems that I've successfully deleted all destroyed pools, since zpool import says no pools are available for import. Hopefully that'll help.
 
So did you actually disable/delete the zfs-import@POOLNAME.service units for the non-existent ZFS pools?
Because zpool import would only list actually present/existing pools which are not mounted but ready for mount.
So zpool import can (correctly) be "empty", but at the same time you can still have those old import service units of the non-existent pools active, which leads to those error messages in the logs.
This is because Proxmox creates these service units when you create a ZFS pool over the webUI, but does not disable/delete them when a ZFS pool gets destroyed/exported.

In short: zpool import is not an indicator that there are no old but still active zfs-import@POOLNAME.service units of non-existent ZFS pools left.
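A rough way to cross-check the two lists is sketched below. It is only an illustration: the function name stale_units is made up, the unit names are assumed to follow the zfs-import@POOLNAME.service pattern, and pool names containing characters that systemd escapes (a dash, for example) would need extra handling that this sketch leaves out.

```shell
#!/bin/sh
# stale_units UNITS POOLS  (hypothetical helper name)
# UNITS: newline-separated file names from the zfs-import.target.wants directory
# POOLS: newline-separated names of the pools that currently exist
# Prints every zfs-import@POOLNAME.service entry whose POOLNAME is not in POOLS.
stale_units() {
    units=$1
    pools=$2
    printf '%s\n' "$units" | while IFS= read -r unit; do
        case $unit in
            zfs-import@*.service) ;;   # only per-pool import units
            *) continue ;;             # skip anything else, e.g. zfs-import-cache.service
        esac
        pool=${unit#zfs-import@}       # strip the prefix
        pool=${pool%.service}          # strip the suffix, leaving the pool name
        # Exact whole-line match against the list of existing pools
        if ! printf '%s\n' "$pools" | grep -qxF -- "$pool"; then
            printf '%s\n' "$unit"
        fi
    done
}
```

On the host, the two inputs could come from `ls /etc/systemd/system/zfs-import.target.wants/` and `zpool list -H -o name`, for example `stale_units "$(ls /etc/systemd/system/zfs-import.target.wants/)" "$(zpool list -H -o name)"`. Each unit it prints can then be disabled with `systemctl disable zfs-import@POOLNAME.service`. Note that it cannot tell an old unit from a current one when a destroyed pool and an existing pool share the same name.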


Another thing I saw in your logs is pve6-usb-automount. I have no experience with it, but according to this thread [1], shouldn't it be pve7-usb-automount for PVE 7?

[1] https://forum.proxmox.com/threads/usb-automount.39296
 
I only deleted one of those .service units. I was afraid the other one represented the current pool, which has the same name as a previously destroyed one. I saw at startup just now that it's trying to mount the old one too. I'll delete the other one now.
Yes, I'll update to the pve7 version of the automount. I accidentally came across the old one while googling something.
 
