Hello, I'm struggling to figure out why my device running Proxmox has suddenly started rebooting at random times.
The syslogs contain nothing that points me toward a cause. Where else can I look? What else can I do?
I first noticed it yesterday, but who knows when it really started.
This morning I disabled postfix, as I usually do, because I don't have email configured and it spams the logs.
After that it was working fine: I logged in to check that it was running, went to the other room, and then -- Reboot --
I run this on a brand-new (2 months old) Protectli VP4650 with many Docker LXC containers. It was working fine last week. I didn't do any Proxmox upgrades or anything like that. It runs 24/7, as it should, because it's my firewall.
That said, I did run an upgrade from the system menu in Proxmox yesterday, and my web UI is at version 8.1.4.
I don't want to think it's a hardware failure... it's only 2 months old. Is there a way I could find out? I'm not very knowledgeable about Linux administration, so I would really appreciate any help.
Could an Ethernet device connected to the Proxmox firewall box be crashing it? I have an old TP-Link router connected to it for Wi-Fi.
My observations on CPU/memory usage: CPU stays below 1% and memory usage is under 4 of 32 GB. It also crashed during the night; I determined that by logging into the web UI and checking the node's uptime.
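The uptime check can also be done from a shell on the node without the web UI. A minimal sketch, assuming a Linux host where `/proc/uptime` is readable: that file holds two numbers, and the first is the seconds since boot, so subtracting it from the current time gives the last boot time. The function name `boot_time` is hypothetical, just for illustration.

```python
import os
from datetime import datetime, timedelta


def boot_time(uptime_text: str, now: datetime) -> datetime:
    """Return when the host last booted, given the contents of /proc/uptime.

    /proc/uptime holds two numbers; the first is seconds since boot,
    the second is cumulative idle time (ignored here).
    """
    seconds_up = float(uptime_text.split()[0])
    return now - timedelta(seconds=seconds_up)


if __name__ == "__main__":
    # Only read the live value when running on a Linux host such as the Proxmox node.
    if os.path.exists("/proc/uptime"):
        with open("/proc/uptime") as f:
            text = f.read()
        print("Last boot:", boot_time(text, datetime.now()).isoformat(timespec="seconds"))
```

Running it periodically (or just comparing the printed boot time with when you last looked) tells you whether an unnoticed reboot happened in between.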
The only options I can think of are:
1) memtest, since I'm suspicious of the cheap Kingston RAM, but it will have to run for hours.
2) Nuke the system and do a clean install. I was able to back up my containers yesterday, but if the problem is container-related, I'll just bring the bad juju onto the clean install.
3) I don't know what else to do...
Code:
Mar 05 06:53:47 firewall-boi pvedaemon[983]: <root@pam> successful auth for user 'root@pam'
Mar 05 06:54:40 firewall-boi pvedaemon[984]: <root@pam> starting task UPID:firewall-boi:000169C8:000FB6FF:65E6A590:srvstop:postfix:root@pam:
Mar 05 06:54:40 firewall-boi pvedaemon[92616]: stopping service postfix: UPID:firewall-boi:000169C8:000FB6FF:65E6A590:srvstop:postfix:root@pam:
Mar 05 06:54:40 firewall-boi systemd[1]: postfix.service: Deactivated successfully.
Mar 05 06:54:40 firewall-boi systemd[1]: Stopped postfix.service - Postfix Mail Transport Agent.
Mar 05 06:54:40 firewall-boi systemd[1]: Stopping postfix@-.service - Postfix Mail Transport Agent (instance -)...
Mar 05 06:54:40 firewall-boi pvedaemon[984]: <root@pam> end task UPID:firewall-boi:000169C8:000FB6FF:65E6A590:srvstop:postfix:root@pam: OK
Mar 05 06:54:40 firewall-boi postfix[92619]: Postfix is using backwards-compatible default settings
Mar 05 06:54:40 firewall-boi postfix[92619]: See http://www.postfix.org/COMPATIBILITY_README.html for details
Mar 05 06:54:40 firewall-boi postfix[92619]: To disable backwards compatibility use "postconf compatibility_level=3.6" and "postfix reload"
Mar 05 06:54:40 firewall-boi postfix/postfix-script[92625]: stopping the Postfix mail system
Mar 05 06:54:40 firewall-boi postfix/master[938]: terminating on signal 15
Mar 05 06:54:40 firewall-boi systemd[1]: postfix@-.service: Deactivated successfully.
Mar 05 06:54:40 firewall-boi systemd[1]: Stopped postfix@-.service - Postfix Mail Transport Agent (instance -).
Mar 05 07:17:01 firewall-boi CRON[103549]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 05 07:17:01 firewall-boi CRON[103550]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Mar 05 07:17:01 firewall-boi CRON[103549]: pam_unix(cron:session): session closed for user root
Mar 05 07:29:45 firewall-boi pvedaemon[985]: <root@pam> successful auth for user 'root@pam'
Mar 05 07:48:42 firewall-boi systemd[1]: Starting apt-daily.service - Daily apt download activities...
Mar 05 07:48:42 firewall-boi systemd[1]: apt-daily.service: Deactivated successfully.
Mar 05 07:48:42 firewall-boi systemd[1]: Finished apt-daily.service - Daily apt download activities.
Mar 05 07:54:15 firewall-boi pvedaemon[983]: <root@pam> successful auth for user 'root@pam'
-- Reboot --
Mar 05 07:57:08 firewall-boi kernel: Linux version 6.5.13-1-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-1 (2024-02-05T13:50Z) ()
Mar 05 07:57:08 firewall-boi kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.13-1-pve root=/dev/mapper/pve-root ro quiet
Mar 05 07:57:08 firewall-boi kernel: KERNEL supported cpus:
Mar 05 07:57:08 firewall-boi kernel: Intel GenuineIntel
Mar 05 07:57:08 firewall-boi kernel: AMD AuthenticAMD
Mar 05 07:57:08 firewall-boi kernel: Hygon HygonGenuine
Mar 05 07:57:08 firewall-boi kernel: Centaur CentaurHauls
Mar 05 07:57:08 firewall-boi kernel: zhaoxin Shanghai
Mar 05 07:57:08 firewall-boi kernel: BIOS-provided physical RAM map:
Mar 05 07:57:08 firewall-boi kernel: BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
Mar 05 07:57:08 firewall-boi kernel: BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
Mar 05 07:57:08 firewall-boi kernel: BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
Mar 05 07:57:08 firewall-boi kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000097672fff] usable
Mar 05 07:57:08 firewall-boi kernel: BIOS-e820: [mem 0x0000000097673000-0x0000000097673fff] reserved
Mar 05 07:57:08 firewall-boi kernel: BIOS-e820: [mem 0x0000000097674000-0x000000009767afff] usable
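To see what the journal recorded right before each reboot, the log can be scanned for the `-- Reboot --` marker. A minimal sketch, assuming the pasted text uses the same short format as the excerpt above (`Mon DD HH:MM:SS host ...`) and the same `-- Reboot --` marker; since that format omits the year, the year is an assumed parameter. The function names `parse_stamp` and `reboot_gaps` are hypothetical. On the node, `journalctl --list-boots` lists past boots and `journalctl -b -1` shows the previous boot's journal, whose tail is the interesting part.

```python
from datetime import datetime, timedelta

REBOOT_MARKER = "-- Reboot --"


def parse_stamp(line: str, year: int) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' journal timestamp.

    The short journal format omits the year, so the caller supplies it.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def reboot_gaps(journal_text: str, year: int = 2024):
    """For each reboot marker, return (last line before, first line after, time gap)."""
    lines = [ln for ln in journal_text.splitlines() if ln.strip()]
    results = []
    for i, line in enumerate(lines):
        if line.strip() != REBOOT_MARKER:
            continue
        # Nearest real log line on each side of the marker.
        before = next((ln for ln in reversed(lines[:i]) if ln.strip() != REBOOT_MARKER), None)
        after = next((ln for ln in lines[i + 1:] if ln.strip() != REBOOT_MARKER), None)
        if before is None or after is None:
            continue
        gap = parse_stamp(after, year) - parse_stamp(before, year)
        results.append((before, after, gap))
    return results


if __name__ == "__main__":
    sample = """Mar 05 07:48:42 firewall-boi systemd[1]: Finished apt-daily.service - Daily apt download activities.
Mar 05 07:54:15 firewall-boi pvedaemon[983]: <root@pam> successful auth for user 'root@pam'
-- Reboot --
Mar 05 07:57:08 firewall-boi kernel: Linux version 6.5.13-1-pve (build@proxmox)"""
    for before, after, gap in reboot_gaps(sample):
        print("last before:", before)
        print("first after:", after)
        print("gap:", gap)
```

In the excerpt above, the last entry before the marker is the 07:54:15 login and there are no systemd "Stopping ..." messages in between, which suggests the box did not go through an orderly shutdown before coming back up about three minutes later.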