after server crash, all logs were gone

niggolas

New Member
Mar 20, 2023
Hey there, last week I had an incident where one of my Proxmox servers crashed and rebuilt automatically.
The reason has not been determined yet.
It was not a planned restart.

It happened on 03-15 at 6:47 AM (visible from another server in the cluster), and all the server logs start at 6:48. All logs from before that point in time are gone.
Checked logs: journalctl, /var/log/messages

I checked the HDDs; all are fine according to smartctl.
CPU usage was never above 5%.
Sadly, RAM had only around 7 of 128 GiB free.

What could be the reason why a server loses all of its logs?
Could the lack of memory be the reason?

Thanks in advance!
 
Hi,
what command did you use exactly to show the logs? /var/log/messages might have been rotated, check if there are /var/log/messages.1 and/or zipped messages files. Also check the content of /var/log/journal, where the systemd-journald logs are stored.
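The checks suggested above can be sketched as a small script. This is only a sketch under assumptions: the paths are the usual Debian/Proxmox defaults, and the helper name journal_storage_mode is made up for illustration. It relies on the documented journald behaviour that, with the default Storage=auto, the journal is kept across reboots only if /var/log/journal exists; otherwise it lives in /run/log/journal and is lost on every reboot.

```shell
#!/bin/sh
# Sketch: where could older log data still be? Paths are Debian/Proxmox defaults.

# journal_storage_mode DIR (hypothetical helper): prints "persistent" if DIR
# exists, else "volatile". With journald's default Storage=auto, logs survive
# a reboot only if /var/log/journal exists; otherwise they are kept in
# /run/log/journal, which is cleared on reboot.
journal_storage_mode() {
    if [ -d "$1" ]; then
        echo persistent
    else
        echo volatile
    fi
}

# List current, rotated (.1) and compressed (.gz) syslog files, if any.
ls -l /var/log/messages* /var/log/syslog* 2>/dev/null

echo "journal storage: $(journal_storage_mode /var/log/journal)"

# Show how much disk space the journal uses (only if journalctl is available).
if command -v journalctl >/dev/null 2>&1; then
    journalctl --disk-usage || true
fi
```

If the script reports "volatile", missing journal entries from before a reboot are expected and do not point to a disk problem. That would not explain missing /var/log/messages rotations, though.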
 
Hey,
I used vi /var/log/messages
I have now also checked the older logs and the zipped logs.
I could find logs from a few weeks ago, but the complete log timespan from 12 to 15 March, up to that server "crash", is gone.

I still don't know why, or what happened there.
 
Try using journalctl --since <date> --until <date>. I don't see any reason why the logs should go missing other than disk failure or manual intervention.
 
Incidents like this, especially ones with a financial impact, help to improve the environment.
Think about a log server. Then this cannot happen anymore, and in case of a liability claim you have proof of what happened.
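A log server as suggested here can be as simple as forwarding syslog over the network. A minimal sketch, assuming rsyslog is installed on the node and a collector is reachable under the hypothetical name loghost.example.com on TCP port 514; the helper name forward_rule and the file name 90-forward.conf are illustrations as well:

```shell
#!/bin/sh
# forward_rule HOST PORT (hypothetical helper): prints an rsyslog rule that
# forwards all messages to HOST:PORT over TCP ("@@" means TCP, a single "@"
# would mean UDP).
forward_rule() {
    printf '%s\n' "*.* @@$1:$2"
}

# Preview the rule (host name and port are placeholders, not real values).
forward_rule loghost.example.com 514

# To activate it on a node, as root:
#   forward_rule loghost.example.com 514 > /etc/rsyslog.d/90-forward.conf
#   systemctl restart rsyslog
```

Because messages leave the machine as they are written, whatever was logged up to the moment of a crash is still available on the collector, even if the local files are lost.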
 
I used that command: sudo journalctl --since "2023-03-12 00:00:00" --until "2023-03-17 00:00:00"
-- Logs begin at Wed 2023-03-15 06:48:43 CET, end at Mon 2023-03-20 10:08:38 CET. --

It still does not show anything from before last week's incident on 03-15 at 6:48.
Was the command correct? (It worked on a different server.)
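The syntax of that command is valid. The "Logs begin at" header shows that the journal on this node starts at 06:48:43, so no --since value can reach further back. One way to confirm this is to list the boots the journal knows about. The following is a sketch only; count_boots is a made-up helper name, and the exact column layout of --list-boots differs between systemd versions:

```shell
#!/bin/sh
# Sketch: find out how far back the journal actually reaches.

# count_boots (hypothetical helper): reads `journalctl --list-boots` output on
# stdin and prints the number of boot entries, i.e. lines that start with an
# index such as -1 or 0. A header line, if present, is not counted.
count_boots() {
    grep -Ec '^ *-?[0-9]+ '
}

if command -v journalctl >/dev/null 2>&1; then
    # One line per boot known to the journal, with first and last timestamp.
    journalctl --no-pager --list-boots || true

    n=$(journalctl --no-pager --list-boots 2>/dev/null | count_boots)
    echo "boots recorded in the journal: $n"

    # Last 50 lines of the previous boot; fails if only one boot is recorded.
    journalctl --no-pager -b -1 -n 50 || echo "no previous boot in the journal"
fi
```

If only a single boot is listed, the journal holds nothing from before the crash and further date filtering will not help.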
 
What are you searching for? You checked the files themselves and confirmed that the time range is missing, so the logs are gone. What does make sense is to look in the logs after the incident for something like a "clear" job or a delete.
 
There was only one line containing "clear" to be found, and it comes from the BMC:

Code:
XXX@pmoc2:~$ sudo journalctl | grep clear
Mar 15 06:48:44 pmoc2 kernel: ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
Mar 20 11:16:18 pmoc2 sudo[5250]: XXX : TTY=pts/0 ; PWD=/home/XXX ; USER=root ; COMMAND=/bin/journalctl -t clear*
XXX@pmoc2:~$ sudo journalctl | grep delete
XXX@pmoc2:~$
 
What I am searching for is at least a hint as to why that happened, i.e. why that Proxmox server crashed and rebuilt all the production VMs.
I have no hint from monitoring, except that RAM was nearly full with 121 GiB of 128 GiB in use.
And I have no hint from the logs, because there are none for the three days before it happened.

The HDDs are still fine and running. Nobody touched the server. What could it be?
 
I had a very similar situation, which may have been the result of a short power failure, but I have no proof of that. All logs from before the restart time were gone, including messages* and syslog*.

The last line of journalctl before the crash was
Code:
Jun 06 11:42:56 pve smartd[623]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 69 to 70
telling me nothing about the cause.
The proxmox device was re-powered manually when it was found to be down.
Can I do something to improve my config?
 
I don't like to hijack posts, but I had the same problem just yesterday with an unresponsive host and just this line in journalctl.
After rebooting manually everything seems fine but with no logs I'm worried.
 
Same problem here on my 8.0.4 node. Read-only file system error, no /var/log/messages or kern.log. Nothing in journalctl, only the logging of my hard power cycle:

Code:
Oct 06 06:25:19 frigate-nuc systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Oct 06 06:25:19 frigate-nuc systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and c>
-- Boot 2f43176818254bbda8457d81da9f135a --
Oct 06 08:04:15 frigate-nuc kernel: microcode: microcode updated early to revision 0x26, date = 2019>
Oct 06 08:04:15 frigate-nuc kernel: Linux version 6.2.16-15-pve (build@proxmox) (gcc (Debian 12.2.0->
 
Hello! Just wanted to chime in. Same issue here on my 7.4-16 node. The last clue was SMART reporting a temperature change, then regular logs, and then all of a sudden nothing until I rebooted the node manually:

Code:
Nov 03 04:53:33 host2 audit[3620725]: SYSCALL arch=c000003e syscall=54 succes>
Nov 03 04:53:33 host2 audit: PROCTITLE proctitle="ebtables-restore"
-- Boot a4c7306ad8e64c8b9360e4c82d18cd40 --
Nov 23 13:12:50 host2 kernel: Linux version 5.15.116-1-pve (build@proxmox) (g>
Nov 23 13:12:50 host2 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.116>

Nothing of use in any of the logs that I could see such as /var/log/messages and messages.1, /var/log/kern.log etc.

After rebooting, everything works fine, but like others mentioned, the lack of log messages makes me curious too.