Hi,
I'm operating a 7-node cluster with Ceph.
From time to time I notice that a node reboots on its own.
Now I want to analyse what is triggering this reboot.
The output of last -x | head | tac seems to indicate that the last reboot was triggered by the kernel, 5.3.10-1-pve:
root@ld5506:~# last -x | head | tac
root pts/3 tmux(31958).%2 Tue Dec 10 11:04 - crash (1+17:25)
reboot system boot 5.3.10-1-pve Thu Dec 12 04:29 still running
runlevel (to lvl 5) 5.3.10-1-pve Thu Dec 12 04:33 still running
root pts/0 10.177.32.32 Thu Dec 12 08:47 still logged in
root pts/1 tmux(248633).%0 Thu Dec 12 08:47 still logged in
root pts/2 tmux(248633).%1 Thu Dec 12 08:47 still logged in
root pts/3 tmux(248633).%2 Thu Dec 12 08:51 still logged in
root pts/4 tmux(248633).%3 Thu Dec 12 10:01 still logged in
root pts/5 tmux(248633).%4 Fri Dec 13 10:19 still logged in
root pts/6 tmux(248633).%5 Fri Dec 13 10:23 still logged in
If this is true, my question is:
Why is the kernel triggering a reboot?
If not, the question is:
What should I check next in order to identify the root cause?
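In case it helps anyone reproduce this, below is a minimal sketch of a filter that reduces the last -x output to the boot and crash records only. The function name boot_events is made up for illustration, and the filter assumes the column layout shown in the output above (record type in the first column, the word "crash" in place of a logout time).

```shell
# Hypothetical helper: read `last -x` output on stdin and keep only
# the "reboot" pseudo-user records and sessions that ended in "crash".
boot_events() {
    awk '$1 == "reboot" || / crash / { print }'
}

# Intended usage on the affected node (not executed here):
#   last -x | boot_events
```

Applied to the output above, this should leave only the pts/3 session that ended in "crash" and the "reboot system boot" record.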
Thanks