Node went down - unclear why - log attached

lifeboy

We had a node go down two days ago and I'm at a loss figuring out why.

I attached the log. This happened at 12:30. The other nodes simply show that the OSDs went down and that the cluster then feverishly started rebalancing.

Is there any indication as to why?

Code:
Sep  8 12:29:56 FT1-NodeA pvedaemon[4074142]: starting vnc proxy UPID:FT1-NodeA:003E2A9E:9B7AEA15:68BEB024:vncproxy:100:roland@pve:
... <cut>
Sep  8 12:29:59 FT1-NodeA pvedaemon[213253]: <roland@pve> end task UPID:FT1-NodeA:003E2A9E:9B7AEA15:68BEB024:vncproxy:100:roland@pve: OK
Sep  8 12:30:28 FT1-NodeA systemd[1]: 100.scope: Succeeded.
Sep  8 12:30:28 FT1-NodeA systemd[1]: Stopped 100.scope.
Sep  8 12:30:28 FT1-NodeA systemd[1]: 102.scope: Succeeded.
Sep  8 12:30:28 FT1-NodeA systemd[1]: Stopped 102.scope.
Sep  8 12:30:28 FT1-NodeA systemd[1]: 104.scope: Succeeded.
Sep  8 12:30:28 FT1-NodeA systemd[1]: Stopped 104.scope.
... </cut>

The excerpt above shows that I was logged in (roland@pve) and had just closed a VNC console session when "Stopped 100.scope" and the messages after it began.

What does "Stopped 100.scope" mean?

thanks for your feedback all!
 

Hello,

What does "Stopped 100.scope" mean?
It shouldn't be an issue here.
The scope unit was terminated because all its processes exited. Once the last process in the scope ends, systemd automatically stops the scope.
But this part is more interesting:
Sep 8 13:54:13 FT1-NodeA pvestatd[2895]: metrics send error 'InfluxDB': 500 Can't connect to 192.168.131.202:8086 (Connection refused)
Sep 8 13:54:16 FT1-NodeA pvestatd[2895]: PBS1: error fetching datastores - 500 Can't connect to 192.168.131.199:8007 (No route to host)
Check your connection.
Try to get more details from the journal with journalctl | grep failed-node-name
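If grepping the whole journal is too noisy, a small script can narrow it to the minutes before the node went down. The following is only a rough sketch, not a Proxmox tool: it assumes the journal was exported as plain syslog-style text (for example with journalctl --since and --until redirected to a file), and the function name filter_window and the keyword list are my own placeholders that you should adapt to whatever your node actually logs.

```python
import re
from datetime import time

# Hypothetical keyword list (an assumption, not an official set of messages).
# Adjust it to the strings your own system logs around a shutdown.
KEYWORDS = ("stopped", "shutdown", "reboot", "power", "watchdog")

# Matches syslog-style lines such as:
#   Sep  8 12:30:28 FT1-NodeA systemd[1]: Stopped 100.scope.
LINE_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s+(\S+)\s+(.*)$"
)


def filter_window(lines, start, end, keywords=KEYWORDS):
    """Return (time, host, message) for lines whose time of day lies in
    [start, end] and whose message contains any keyword (case-insensitive).

    Only the time of day is compared, so feed it a single day's log.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        stamp = time(int(m.group(3)), int(m.group(4)), int(m.group(5)))
        if not (start <= stamp <= end):
            continue
        message = m.group(7)
        if any(k in message.lower() for k in keywords):
            hits.append((stamp, m.group(6), message))
    return hits


if __name__ == "__main__":
    # Lines taken from the log excerpts posted in this thread.
    sample = [
        "Sep  8 12:29:56 FT1-NodeA pvedaemon[4074142]: starting vnc proxy",
        "Sep  8 12:30:28 FT1-NodeA systemd[1]: Stopped 100.scope.",
        "Sep 8 13:54:13 FT1-NodeA pvestatd[2895]: metrics send error 'InfluxDB'",
    ]
    for stamp, host, message in filter_window(sample, time(12, 0), time(12, 35)):
        print(stamp, host, message)
```

Widening the window or the keyword list is cheap, so it is probably worth starting broad and trimming the list once you see which messages are just routine noise.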
 
Hello,


It shouldn't be an issue here.
The scope unit was terminated because all its processes exited. Once the last process in the scope ends, systemd automatically stops the scope.
But this part is more interesting:
Sep 8 13:54:13 FT1-NodeA pvestatd[2895]: metrics send error 'InfluxDB': 500 Can't connect to 192.168.131.202:8086 (Connection refused)
Sep 8 13:54:16 FT1-NodeA pvestatd[2895]: PBS1: error fetching datastores - 500 Can't connect to 192.168.131.199:8007 (No route to host)
Check your connection.
That's inconsequential. That Node was down and had started up, but PBS1 and InfluxDB were not started yet.
Try to get more details from the journal with journalctl | grep failed-node-name
I have attached all the journalctl records from 12:00 until the Node had completely shut down. I don't see any reason why the Node shut down. It looks like an orderly shutdown to me, but nobody initiated it.
 
