Node2 unresponsive - stuck

Springtime

Member
Nov 18, 2024
I was just about to be totally happy with everything: HA and Ceph were finally working correctly, so I started doing updates. I updated Node3 from 8.3.0 to 8.3.5 without an issue, and the one VM that had been on it moved back. All good.
I thought, cool, let's proceed with Node2. A couple more VMs moved onto it, and bam: no response from the node. I can't log in via the browser, it doesn't display anything under Updates, and I can't move the VMs (although they are still running). There is no VM console, no storage information on the node, and no status information on the VMs; it always says Loading and that's about it.
It bugs me that I can't move the machines; otherwise I would just reboot it.
I can open the shell though.
So... suggestions, before I just reboot the node and hope for the best?
 
And now Node3 is also hanging. Basically, trying to open the CephFS storage says communication failure, and on a VM's Summary page the Status tab just keeps loading...
Node1 is still OK.
 
OK, found some info... when going to Datacenter and Ceph, I have:
1 clients failing to respond to capability release
1 MDSs report slow requests

On Node1 I can see the Ceph settings, but nothing else really pops up. Node2 and Node3 don't show anything in Ceph.

EDIT: Apparently, the node causing issues is Node2. However, I can't stop the MDS; it's not reacting.
 
I have fixed it. It was my first time upgrading, and apparently I should have upgraded all nodes first and then just rebooted them one after another.
Then came 8.4, and I didn't know that dist-upgrade would also push a node from 8.3.x to 8.4, which it did, so I suddenly had a mix of 8.3.0, 8.3.5 and 8.4.0... on 3 nodes. Anyway, long story short, I just went through, updated all of them, and rebooted one after another. None of the VMs went down. Not. Even. Once.
And Ceph is OK again, too.
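For anyone landing in the same mixed-version state, here is a minimal sketch of a sanity check to run before rebooting anything. It assumes you have already collected the version string from each node by hand (for example by running `pveversion` on each one). The helper names and the node-to-version mapping are hypothetical and only for illustration; the thread does not say which node ended up on which version.

```python
from collections import defaultdict


def group_nodes_by_version(node_versions):
    """Return a dict mapping each version string to the sorted nodes on it."""
    groups = defaultdict(list)
    for node, version in node_versions.items():
        groups[version].append(node)
    return {version: sorted(nodes) for version, nodes in groups.items()}


def is_mixed(node_versions):
    """True if the nodes do not all report the same version string."""
    return len(set(node_versions.values())) > 1


# Hypothetical assignment: versions typed in by hand, not read from the cluster.
reported = {"node1": "8.3.0", "node2": "8.4.0", "node3": "8.3.5"}

if is_mixed(reported):
    print("Cluster is on mixed versions; finish upgrading before rebooting:")
    for version, nodes in sorted(group_nodes_by_version(reported).items()):
        print(f"  {version}: {', '.join(nodes)}")
else:
    print("All nodes report the same version.")
```

The check only compares strings, so it says nothing about whether Ceph itself is healthy; it just makes the "upgrade everything first, then reboot one by one" order harder to get wrong.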