I recently started having an issue with a single node not connecting to the other node. There were no obvious errors in any of the config files. We discovered in another thread that I could usually force it to connect by simply running the
corosync -f
command. I've since started to notice some other things that I'm pretty sure are related, though what exactly is causing them isn't obvious. The things I've noted are:
PVE won't achieve quorum without help (and even then it sometimes loses the connection)
SSH is VERY slow to connect via a terminal on my machine, even when using the IP directly
Opening the console in PVE for a VM or for the PVE node often fails, showing "failed to connect to server"
Telling a VM to reboot will often shut it down without starting it back up; checking the logs shows
TASK ERROR: timeout waiting on systemd
I'm sure there are other things, but these are the most obvious. Move inside a VM and everything seems totally normal: I can connect remotely to its IP via a terminal, and the applications run as expected from inside.
It's been suggested that I enable verbose logging on corosync, though I'm told it's VERY verbose and I'm not sure that's where I need to start.
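Before turning on debug logging, the logs corosync already writes at its default level may be worth a look first. This is only a minimal sketch, assuming a systemd-based PVE node where the cluster services run as the corosync and pve-cluster units and the node's local config is at /etc/corosync/corosync.conf. The function names are made up for illustration.

```shell
#!/bin/sh
# Show recent corosync and pve-cluster log lines at the current log level.
# The optional first argument is a journalctl time expression.
show_cluster_logs() {
    since="${1:-1 hour ago}"
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -u corosync -u pve-cluster --since "$since" --no-pager
    else
        echo "journalctl not found" >&2
        return 127
    fi
}

# Print the logging section of a corosync config so the current
# "debug:" setting can be seen before anything is changed.
show_logging_section() {
    conf="${1:-/etc/corosync/corosync.conf}"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    sed -n '/^logging[[:space:]]*{/,/^}/p' "$conf"
}
```

Calling show_cluster_logs "30 min ago" right after a failed connection attempt, then show_logging_section to confirm whether debug is currently off, would be one place to start without producing the very verbose output.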
It's as if something is slowing response times and causing timeouts, though CPU load and available memory are both healthy.
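One way to test the idea that something is stalling rather than overloaded is to time individual steps. The sketch below is only an illustration: time_cmd is a hypothetical helper, and the guess that name resolution could be involved (a common cause of slow SSH logins) is an assumption, not something established here.

```shell
#!/bin/sh
# Run a command, discard its output, and report how long it took.
# Resolution is whole seconds, which is coarse but enough to spot
# multi-second stalls of the kind that lead to timeouts.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$((end - start))s exit=$status: $*"
}

# Assumption: check whether resolving this node's own hostname is slow.
time_cmd getent hosts "$(hostname)"
```

The same helper could wrap pvecm status or corosync-cfgtool -s to see whether the cluster tools themselves are slow to answer.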
I've considered just reinstalling, but having done that before, I know it can cause real issues with corosync, such as devices failing to connect because of mismatched keys.
Does anyone know a good way of debugging these kinds of issues?