Hi,
I have a PVE cluster with 5 nodes. Tonight, one node had very slow disk I/O on its root disk. This slowed down corosync until the volume was no longer accessible on any node. (The folder was still there, but, for example, an ls never returned a result.) The problem then was that multiple tasks were waiting for responses from the cluster storage. This resulted in a lockup and then a full crash of all nodes in the cluster. Is there any way to prevent this from happening again in the future?