All these node connect to CEPH storage (for VM disk) and NFS v3 (for backup) only.
Our backup server got problem with raid array and it timeout (service pvestatd status) in the past few days. Now it’s up after rebuilding raid array.
We will monitor again.
Honestly something strange with this cluster. I have setup 5 cluster and only this cluster has problems. (Eg: migration VM and stuck at ssh key, remove know_hosts then it ok but another host got problem. This is already mentioned in this thread)