Hi everyone, I've encountered a very tricky issue with my PVE cluster and I'm looking for a way to recover without rebooting the physical host.

Environment Setup:
- 3-node PVE Cluster.
- Each node uses LACP (Bonding) with two NICs to handle PVE Management, VM traffic, and PBS backup traffic.
- A VM on pve3003 acts as an NFS Server, using PCIe passthrough for an SSD.
- Every node in the cluster (including pve3003 itself) mounts this NFS share for ISO storage.
Current Status:
- I've force-unmounted the NFS paths on the nodes (umount -f -l). df -h now works again.
- However, pve3003 still shows a "?" in the WebUI and cannot be managed from the other nodes.
- I've tried restarting pve-cluster, pvedaemon, and pveproxy, but it didn't help.
- Critically, any qm commands (like qm list) executed on pve3003 hang indefinitely.
- I checked dmesg and the system console. Both are flooded with "nfs: server not responding, still trying" messages, along with kernel call traces related to I/O wait.

- Other VMs on pve3003 are still running, but the NFS VM is completely unresponsive. It seems the passthrough I/O or the VM process is stuck.
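
In case it helps with diagnosis, here is a minimal sketch of what I can run on pve3003 to look for processes stuck in uninterruptible sleep (state D) and to list any NFS mounts that are still attached. That the hung qm commands are blocked in D state is only my assumption, and the helper names filter_d_state and list_nfs_mounts are made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any PVE tooling.

# Keep the header line plus every process whose STAT column starts with "D"
# (uninterruptible sleep, typically blocked on I/O). Reads ps output on stdin.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Print the mount points of NFS filesystems listed in a mounts table
# (defaults to /proc/mounts). A lazily unmounted share (umount -l) no longer
# appears here, even if processes are still blocked on it.
list_nfs_mounts() {
    awk '$3 ~ /^nfs/ { print $2 }' "${1:-/proc/mounts}"
}

# wchan shows the kernel function each process is sleeping in.
ps -eo pid,stat,wchan:32,cmd | filter_d_state
list_nfs_mounts
```

If qm or a kvm process shows up in the first list with an NFS-related wchan, that would at least confirm where it is blocked. I can post the output if that is useful.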