As a rule, you should have your hypervisor monitored. That way you can detect a persistently increased load on a node and be alerted to it. For example, you can use checkmk to monitor the network interfaces of the VMs and spot increased bandwidth utilization. If you also have PVE push its metrics directly to Graphite or InfluxDB, you can set an alert there and, for example, detect when a VM shows high CPU utilization over a longer period of time.
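To make the "high CPU over a longer period" idea concrete, here is a minimal Python sketch of the check such an alert performs. It assumes you already have a list of CPU utilization samples for one VM (values between 0 and 1, taken at a fixed interval); the function name, the threshold and the window length are my own illustrative choices, not something PVE, Graphite or InfluxDB prescribes.

```python
from typing import Sequence


def sustained_high_cpu(
    samples: Sequence[float],
    threshold: float = 0.9,      # assumed: 90% utilization counts as "high"
    min_consecutive: int = 10,   # assumed: 10 samples in a row counts as "longer period"
) -> bool:
    """Return True if at least `min_consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run on a high sample, reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    short_spike = [0.2, 0.95, 0.97, 0.3, 0.2]
    long_load = [0.95] * 12
    print(sustained_high_cpu(short_spike, min_consecutive=3))  # short spike only
    print(sustained_high_cpu(long_load))                       # sustained load
```

A short spike does not trigger the check; only an unbroken run of high samples does, which is usually what you want so that you are not alerted on every brief peak.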
In addition, you should familiarize yourself with the limits in Proxmox VE and set them. For example, for a 1 core VM you should set the CPU limit to 0.95. This ensures that the VM can generate no more than 95% CPU usage, so the QEMU process on the node does not run at 110 - 120%. Besides the restrictions on the VM itself, keep in mind that the node has limits of its own and that the processes can still put load on it beyond the limits you set. You should also work with the storage limits. You can set them fairly high; they only need to prevent a VM from putting so much strain on your hypervisor that it causes an impairment.
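As a rough sketch of the CPU limit part, the following Python helper derives a limit from the core count and builds the matching `qm set <vmid> --cpulimit <value>` command. Only the 1 core / 0.95 case comes from my example above; scaling the same 95% factor to multi-core VMs is an assumption, and the helper names and the VM ID are made up for illustration. The command is only built, not executed.

```python
from typing import List


def cpulimit_for(cores: int, factor: float = 0.95) -> float:
    """CPU limit for a VM: `factor` of each assigned core (assumed scaling rule)."""
    if cores < 1:
        raise ValueError("a VM needs at least one core")
    return round(cores * factor, 2)


def qm_set_cpulimit_command(vmid: int, cores: int, factor: float = 0.95) -> List[str]:
    """Build the `qm set` call that would apply the limit on the node."""
    return ["qm", "set", str(vmid), "--cpulimit", str(cpulimit_for(cores, factor))]


if __name__ == "__main__":
    # Hypothetical VM ID 100 with a single core.
    print(" ".join(qm_set_cpulimit_command(100, 1)))
```

On the node you could hand the resulting list to `subprocess.run`. I would review the values before applying them in bulk, since a limit that is too tight is just as annoying for the customer as no limit is for you.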
I wrote my own WHMCS module because the existing ones never met my security requirements. My hypervisor sits in the internal network. The customer center gets API access in the background to carry out the actions, and access to the noVNC console is controlled and routed via a HAProxy, which also validates the connection. The existing modules usually can't handle a setup like this, and that's exactly what bothers me.