We run Proxmox clusters for live event production, so they are brought up and shut down fairly often as they are loaded into a new venue for a show or sent back to our shop for maintenance. The VMs have replication jobs running every 5 minutes so that HA can take over should a node crash. However, when the nodes are booting or shutting down, we frequently get a flood of "replication failed" email notifications, because one node is already up while the others are still booting.
Is there a way to notify only after X failures? Or maybe a way to detect that a shutdown was intentional and not report it as an error?