Zabbix as VM loses connection

kenneth_vkd

Renowned Member
Sep 13, 2017
Hi
For some time now, we have had a strange issue with the VM running our Zabbix monitoring server.
Zabbix runs as a VM on our PVE cluster and monitors all running virtual machines on the cluster. Zabbix VM is running CentOS 7 as the guest OS.
Whenever a backup of this VM is triggered, it seems to lose connectivity to all virtual machines across all nodes, which kicks off 1000+ email alerts.
This behavior has been seen on backups prior to the new Proxmox Backup Server as well as after implementing the Proxmox Backup Server solution.
The Zabbix VM has been located on several different hosts throughout its lifecycle.
The VM itself also remains responsive during the backup.

What we have found is that the problem coincides with the backup job executing "freeze" and "thaw" on the VM disk: the freeze seems to trigger the connectivity issues and the thaw restores connectivity. We then suspected a conflict with the QEMU guest agent, so we disabled it in the VM configuration in Proxmox and disabled the related services in the guest OS. The result was still the same.

Has anyone else experienced similar issues?

For now, we have configured a maintenance window in Zabbix that roughly matches the window in which the backup of this specific VM runs.
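
For anyone who wants to script such a window rather than create it in the frontend, here is a rough sketch of the JSON-RPC body for the Zabbix API method `maintenance.create` with one daily time period. The function name and its arguments are made up for illustration. The field names (`groupids`, `timeperiods`, `auth` in the body) follow the API as documented for older releases such as 4.x/5.0; newer releases are expected to take `groups` objects and an Authorization header instead, so check the API reference for your version before using it.

```python
import json


def build_maintenance_request(name, group_ids, active_since, active_till,
                              start_seconds, period_seconds, auth_token):
    """Build a JSON-RPC body for maintenance.create with one daily time period.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.create",
        "params": {
            "name": name,
            "active_since": active_since,      # Unix timestamp: maintenance valid from
            "active_till": active_till,        # Unix timestamp: maintenance valid until
            "groupids": list(group_ids),       # assumption: older API; newer uses "groups"
            "timeperiods": [{
                "timeperiod_type": 2,          # 2 = daily
                "every": 1,                    # every day
                "start_time": start_seconds,   # seconds after midnight
                "period": period_seconds,      # length of the window in seconds
            }],
        },
        "auth": auth_token,                    # assumption: older API; newer uses a header
        "id": 1,
    }


# Example with placeholder values: group "2", daily at 01:00 for one hour.
request_body = build_maintenance_request(
    "Zabbix VM backup window", ["2"], 1600000000, 1700000000,
    1 * 3600, 3600, "PLACEHOLDER_TOKEN",
)
print(json.dumps(request_body, indent=2))
```

The body would be sent as a POST to the frontend's `api_jsonrpc.php` endpoint. The group ID, timestamps and token above are placeholders, not values from our setup.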
 
Has anyone else experienced similar issues?
Not really - but I can imagine how they happen:
between fsfreeze and fsthaw the machine cannot write to its disks (depending on the filesystem, it might even sync the filesystem journal, which can take some time) - the question is why it takes so long that a monitoring system starts alarming (in my experience they send out alarms after minutes, not after (milli)seconds).
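
One way to check how long writes are actually blocked is to run a small write probe inside the guest while the backup runs. The following is only a sketch of that idea, not an official tool: the function names, the probe path and the thresholds are all made up for illustration. It writes and fsyncs a tiny file at a fixed interval and reports the writes that took longer than a threshold, on the assumption that a frozen filesystem blocks the write until the thaw.

```python
import os
import time


def timed_write(path, payload=b"x"):
    """Write payload to path, fsync it, and return the elapsed seconds."""
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # force the data out to the (possibly frozen) filesystem
    finally:
        os.close(fd)
    return time.monotonic() - start


def find_stalls(samples, threshold):
    """Return the (timestamp, latency) samples slower than threshold seconds."""
    return [(stamp, latency) for stamp, latency in samples if latency > threshold]


def probe(path, interval=1.0, threshold=0.5, rounds=60):
    """Call timed_write every `interval` seconds and return the stalled samples."""
    samples = []
    for _ in range(rounds):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")  # taken before the write starts
        samples.append((stamp, timed_write(path)))
        time.sleep(interval)
    return find_stalls(samples, threshold)
```

Calling something like `probe("/var/tmp/freeze_probe.dat", rounds=600)` on the filesystem in question shortly before the backup starts gives a list of timestamps and stall durations that can be compared with the times of the Zabbix alerts.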

Please share the VM's configuration (and describe which storage it is running on), the task log of such a backup, and the journal from inside the VM - maybe we can see something that explains the behavior.