Good morning, dear Proxmox community.
I am facing a peculiar problem whose cause I cannot identify. First, let me share a few facts:
- I operate four servers in a cluster with Ceph and 16 NVMe OSDs.
- I use NFS as backup storage.
- All VMs from the four servers are backed up to the NFS storage.
- Each host has a public 10 Gbit network card and an internal one. The backups, Ceph, and cluster communication run on the non-public network card.
About two weeks ago, the VMs on one of the hosts suddenly started experiencing massive connection problems at night during the backups. Initially, I couldn't find anything unusual in the monitoring, except that the network connection frequently timed out during this period. I therefore expanded the monitoring and switched to LibreNMS. However, what I found there confuses me even more: during the backup, the backup traffic is visible on the network adapter of the respective VM. As a result, that adapter is 100% utilized, and the other services are unable to connect to the public network. But how can this be, given that the backups are supposed to run over the non-public network card?
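To cross-check the LibreNMS graphs, the per-interface byte counters could be sampled directly on the affected host while a backup is running. The following is only a minimal sketch, assuming a Linux host where `/proc/net/dev` is readable. The function names and the 5-second interval are my own illustrative choices, not anything provided by Proxmox or LibreNMS.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_mbit(before, after, seconds):
    """Per-interface (rx, tx) throughput in Mbit/s between two samples."""
    result = {}
    for name, (rx1, tx1) in after.items():
        if name not in before:
            continue  # interface appeared between the two samples
        rx0, tx0 = before[name]
        result[name] = (
            (rx1 - rx0) * 8 / seconds / 1e6,
            (tx1 - tx0) * 8 / seconds / 1e6,
        )
    return result


def sample(path="/proc/net/dev"):
    """Read and parse the current counters (path is an assumption for Linux)."""
    with open(path) as f:
        return parse_net_dev(f.read())


if __name__ == "__main__":
    interval = 5  # seconds between samples; arbitrary choice
    first = sample()
    time.sleep(interval)
    second = sample()
    for name, (rx, tx) in sorted(rates_mbit(first, second, interval).items()):
        print(f"{name:16s} rx {rx:10.1f} Mbit/s   tx {tx:10.1f} Mbit/s")
```

Run on the host during the backup window, this would list the physical cards, the bridges, and the VMs' tap interfaces side by side, which should show whether the backup traffic really leaves via the internal card or ends up on the public one.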
Can you perhaps help me? To me, this is a mystery. Thank you in advance for your help.
Kind regards,
Phil