Hi everybody. I am new to PVE. My company runs KVM on individual servers, which is painful to manage. A few months ago I started installing PVE hosts and moving VMs to the new environment. I currently have 9 hosts in one rack, connected to a pair of Mellanox TOR switches using MLAG; the hosts use Mellanox ConnectX-5 interfaces. I have set up the networking on all switches with a bond between the two 100G ports and VLANs with bridges: one VLAN/bridge for management with an IP address, one VLAN/bridge for the storage network (also with an IP address), and three other VLAN/bridges for the servers (VMs), without an IP address.
I am using two storage devices: one Purestorage Flashblade S2 for the VMs and containers, and one Synology for backups and ISOs. Both export NFS shares. The Pure one also exports some shares with home directories and others that are mounted inside the VMs.
Last week I was planning to run some updates, so I started migrating the VMs off the first host so that I could reboot it after the update. Several times it refused to migrate, but eventually it went through. Then I saw the issue. Starting about 10 days earlier, the journal shows errors about the shares not being mounted. It looks like they mount and then unmount from time to time. I started pinging the storage devices. On both, I see dropped packets now and then, sometimes 5 in a row, sometimes 25. Sometimes it works for over 100 pings with no loss. I checked all the servers, and there is a second one doing the same. The rest are working fine.
To make it even stranger, I can ping other servers on different VLANs, and even the storage-VLAN interfaces of the other hosts, without any dropped packets.
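To put numbers on the ping pattern described above (runs of 5 to 25 lost replies), a small logger along these lines could help. It is only a sketch: the function names `ping_once` and `loss_bursts` are made up for illustration, and it assumes a Linux `ping` that accepts `-c` and `-W`. The timestamps it prints are meant to be compared by hand against the NFS and corosync messages in the journal.

```python
import subprocess
import sys
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Send one ICMP echo with the system ping (Linux-style flags assumed)."""
    proc = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0


def loss_bursts(results):
    """Return the lengths of consecutive runs of lost replies (False values)."""
    bursts = []
    run = 0
    for ok in results:
        if ok:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:
        bursts.append(run)  # a burst still open at the end of the sample
    return bursts


if __name__ == "__main__":
    host = sys.argv[1]  # target address, supplied on the command line
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 300
    results = []
    for _ in range(count):
        ok = ping_once(host)
        results.append(ok)
        if not ok:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} lost reply from {host}")
        time.sleep(1)
    print("burst lengths:", loss_bursts(results))
```

Running it at the same time from an affected host and from a healthy host, against the same storage address, would show whether the bursts line up in time.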
So... I think I can rule out a physical connection issue on the host or an issue on the storage.
Running netstat -i does not show any dropped packets on any interface, bridge or VLAN.
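Since netstat -i only gives a snapshot, one option is to sample the kernel counters twice and look at the difference. The following is a minimal sketch that reads the error and drop columns from /proc/net/dev; the function name `parse_proc_net_dev` and the 60-second interval are arbitrary choices for illustration. These are kernel-level counters only, so drops that happen on the switch or in NIC hardware may not appear here.

```python
import time


def parse_proc_net_dev(text):
    """Extract per-interface error and drop counters from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return counters


def read_counters(path="/proc/net/dev"):
    with open(path) as handle:
        return parse_proc_net_dev(handle.read())


if __name__ == "__main__":
    before = read_counters()
    time.sleep(60)  # sampling interval, arbitrary
    after = read_counters()
    for iface, now in after.items():
        old = before.get(iface)
        if old is None:
            continue
        delta = {key: now[key] - old[key] for key in now}
        if any(delta.values()):
            print(iface, delta)
```

If nothing is printed while pings are being lost, that would suggest the packets are not being dropped by the host's kernel network stack.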
I am also sometimes seeing in the journal corosync having one member leave and then join again.
If anybody can point me in the right direction to solve this issue, that would be great. I was the one pushing the PVE solution in the company...
Thanks,
Mugurel