We hit the same problem when upgrading to 6.4 on several different host machines, all using 40 Gb Mellanox cards. Ceph kept losing OSDs, VMs stopped responding, and so on.
We got it working by switching the InfiniBand (IPoIB) mode from connected to datagram for the VM machines (because we...
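
For anyone who wants to try the same switch: on Linux the IPoIB mode of an interface is normally exposed in sysfs as `/sys/class/net/<iface>/mode`, holding either `connected` or `datagram`. Below is a minimal sketch of checking and changing it. The interface name `ib0` and the helper names are assumptions for illustration, not what we ran verbatim. Writing the file needs root, the change does not survive a reboot unless you also put it in your network configuration, and depending on the driver you may have to bring the interface down first.

```python
from pathlib import Path

# Standard sysfs location for network interfaces on Linux.
SYSFS_NET = "/sys/class/net"
VALID_MODES = ("datagram", "connected")


def get_ipoib_mode(iface: str, sysfs_root: str = SYSFS_NET) -> str:
    """Return the current IPoIB mode of `iface` as reported by sysfs."""
    return (Path(sysfs_root) / iface / "mode").read_text().strip()


def set_ipoib_mode(iface: str, mode: str, sysfs_root: str = SYSFS_NET) -> None:
    """Write `mode` to the interface's sysfs mode file (requires root)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    (Path(sysfs_root) / iface / "mode").write_text(mode + "\n")


if __name__ == "__main__":
    iface = "ib0"  # assumption: replace with your actual IPoIB interface name
    print("before:", get_ipoib_mode(iface))
    set_ipoib_mode(iface, "datagram")
    print("after: ", get_ipoib_mode(iface))
```

Keep in mind that datagram mode usually allows a much smaller MTU than connected mode (commonly 2044 instead of up to 65520), so check the MTU configured on the interface and on the Ceph network after switching.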