Looking at it, it would seem using this link would cause issues for corosync.
You did the ping over one of the VLAN networks, which is on the bond?
You see that the ping to NODE7 (Ceph OSDs & VMs) at 172.16.3.27 is much higher than to a node which only has VMs on it. Because it has the highest latency of all the Ceph nodes, I think this host was the master when you ran the test?!
The first test, with 10000 pings, is a kind of stress test: you put a high load on the link and see what happens to the latency. The 10-minute test shows you what happens over time; here the packet loss is more relevant. Even when the latency is good now, it may get into trouble when there is load on the network. At the moment Ceph is working normally, no recovery or anything like that going on?!
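For reference, the two tests described above can be run with omping roughly like this (node1, node2, node3 are placeholders for your actual hostnames or the IPs you want to test over):

```shell
# Stress test: 10000 packets, flooding with a very short interval,
# to see how the latency behaves under load
omping -c 10000 -i 0.001 -F -q node1 node2 node3

# Longer test: ~10 minutes at 1 packet/s, to watch packet loss over time
omping -c 600 -i 1 -q node1 node2 node3
```

Run the command on all nodes at the same time, since omping needs a partner on each host to answer.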
Your network connection and switches are not bad, but you can see latency spikes (some above 7 ms) which are caused purely by the networks not being separated. Because of this, the nodes running Ceph also have a higher latency to all other nodes, and all other nodes to them. (Even though you will not saturate the 20 Gbit/s, you can see the effect I mentioned concerning higher latency when you put load on the network.)
Have you connected the separate 1 Gbit/s link for corosync already? You can run the same test over the IP addresses on that separate interface and should see more or less the same min/avg, but much lower spikes at max.
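Once the separate link is in place, corosync can be pointed at it via the nodelist addresses. A sketch of what the relevant part of /etc/pve/corosync.conf could look like (the subnets and addresses here are made up, adjust them to your setup; ring1 as a fallback over the existing network is optional):

```
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1    # dedicated corosync subnet on the separate 1 Gbit/s link
    ring1_addr: 172.16.0.1    # optional fallback ring over the existing network
  }
  # ... one node { } block per cluster member
}
```

Remember that /etc/pve/corosync.conf is cluster-wide and versioned (config_version must be increased when editing).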
in terms of Network cards, as mentioned we have Mellanox Connect X4 cards. Do you think these are ok? any other one that is battle-tested beyond belief to consider?
We are using three Mellanox CX354A ConnectX-3 cards (2-port 40 Gbit/s) in each node and have had no problems at all. I do not know how big the difference is, but yours are newer, and if the old ones are supported perfectly, why should yours not work? We have not changed the network driver. (Maybe read through the forum discussions about the Mellanox network drivers.) Have you checked the firmware of the cards?
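To check the firmware, something like this should work on the node (the interface name enp65s0f0 is just an example, use yours):

```shell
# Find the Mellanox cards and their PCI addresses
lspci | grep -i mellanox

# Show driver and firmware version of an interface
ethtool -i enp65s0f0
```

You can then compare the reported firmware-version against what Mellanox/NVIDIA currently offers for that card, and update it with their tools (mlxup/mstflint) if needed.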
Have you connected them via SFP or RJ45? If RJ45, which cables are you using (Cat5e, Cat6, Cat6a, Cat7)?
What you have proposed is not too far off what I was thinking myself.
Great. It could make your life much easier. When you set up something like this, I would recommend not bonding the 2 links on one physical card. Instead, bond one link from one card with one link from the other card for public/VM traffic, and the other 2 links for Ceph. Then you still have 1 link of the LACP bond left if an entire card fails. Otherwise VM or Ceph traffic has no links at all if one of the network cards fails.
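As a sketch of the cross-card layout in /etc/network/interfaces (the interface names enp65s0f0/f1 and enp66s0f0/f1 are examples for the two ports of each card; adjust to your hardware):

```
# bond0: public/VM traffic - one port from EACH card
auto bond0
iface bond0 inet manual
    bond-slaves enp65s0f0 enp66s0f0
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

# bond1: Ceph traffic - the remaining port of each card
auto bond1
iface bond1 inet manual
    bond-slaves enp65s0f1 enp66s0f1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
```

This way either card can die and both bonds keep one active link each.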
You can also use (if there are enough ports) your iDRAC/IPMI switches for one of the 2 corosync links. When the switch is powerful enough, it is not absolutely necessary to have a dedicated one. It only needs a separate physical link from your nodes to the switch; it could even be a VLAN with a separate subnet where only the corosync traffic is running. It makes a difference whether you have one physical link (an LACP bond is still 1 link) with 3 VLANs/subnets on it (like now), or 3 separate physical links with 1 VLAN/subnet each. When the switches are powerful enough, you could have the iDRAC/IPMI in one VLAN over one physical interface and the corosync traffic in another VLAN over a second, separate physical link on the same switch. Then you may need just 1 more switch to be redundant. Give it a try and test it with omping like you did already. It is just essential that the switch does not leak any packets from other networks (iDRAC/IPMI) into the corosync network, and vice versa.
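On the node side, the VLAN-on-a-dedicated-link variant could look like this in /etc/network/interfaces (eno3 and VLAN 50 are just example names, pick whatever matches your switch config):

```
# Physical link going to the iDRAC/IPMI switch, carrying only corosync
auto eno3
iface eno3 inet manual

# VLAN 50 on that link, with a subnet used by nothing but corosync
auto eno3.50
iface eno3.50 inet static
    address 10.10.50.1/24
```

The important part is that this VLAN/subnet exists only for corosync, so no other traffic can cause latency spikes on it.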
Most of our hosts are Dell R820s at the moment (4 sockets). We are in the process of changing over to newer 2-socket ones, at which point I think we will have to get support on this.
Concerning hardware, you can look at Thomas Krenn. They have systems optimised for Proxmox/Ceph. You do not have to buy from them, but they also publish a detailed list of which hardware is used in their systems, which gives you a good orientation, e.g. which network card or HBA or whatever is good to use. They build tons of systems and have a lot of experience with Proxmox/Ceph and also other cool open-source solutions. Reading the benchmark papers from Proxmox also gives you a good orientation.
I think at minimum you will need a community support subscription for the enterprise repository. If you have a concrete idea of how the new nodes should look and a list of which hardware you want to use, you could also send an e-mail to Proxmox with the request to take a short look at your specific hardware. They are all really good guys and I think they will have a comment on this.
If you want to run this system you should also think about a training. They are excellent, and you can also ask your questions there if they are not already answered in the training. Especially now that you have some experience from the good old trial-and-error method, this will push your skills.
https://www.proxmox.com/en/training/pve-bundle At the moment these are also held online (because of COVID-19), so you can take part from everywhere.