Using SR-IOV for separate VM Traffic, Ceph and Corosync networks?

victorhooi

Well-Known Member
Apr 3, 2018
We have a 3-node HA cluster running Proxmox/Ceph.

I know the current recommendation is to have separate physical network ports for VM traffic, Ceph and Corosync traffic.

We currently do this with a 4-port SFP+ NIC (Intel X710-DA4).

However, we're looking at moving to 100 Gbps. The NIC in the new server, though, only has a single 100 Gbps port.

So if we have a single 100 Gbps connection for each VM node, what is the best way to split it up and provide guarantees on bandwidth, even under contention?

I asked on r/networking, and they suggested looking into SR-IOV to split up the NIC into virtual cards.

I saw on the wiki there's a mention of it here:

https://pve.proxmox.com/wiki/PCI(e)_Passthrough#_sr_iov

However, that seems more focused on passing through virtual functions into individual VMs.

Is there some guide, or advice, or has anybody had experience with using separate VFs for VM traffic/Ceph/Corosync?

Is this the best way to do it, or is there another way?

Also - why are there two pages on PCI Passthrough - here and here?
 
The recommendation to separate the various networks physically comes from the fact that they all have different needs.
E.g. corosync needs little bandwidth but very low latency. If you mix it with your storage traffic (which, e.g. for NFS, needs quite a bit of bandwidth), the corosync packets can get delayed to the point where the cluster loses quorum.

If there is enough bandwidth and low enough latency for all needs over one link (depending on your needs, Ceph setup, number of VMs, ..., 1x100G could be enough), you can run all of them over one link as well. However, should your cluster become unstable, or Ceph perform sub-par, the first recommendation you'll get is to physically separate the networks.

As for using SR-IOV for separation of the networks... I don't see any upside for this use case compared to e.g. using VLANs. The traffic still travels over the same link, so any congestion will affect all the VFs. The important part IMO is to use a separate network (as in IP subnet) for each of the networks - that should make it easier to separate them physically if the need arises.
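To make the VLAN idea a bit more concrete, here is a rough sketch (not an official Proxmox tool) that prints `/etc/network/interfaces`-style stanzas for VLAN sub-interfaces on a single NIC, one per host network, and refuses to continue if two networks share an IP subnet. The NIC name `enp65s0`, the VLAN IDs 50 and 60, and the 10.10.x.x addresses are made-up placeholders, as are the helper names; adapt them to your environment and review the output before putting anything into a live config.

```python
from ipaddress import ip_interface

# Hypothetical placeholders: NIC name, VLAN IDs and addresses are examples only.
PARENT_NIC = "enp65s0"
HOST_NETWORKS = {
    "ceph": (50, "10.10.50.11/24"),
    "corosync": (60, "10.10.60.11/24"),
}


def render_vlan_stanza(parent, vlan_id, cidr, label):
    """Render one VLAN sub-interface stanza (named <parent>.<vlan_id>)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    iface = ip_interface(cidr)  # raises ValueError on a malformed address
    name = f"{parent}.{vlan_id}"
    return "\n".join([
        f"# {label}",
        f"auto {name}",
        f"iface {name} inet static",
        f"    address {iface.with_prefixlen}",
    ])


def render_all(parent, networks):
    """Render all stanzas, insisting on one distinct IP subnet per network."""
    seen = {}
    for label, (_, cidr) in networks.items():
        subnet = ip_interface(cidr).network
        for other_label, other_subnet in seen.items():
            if subnet.overlaps(other_subnet):
                raise ValueError(
                    f"{label} and {other_label} share subnet space: "
                    f"{subnet} vs {other_subnet}"
                )
        seen[label] = subnet
    return "\n\n".join(
        render_vlan_stanza(parent, vlan_id, cidr, label)
        for label, (vlan_id, cidr) in networks.items()
    )


if __name__ == "__main__":
    print(render_all(PARENT_NIC, HOST_NETWORKS))
```

This only covers the host-side networks (Ceph and Corosync). VM traffic would normally go through a bridge on the same NIC, with the VLAN tag set per guest NIC on a VLAN-aware bridge; that part is not shown here.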

The upside of SR-IOV is, as you mentioned, that each VF represents a PCI device of its own, which you can pass through to a VM (meaning no overhead for virtualized NICs and the bridge they are connected to).

As for the 2 pages:
The first one is a rendering of the reference documentation (https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_pci_passthrough) and should be considered more recent and accurate, while the second is the wiki page where users compiled their experiences.

If in doubt follow the first.

I hope this helps!
 
Is there any way to split up the VM traffic, Corosync and Ceph networks onto separate VLANs from Proxmox?

(I couldn't seem to find anything in the Proxmox GUI about creating a new interface with a specific VLAN tag, but I'm not sure if I missed it somewhere.)

I could then potentially apply QoS at the switch level, by VLAN, if I had any issues over the 1x100G link.
 
