Test LACP, 1x TrueNAS + 2x Proxmox VE nodes = LACP y u no brrrtt?

BloodyIron

Putting aside the silliness of the title, I do not really see why the LACP bonding in the scenario I'm about to describe does _not_ result in >1gig combined throughput.

In this scenario my Proxmox VE cluster is 2x physical nodes. Each node has a 2x1gig LACP bond (layer2+3), and the ports on the switch are set to LACP Active (not static).

I have a test/dev TrueNAS system I built for $reasons. It's expected to be temporary, but plenty fast. It's a Dell R720 with 64GB RAM and has a 2x1gig LACP bond set up the same way, on the same switch the PVE nodes are on. It's rocking the latest TrueNAS SCALE (Linux, not FreeBSD) and uses a SAS2 HBA connected to a SAS2 disk shelf populated with 12x 600GB 10K RPM SAS2 enterprise HDDs (I got them for this many dollaridoos: $0). The zpool is a striped mirror, so all vdevs are mirrors; I believe sync is still at the default, and I don't remember if atime is on or off, but generally everything is default because this is just a temporary space for me to work on some dev kubernetes stuff. The TrueNAS OS is installed on a $shitCanWorthy USB stick because for this case I don't need better (so far?), as this is temporary. There are no SSD devices at play here for this TrueNAS system.

The TrueNAS is serving an NFS export of a single dataset. I've told the Proxmox VE cluster to mount the NFS export with v4.2 because bigger numbers are obviously better (plus who knows, maybe I get pNFS magic because $reasons). I do not recall using any special-magic mount flags, but I'll gladly share if asked.

This TrueNAS only has the single dataset, nothing else going on. I don't recall even setting up snapshot tasks (I don't need them for this current use-case).

So... the disks are in very good health and everything is great here from a hardware operational standpoint (so far as I can tell), so I see no indications of component/device failures/errors causing problems.

I have 3x VMs that I each run as kubernetes (k8s) nodes, as in, each VM is a node in the k8s cluster.

2x of those VMs are on ONE of my Proxmox VE nodes.
The remaining 1x VM is on the other Proxmox VE node.

These 3x VMs are the ONLY ones whose VMdisks are backed by this test/dev TrueNAS system; all other VMdisks and content are on another NAS (echo $longStory).

So in this test scenario, all 3x VMs are off. I turn them all on at the same time. They spin up going brrrttt and all that. However, when I monitor the metrics on the TrueNAS system, the peak tx (transmit) does NOT exceed 1gig; in fact it peaks at about 800Mb/s.

This performance result is the same whether the ARC on the system is fully hot (as in I've done multiple off/ons of the VMs) or if the ARC is fully cold.

So.... considering the TrueNAS is serving requests from two different Proxmox VE nodes... WHY AM I NOT ABLE TO EXCEED 1gig even in spikes??? :((

Everything I read and understand says that the two PVE nodes _SHOULD_ represent multiple TCP sessions instead of just one, which SHOULD make exceeding 1gig trivial. ESPECIALLY when the ARC is hot. And to be clear, the 64GB of RAM in the TrueNAS system is larger than the total capacity of the 3x VM disks in their entirety (even though they're mostly empty space), BEFORE ARC compression.

So.... what am I doing wrong here? :(((
 
The Proxmox node is your only NFS client. That means your connection is *always* between the same tuple of source MAC, source IP, destination MAC and destination IP (layer 2+3), which results in always having the same hash and always using the same single physical link of your bond.

LACP is able to use multiple physical links at the same time *for different connections*, while a single connection will always use the same physical link and is therefore limited to the bandwidth of one physical link.
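
To make that concrete, here is a minimal Python sketch of a layer2+3-style transmit hash, loosely following the formula described in the Linux bonding documentation (the real kernel code differs in detail, and the MACs/IPs below are made-up placeholders). For a single client/server pair the inputs never change, so every packet of that flow picks the same bond member:

```python
# Rough sketch of a layer2+3-style bond transmit hash, loosely based on
# the formula in the Linux bonding docs. Not the exact kernel code;
# the MACs and IPs below are made-up placeholders.
import ipaddress

def layer2_3_hash(src_mac, dst_mac, src_ip, dst_ip, n_slaves=2):
    """Return the index of the bond member this flow would be sent on."""
    # "layer2" part: last byte of source and destination MAC
    h = int(src_mac.split(":")[-1], 16) ^ int(dst_mac.split(":")[-1], 16)
    # "+3" part: fold in source and destination IP addresses
    h ^= int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % n_slaves

# One NFS client talking to one NFS server: same MACs and same IPs on
# every packet, so the hash (and the chosen 1G link) never changes.
print(layer2_3_hash("aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:ff",
                    "192.168.1.111", "192.168.1.50"))
```

No matter how many packets that NFS session pushes, the function returns the same index every time, which is why one client/server flow tops out at one link's worth of bandwidth.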
 
But there are 2x PVE nodes (physically separate computers running Proxmox VE), not 1x. And I was spinning up VMs on both nodes at the same time from the same storage endpoint (NFS). From what I read, that should be at least two connections at a minimum.
 
What and how exactly do you measure?

I measure the inbound and outbound traffic in bits (Mbps) as reported by TrueNAS itself. As in, I watch the metrics TrueNAS reports as the VMs spin up. So far it has not been able to exceed 1Gbps in either direction (inbound/outbound).
 
As your TrueNAS machine also has 2x 1G links, the two Proxmox nodes might hash to different physical links on the TrueNAS side, but they might just as easily end up on the same one of the two TrueNAS NICs.

If you just want to check whether it works in general, try testing with iperf3 (if available on TrueNAS, not sure). Set your LACP to layer3+4 to include the port numbers in the hash calculation and start multiple parallel streams on each client node so that different port numbers are used. If you start 10 streams on each Proxmox node and an iperf3 server on the TrueNAS, you should end up seeing something close to 1.8-1.9 Gbit on the TrueNAS.
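
As a rough model of why those parallel streams should spread out, here is a small Python sketch of a layer3+4-style hash with the TCP ports folded in. Same caveats as the earlier sketch: a simplified formula, not the exact kernel code, and the IPs and ports are made-up placeholders (5201 is iperf3's default server port). Many distinct source ports produce many distinct hashes, so the streams land on both links:

```python
# Sketch: why ~10 parallel iperf3 streams per node should spread across a
# 2-link bond once the hash policy includes TCP ports (layer3+4).
# Simplified formula, not the exact kernel code; IPs/ports are placeholders.
import ipaddress
from collections import Counter

def layer3_4_hash(src_ip, dst_ip, src_port, dst_port, n_slaves=2):
    h = src_port ^ dst_port
    h ^= int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % n_slaves

truenas = "192.168.1.50"                      # placeholder iperf3 server IP
clients = ["192.168.1.111", "192.168.1.114"]  # placeholder PVE node IPs

# 10 parallel streams per client (e.g. "iperf3 -c <truenas> -P 10"); each
# stream gets its own ephemeral source port on the client side.
spread = Counter()
for i, client in enumerate(clients):
    for stream in range(10):
        src_port = 40000 + 100 * i + stream   # stand-in ephemeral ports
        spread[layer3_4_hash(client, truenas, src_port, 5201)] += 1

print(spread)  # the 20 streams split across both links -> ~1.8-1.9 Gbit total
```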
 
1. I'm not sure if my switch supports layer3+4 LACP; it's pretty crusty and the documentation for it is unclear (Avaya ERS4000, it's what I "got" for now).
2. Changing the LACP config on the two Proxmox nodes is less than ideal right now.
 
What are the IPs of your two PVE hosts? If you are unlucky enough, they both hash to the same link in the LACP channel.

A hacky way to somewhat ensure fair hashing is to use sequential IPs on your clients.
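
As an illustration of that luck factor, here is a quick Python check in the same simplified layer2+3 spirit as the earlier sketch (the MACs and last octets are placeholders, since the real addresses aren't shown). Whether two clients share a TrueNAS egress link depends on how their MAC and IP bits XOR together, which is why sequential IPs only *somewhat* help:

```python
# Quick check: do two NFS clients hash to the same TrueNAS egress link
# under a simplified layer2+3-style hash? MACs/IPs are placeholders.
import ipaddress

def layer2_3_hash(src_mac, dst_mac, src_ip, dst_ip, n_slaves=2):
    h = int(src_mac.split(":")[-1], 16) ^ int(dst_mac.split(":")[-1], 16)
    h ^= int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % n_slaves

truenas_mac, truenas_ip = "aa:bb:cc:dd:ee:ff", "192.168.1.50"
clients = {                       # placeholder PVE node MACs and IPs
    "pve1": ("de:ad:be:ef:00:11", "192.168.1.111"),
    "pve2": ("de:ad:be:ef:00:22", "192.168.1.114"),
}

for name, (mac, ip) in clients.items():
    link = layer2_3_hash(truenas_mac, mac, truenas_ip, ip)
    print(f"traffic from TrueNAS to {name} leaves on bond member {link}")
```

With these particular placeholder values both clients happen to land on the same bond member, which is exactly the kind of bad luck being described; different MACs or last octets shift the result.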

You may get more predictability if you add two sequential IPs on your LACP bond and get the NFS multipath working. I've seen people in the forum struggling with it, so if you do get NFS MP going, let us know.

Or you can use proper enterprise storage and avoid relying on chance.

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
1. The last octets of the relevant nodes are almost sequential. Not the actual numbers, but they are along the lines of .111 and .114 (adjusted for security reasons), and they are in the same CIDR so the first three octets are the same.
2. My understanding is that NFS v4.x multipathing doesn't actually do multipathing when running over LACP as I have it configured. Is that a misunderstanding?
3. "Proper Enterprise storage"... that's not exactly a helpful thing to say here. In this case this is a form of development testing; whether someone classifies the method as Enterprise or not is irrelevant. Enterprise is typically defined by how many people get paid to have their fingers wagged at them, not because it's magically the best thing on the planet. Let's try to be objective here as much as possible, not subjective with "Enterprise" labelling. Let me tell you, I've worked in some of the largest environments on the planet; "Enterprise" means little with how often bad practices are used in said environments.

As for the NFS multipathing, that's not out of the picture for future development. At this time I am trying to see what I can do to improve the LACP setup I have today. The results (not really being able to exceed 1gig of aggregate bandwidth) do not seem to meet expectations (multiple PVE nodes should be able to spike beyond 1gig aggregate from said NAS).
 
