Looking for a "cheap" way to upgrade to 40gbit

The next issue I'm facing is that, with 10gbe, I am running 3 networks: Corosync, Ceph and backup. But with the Infiniband cards, I only have two ports per node. The first network I am going to replace is backup. But I then need to decide what to do with the other two networks.

- Only switch one to Infiniband and keep the other on 10gbe? Ideally, I would like to get rid of the 10gbe network, as it is (also) a big power draw.
- Only switch one to Infiniband and switch the other over to normal 1gbe ethernet? The one to go on 1gbe would probably be Corosync. But I'm not sure if this could be problematic (it used to work for me back when I started and everything was on 1gbe).
- Switch both to Infiniband by using vlans (if vlans exist on IB - no idea)? If that is possible, which two should I put on one cable (because they would cannibalize each other the least)?

Thanks!
 
Switch firmware wasn't that difficult; I'd say the hardest part was finding it on the NVIDIA website:
https://network.nvidia.com/support/firmware/lenovo-archive/

Here's the firmware upgrade chart:
[Attachment 67115: firmware upgrade chart]


Just follow the chart and it should be fine. It is fairly time-consuming, so I just updated, stepped away for an hour, came back, rebooted, then updated again until I was fully updated.

You will want the PPC version, as these are PowerPC processors. If you do it locally, which is what I did, you will get a "loading" icon for about 10 minutes before it actually does anything, so that's just something to be aware of. It's doing what it's supposed to; just walk away and let it try its best.
[Attachment 67116]
So, the SX6036 I received is on 3.6.8008. This means that my switch can do QSFP56 already (I don't have the cables for that atm -- they cost an arm and a leg, so this is only theoretical), right?

(The most current version I could find seems to be 3.6.8012, which is from the end of 2019. I will try to install it.)
 
The IB switches do not come with Ethernet capability by default and only do Infiniband out of the box. You can get an Ethernet license, but I haven't found it to be all that helpful; instead I just use IPoIB, which lets me use IB how I want it as well as standard ethernet traffic without having to buy a license.
Not sure I understand this fully.

Does that mean you can use the IB cards and switch like you would otherwise use ethernet cards? But you can't use the switch with ethernet cards and cables directly, right? There seem to be splitter cables that connect to the SX6036 (or other IB switches) and that have 10gbe on the other end that one can connect to ethernet cards. But that probably doesn't work without a license, right?

If I would need a license, do you know how much that costs (probably enterprise prices...) and whether one can still get one for an obsolete switch?

Thanks!
 
The one to go on 1gbe would probably be Corosync.
Something would have to be seriously wrong for that to not be good enough. The corosync network seems to be more about latency ("anything under 5ms is good enough") to ensure things are up, plus transfer small amounts of data for replicating the /etc/pve structure (actually a small SQLite database).
 
IPoIB is a protocol for running IP traffic over Infiniband. It means the cards and switches themselves still operate in standard Infiniband mode, but your OS has a virtual NIC for talking TCP/IP over. It'll need to talk to a similar IPoIB NIC on your other hosts that are also connected to the IB network.



The SX switch licenses are a different thing, and not something I've personally touched so can't help with. ;)
 
Oh, and when you use IPoIB mode, ensure you do so in "connected" mode rather than datagram mode. Massive performance difference. With connected mode, the cards establish a direct channel between each other and pass data directly into each other's memory space (I'm simplifying) with no real processing overhead.
 
Oh, and when you use IPoIB mode, ensure you do so in "connected" mode rather than datagram mode. Massive performance difference. With connected mode, the cards establish a direct channel between each other and pass data directly into each other's memory space (I'm simplifying) with no real processing overhead.
Thanks for the tip! ATM, I'm not even sure I need IPoIB mode. All I want is setup to be easy, as I am totally new to IB.

It would certainly be easy for me, if everything behaved like my current ethernet network. Because I have no idea how to think about the whole network, if the nodes don't have, say, IP addresses etc.

Assuming I do need IPoIB, where do I turn on "connected" mode? Separately, on each card and on the switch or just on the switch or just on the cards or ...? For the cards, is there a config file somewhere I need to edit?

Sorry for all the questions, but I literally don't know what I am doing here :oops:
 
From memory (it's been years) connected mode is a setting you do on each card.

Back in the day, it used to be a case of installing the mellanox card, then installing the Mellanox software (OFED), then changing the settings in the appropriate config file. It's still probably somewhat similar.
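As a rough illustration of the "setting you do on each card" part: on Linux, the IPoIB driver exposes the current transport mode of each IPoIB interface as a small text file under /sys/class/net/<interface>/mode. The sketch below reads and changes that file. The interface name "ib0" is only an assumption (yours may be named differently), the helper names are made up for this example, writing the file needs root, and the change does not survive a reboot unless you also persist it in your network config.

```python
from pathlib import Path

VALID_MODES = ("datagram", "connected")


def ipoib_mode(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return the IPoIB transport mode reported for one interface."""
    mode_file = Path(sysfs_root) / iface / "mode"
    return mode_file.read_text().strip()


def set_ipoib_mode(iface: str, mode: str, sysfs_root: str = "/sys/class/net") -> None:
    """Write a new IPoIB transport mode for one interface (needs root on a real host)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    mode_file = Path(sysfs_root) / iface / "mode"
    mode_file.write_text(mode + "\n")
```

On a node you would call something like ipoib_mode("ib0") to see what the card is currently doing, and set_ipoib_mode("ib0", "connected") to try connected mode. The sysfs_root parameter exists only so the functions can be exercised against a scratch directory.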

Oh, and if your switch doesn't have the IB management software built in, don't get stressed. There's a software package that lets you run it on your host system instead and it'll work just the same. There only needs to be one copy of the management software running in the network, but it doesn't hurt to have it run on them all. The management software will notice the other running copies, pick a leader, and the non-leader ones automatically will go into failover mode (aka take over if the leader dies somehow).
 
With IB, the devices have IB addresses instead of IP addresses. IB addresses are much longer than the IP things you're used to seeing.

And you don't have to use IPoIB. The purpose of IPoIB is for running software that only knows how to talk tcp/ip (which is most software). If you happen to be using software that does understand IB though, then IPoIB isn't needed.

Chances are though... you're probably not using only software that understands IB. ;)
 
Not sure I understand this fully.

Does that mean you can use the IB cards and switch like you would otherwise use ethernet cards? But you can't use the switch with ethernet cards and cables directly, right? There seem to be splitter cables that connect to the SX6036 (or other IB switches) and that have 10gbe on the other end that one can connect to ethernet cards. But that probably doesn't work without a license, right?

If I would need a license, do you know how much that costs (probably enterprise prices...) and whether one can still get one for an obsolete switch?

Thanks!
As long as it's just Infiniband traffic, you can use a breakout cable that takes one 56/40gb connection and breaks it out to 4x 10gb connections, but this will not work for Ethernet devices unless you have the license for it. Unfortunately, the license is not available anymore.

Justinclift mentioned datagram vs connected mode. With the SX6036 you're going to be stuck using datagram mode; connected is not an option unless you direct-connect the IB cards to one another. The performance isn't much of a loss: with connections at 56gbps you're going to lose maybe 6gbps, leaving you with 50gbps to play with.

For the switch firmware, 3.6.8010 will be the latest version: https://www.mellanox.com/downloads/Software/image-PPC_M460EX-3.6.8010.img
 
I'm not sure about the entire range; I just know this particular switch doesn't. It does a max of 4k MTU vs the 65k MTU for connected mode. I think it's a little underpowered with its 3-core PowerPC processor.
 
Connected mode has also been phased out on the latest cards in favor of datagram, which doesn't have as much raw throughput on the CX3 cards but has much better latency. On the newer stuff it's a non-issue.
 
Something would have to be seriously wrong for that to not be good enough. The corosync network seems to be more about latency ("anything under 5ms is good enough") to ensure things are up, plus transfer small amounts of data for replicating the /etc/pve structure (actually a small SQLite database).
I have installed the new IB cards in all hosts but have not yet made the switch because of one (hopefully final) question that came to my mind: When I migrate VMs from one host to another, which of the networks is used for that? If it is the corosync network, then switching it back to 1gbe will drastically increase migration times.

So I'm wondering whether migration happens via the corosync network or via the Ceph network. Given that one doesn't have to use Ceph but still should be able to migrate one's VMs, I'm guessing migration would need to use the corosync network (or the normal admin network). And if that is the case, I'm reluctant to downgrade corosync to 1gbe. (But I also don't want to downgrade Ceph or the backup network). And so I'm stuck...
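To put a rough number on "drastically increase migration times", here is a back-of-the-envelope sketch. It assumes the migration mostly has to push the VM's RAM across the wire once (disks stay on shared storage), and that the link delivers about 90% of its nominal rate. Both are simplifying assumptions, and the 16 GiB figure is just an example, not a measurement.

```python
def transfer_seconds(size_gib: float, link_gbit: float, efficiency: float = 0.9) -> float:
    """Estimate seconds needed to push size_gib of data over a link.

    link_gbit is the nominal link speed in Gbit/s; efficiency is the assumed
    fraction of that rate that is actually usable for payload.
    """
    if link_gbit <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbit must be > 0 and efficiency in (0, 1]")
    bits = size_gib * 1024**3 * 8
    return bits / (link_gbit * 1e9 * efficiency)


# Example: a VM with 16 GiB of RAM over the link speeds discussed in this thread.
for speed in (1, 10, 40):
    print(f"{speed:>2} Gbit/s: {transfer_seconds(16, speed):6.1f} s")
```

Under those assumptions a 16 GiB VM takes about two and a half minutes on 1gbe versus roughly 15 seconds on 10gbe. Real live migrations also re-send memory pages that change during the copy, so busy VMs will take longer than this estimate.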
 
You can tell Proxmox which network to use for migrations. It's under Datacenter > Options > Migration Settings. :)

This is the setup for a test cluster here which has 10GbE for replication and migration traffic. The other traffic uses the public 1GbE interface:

[Screenshot: 1722802251615.png]
 
Ah, thanks. I think I knew that a long time ago (because I had actually switched this to one of the 10gbe networks) but I had since forgotten about it...
 
So, I have made the step and connected everything but hit a wall:

While I can see the cards in the PVE GUI and have assigned them IP addresses and they start up, they can't connect to one another via the switch. At first I thought that I needed to change some settings on the switch (and that may still be true) but then I found that the cards show "<NO-CARRIER>" as part of their output to "ip a".

This made me realize that I still don't understand IB enough.

My SX6036 switch tells me its ports are IB. I don't think it has an ethernet license (and it seems you can't get them anymore anyway). So I need to either use IPoIB or IB directly.

And you don't have to use IPoIB. The purpose of IPoIB is for running software that only knows how to talk tcp/ip (which is most software). If you happen to be using software that does understand IB though, then IPoIB isn't needed.

Chances are though... you're probably not using only software that understands IB. ;)
I want to connect three PVE nodes with one another (port 1) and with a PBS for backup (port 2). So to me the question is whether PVE and PBS understand IB.

If not: How do I use IPoIB? If I understand correctly, "connected mode" is not available on my switch but IPoIB would also work without it, right?

Do I need to switch my cx3 pro NICs to a special mode for this to work? (The fact that PVE lets me assign IP addresses (as opposed to IB addresses) seems to indicate that the NICs are recognized as IP cards and would not need switching?)

And do I need to bring the ports on my switch up individually? The switch GUI shows that the ports with cables have cables but that they are "down"/"polling". So, do I need to "up" a port before it will connect?

Thanks!
 
If I understand correctly, "connected mode" is not available on my switch but IPoIB would also work without it, right?
In theory, yes. Bear in mind that it's been a bunch of years since I last used Infiniband, and I was never super in depth with it. ;)

Do I need to switch my cx3 pro NICs to a special mode for this to work?
Yeah, they'll need to be in Infiniband mode rather than being in Ethernet mode.

The fact that PVE lets me assign IP addresses (as opposed to IB addresses) seems to indicate that the NICs are recognized as IP cards and would not need switching?
That might actually be a bad sign, as it might mean your cx3 adapters are in Ethernet mode rather than Infiniband mode. If they're in Ethernet mode, then they'll happily talk to other adapters in Ethernet mode. But they won't be able to communicate with (or over) an Infiniband switch.

As a diagnostic step for now, try hooking your PVE nodes directly to each other (through these cards) without using the switch. If they can see each other directly (ie ping works, ssh probably will too) then it sounds to me like the cards are in Ethernet mode.
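Another quick check that doesn't need any recabling: the Linux kernel records the link-layer type of every network interface in /sys/class/net/<interface>/type, where 1 means Ethernet and 32 means Infiniband (the ARPHRD_ETHER and ARPHRD_INFINIBAND constants). The sketch below just reads that file. The helper name and the interface names in the usage note are made up for illustration; substitute whatever "ip a" shows on your nodes.

```python
from pathlib import Path

# Link-layer type codes as used by the Linux kernel (ARPHRD_* constants).
ARPHRD_ETHER = 1
ARPHRD_INFINIBAND = 32


def link_layer(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return 'ethernet', 'infiniband', or 'other (<code>)' for an interface."""
    code = int((Path(sysfs_root) / iface / "type").read_text().strip())
    if code == ARPHRD_ETHER:
        return "ethernet"
    if code == ARPHRD_INFINIBAND:
        return "infiniband"
    return f"other ({code})"
```

If link_layer("ib0") (or whatever the port is called) comes back as "ethernet", the port is presenting itself as an Ethernet NIC, which would fit the theory that the cards are in Ethernet mode. An IPoIB interface should report "infiniband".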

I'm trying to remember how to check which mode cx3 cards are in on Proxmox. There's a command for it.



k, I don't have it documented personally (just checked), and it's not in the command history of any of my running servers, so this is going to be "point you in the right direction" kind of thing rather than being able to give the exact commands.

Anyway, the Debian package mstflint is (I'm pretty sure) the one you want. It has the cli tools for adjusting Mellanox card firmware (ie updating, changing), and it contains the mstconfig program which I'm "pretty sure" is the one used to tell ConnectX cards what mode to operate in by default.
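For what it's worth, here is a sketch of what the mstconfig calls would probably look like, wrapped in Python so the commands can be inspected before anything touches the firmware. Treat all of it as an assumption to verify: as far as I recall from the Mellanox firmware tool documentation, the port mode is controlled by the LINK_TYPE_P1 / LINK_TYPE_P2 parameters with 1 = Infiniband, 2 = Ethernet and 3 = VPI (auto), but check the output of the query command on your own card first. The PCI address "04:00.0" in the usage note is hypothetical (lspci will show the real one), and the helper names are made up.

```python
# Assumed firmware values for the LINK_TYPE_Px parameters; verify with `query` first.
LINK_TYPES = {"ib": 1, "eth": 2, "vpi": 3}


def mstconfig_query_cmd(pci_addr: str) -> list[str]:
    """Build the command that shows the card's current firmware configuration."""
    return ["mstconfig", "-d", pci_addr, "query"]


def mstconfig_set_link_type_cmd(pci_addr: str, link_type: str, ports=(1, 2)) -> list[str]:
    """Build the command that sets the given ports to 'ib', 'eth' or 'vpi'."""
    if link_type not in LINK_TYPES:
        raise ValueError(f"link_type must be one of {sorted(LINK_TYPES)}")
    value = LINK_TYPES[link_type]
    settings = [f"LINK_TYPE_P{port}={value}" for port in ports]
    return ["mstconfig", "-d", pci_addr, "set", *settings]
```

For example, " ".join(mstconfig_set_link_type_cmd("04:00.0", "ib")) gives a command line you can review and then run by hand as root. The functions deliberately only build the command rather than executing it, since a wrong firmware setting is annoying to undo. A new link type normally only takes effect after a reboot.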
 