mellanox woes.. can you help?

dankkster

New Member
Jan 24, 2023
I have two Mellanox ConnectX-3 cards, one in my third Proxmox 7.4-3 server and one in my UnRAID 6.11.5 server. Both have identical firmware. On this Proxmox server I have installed TrueNAS Scale 22.12.2 as a VM, as many people do for the NAS capabilities it provides, and I have been relatively happy with this setup. However, I cannot get these Mellanox cards to so much as ping each other. This has to be user error, but I don't know where to troubleshoot beyond what I have done already.

To the best of my knowledge, after a great deal of reading and inference, both cards are in ETH mode. They are capable of IB and ETH, and I don't care about IB mode at all because I don't have the infrastructure for it.

In Proxmox and UnRAID, both cards are detected and their status is UP. The link lights on the cards are green and blinking, since the cards have a direct connection (no switch, no gateway) to each other. From Proxmox, I can ping the local Mellanox NIC, but not the card it is directly connected to on the UnRAID server. The same is true from UnRAID in the opposite direction. This is beyond frustrating because I don't know where to look for log entries for potential issues beyond dmesg, syslog, and kern.log.

How can these cards be connected directly, have the same MTU and the same subnet/mask, physically show a link, and still not be able to ping each other?

In ProxMox, I have done the following (PVE3):

image_2023-06-19_223819658.png

image_2023-06-19_224145642.png

'enp1s0' is the only port I currently care about. 'enp6s0' is my regular NIC that works fine.

For whatever reason, this setup is not correct and I would love to know where I have gone wrong.

The TrueNAS server does see the added NIC after I add the bridged NIC to the TrueNAS VM in ProxMox as seen below (ens19):

1687229630098.png

TrueNAS is getting ahead of the situation, though, since I can't even ping between these cards via Proxmox/UnRAID, much less TrueNAS/UnRAID. I have burned enough time on this to warrant a post, in my opinion. I appreciate any help I can get, as I wouldn't think this should be that difficult.
 
Have you tried without any VLANs as well?
Which IP does your Unraid use? Is the VLAN configured correctly on your Unraid?

Could you provide the routes of both hosts?
ip r
 
Without a VLAN, the link would be in my main network, and I don't know if that would work either, since the cards can't even ping each other while directly connected. I don't see why that would matter?

The Proxmox server, and apparently TrueNAS too, is using 192.168.155.2/29. I'm not sure if this is a problem, but it looks like it might be.
The UnRAID server uses 192.168.155.3/29 and 192.168.155.4/29. The latter is not connected currently but does have the static IP assigned.

ProxMox:

root@pve3:~# ip r
default via 192.168.150.1 dev vmbr0.150 proto kernel onlink
192.168.150.0/24 dev vmbr0.150 proto kernel scope link src 192.168.150.30
192.168.155.0/29 dev vmbr1.155 proto kernel scope link src 192.168.155.2

TrueNAS VM:

admin@truenas[~]$ ip r
default via 192.168.160.1 dev ens18 proto static
192.168.155.0/29 dev ens19 proto kernel scope link src 192.168.155.2
192.168.160.0/24 dev ens18 proto kernel scope link src 192.168.160.50

UnRAID:

root@DeddSpace:~# ip r
default via 192.168.1.1 dev br0
10.253.0.2 dev wg0 scope link
10.253.0.3 dev wg0 scope link
10.253.0.4 dev wg0 scope link
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.19.0.0/16 dev br-c9c628169063 proto kernel scope link src 172.19.0.1
192.168.1.0/25 dev shim-br0 scope link
192.168.1.0/24 dev br0 proto kernel scope link src 192.168.1.50
192.168.1.128/25 dev shim-br0 scope link
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
192.168.155.0/29 dev br1 proto kernel scope link src 192.168.155.3 <--- This is the only connected port for the Mellanox ConnectX-3
192.168.155.0/29 dev br2 proto kernel scope link src 192.168.155.4 <----- Disconnected

Many thanks.
 
Both the host and the TrueNAS VM use the same IP?
Don't do that!
 
I assigned a different IP to Proxmox and I am now able to ping both of these IPs locally; however, I am still unable to ping the other card. Any ideas?
 
I removed vmbr1.155 from pve3, along with the IP address pve3 had for the Mellanox ConnectX-3 card. The IP is now assigned only in TrueNAS, via vmbr1, which has enp1s0 listed. It still doesn't ping. I am completely at a loss.
 
I'd suggest listening with tcpdump on both sides to see if packets are even reaching the other side.
If they reach the other side but the reply doesn't come back, it may be an issue with the routing.
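
As a rough sketch of what that could look like: the two helper functions below are hypothetical wrappers (the names `capture_ping` and `show_route` are made up here), and the interface names are the ones mentioned earlier in the thread, so adjust them to whatever your hosts actually use. Both commands normally need root.

```shell
# Interface names are taken from the thread; adjust them to your hosts.
# enp1s0 is the ConnectX-3 port on the Proxmox host, br1 the bridge on UnRAID.

# Capture ARP and ICMP on one interface, including VLAN-tagged frames.
#   -n  skip name resolution
#   -e  print the link-level header, so MAC addresses and VLAN tags are visible
#   -i  interface to listen on
capture_ping() {
    tcpdump -n -e -i "$1" 'arp or icmp or (vlan and (arp or icmp))'
}

# Show which interface and source address the kernel would pick for a peer.
show_route() {
    ip route get "$1"
}

# On the Proxmox host:  capture_ping enp1s0
# On UnRAID:            capture_ping br1
# Then ping from one side and watch both captures.
# Route check, e.g. on UnRAID:  show_route 192.168.155.2
```

If ARP requests leave one side and never show up in the other capture, the problem is likely below the IP layer (cabling, port mode, or VLAN tagging, which the `-e` output makes visible). If requests arrive but no reply goes back out the same interface, `show_route` helps check which interface the kernel chose for the return path.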
 
