The future of InfiniBand

RobFantini

Famous Member
May 24, 2012
Boston,Mass
Based on kernel driver development: are InfiniBand modules for most IB cards going to be around in every kernel for the next 20 years?

[We have certain IB card models that produce kernel/module errors in dmesg. Most work well, but these cards and their module take Ceph nodes offline every 4 days or so. That is not stable.]

I'm setting up a new cluster and expect a 20-year service life. 10G IP hardware is expensive, but it is not expensive at all if InfiniBand is not going to be around. Changing cluster network hardware is not something to put off to the future.
 
20 years for anything IT-related is nonsense.

I'm almost sure that in 20 years Ceph will have been replaced by at least 2, 3, or 4 better alternatives.

The whole Internet (as available to the masses) has only been around for about 20-25 years. Are you saying that your cluster should survive for 20 years on current technology? Is this a joke? Are you serious?

20 years ago, did you ever think about Ceph, cloud, or petabyte storage, when the biggest disk available was about 20 MB (megabytes), SATA did not exist, and UltraDMA/ATAPI had yet to be standardized?

We have 100 Gbit Ethernet NOW. In 20 years we will surely have about 1 Tbit as standard, and you are hoping for current InfiniBand to survive in the Linux kernel?
 
With the latest PVE kernel, these cards are unreliable:
Code:
01:00.0 InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0)

There are lines like this in dmesg:
Code:
[28969.931733] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[29451.703875] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[29618.912668] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[36524.237740] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[40343.416078] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[43178.203983] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[44285.016603] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[50184.349661] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL

Every 3-4 days, and more often during backups, the node's IB IP address becomes unpingable.

I'll try swapping out the cards for ConnectX ones. We bought a few extra last week.
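
To check whether these warnings cluster around backups or around the times the node drops off the network, the dmesg lines can be counted and the gaps between them measured. The following is only a rough sketch: the function name `summarize_gfp_warnings` and the regular expression are assumptions made for illustration, written against the message format quoted above, and the sample uses the first three lines from this post.

```python
import re
from collections import Counter

# Assumed pattern, based only on the dmesg lines quoted in this thread, e.g.:
# [28969.931733] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
LINE_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\w+: can't use GFP_NOIO for QPs on device (\w+)"
)

def summarize_gfp_warnings(lines):
    """Return (warnings per device, gaps in seconds between consecutive warnings)."""
    stamps = []
    devices = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            stamps.append(float(match.group(1)))  # seconds since boot
            devices[match.group(2)] += 1          # e.g. "mthca0"
    gaps = [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]
    return devices, gaps

# First three lines from the dmesg excerpt above.
SAMPLE = """\
[28969.931733] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[29451.703875] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
[29618.912668] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
"""

devices, gaps = summarize_gfp_warnings(SAMPLE.splitlines())
print(dict(devices), gaps)
```

On a real node, the lines of the saved `dmesg` output would be passed to the function instead of `SAMPLE`. The timestamps are seconds since boot, so they need to be lined up with the backup schedule by hand.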
 
