Update PVE 6 to 7 with installed Mellanox ConnectX-6 drivers (DKMS): Ceph not working

dan.ger

Hello,

I just did an in-place upgrade from PVE 6.4-13 to PVE 7 with the latest Mellanox OFED drivers (Debian 10.8) installed. The Mellanox ConnectX-6 cards are used for a Ceph Nautilus cluster (latest version) and run in Ethernet mode with RoCEv2.

I first tested a virtual PVE cluster to check whether different PVE versions work together, and I can confirm that this works fine.

So I started upgrading the first of the three physical nodes, and everything seemed to work, so I upgraded node two as well. After roughly 20 minutes I got a lot of slow queries, PGs were reported as under-committed, and the virtual machines stopped working. A few minutes later the entire Ceph cluster crashed and became unreachable, so I had to restore the nodes from backup.

Any ideas or suggestions?
 
Solved:

Using the Ubuntu 21.04 Mellanox OFED drivers solved the problem. Use this apt sources list:
Code:
#
# Mellanox Technologies Ltd. public repository configuration file.
# For more information, refer to http://linux.mellanox.com
#

# [mlnx_ofed_latest_base]
#deb http://linux.mellanox.com/public/repo/mlnx_ofed/latest/debian10.8/$(ARCH) ./

# Ubuntu 21.04 = bullseye/sid
# [mlnx_ofed_5.4-1.0.3.0_base]
deb http://linux.mellanox.com/public/repo/mlnx_ofed/5.4-1.0.3.0/ubuntu21.04/$(ARCH) ./
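
For anyone who wants to script this, here is a minimal sketch that builds the same repository line from an OFED version and a distro name. The function name `mlnx_repo_line` and the target file under /etc/apt/sources.list.d are my own assumptions, not something from the Mellanox documentation; the URL layout is simply copied from the line above.

```shell
#!/bin/sh
# Sketch: print the apt source line for a given MLNX_OFED release and distro.
# mlnx_repo_line is a hypothetical helper; the URL pattern mirrors the
# repository line quoted in this post.
mlnx_repo_line() {
    version="$1"   # e.g. 5.4-1.0.3.0
    distro="$2"    # e.g. ubuntu21.04
    # $(ARCH) is kept literal (single quotes) so that apt expands it itself.
    printf 'deb http://linux.mellanox.com/public/repo/mlnx_ofed/%s/%s/$(ARCH) ./\n' \
        "$version" "$distro"
}

# Possible usage as root (file name is an assumption):
#   mlnx_repo_line 5.4-1.0.3.0 ubuntu21.04 > /etc/apt/sources.list.d/mlnx_ofed.list
#   apt update
```

Keeping the version and distro as parameters makes it easy to switch back to another repository if a later release ships proper Debian 11 packages.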

Then install nvme-cli if you have NVMe drives, to get rid of this log message:
Code:
pve-01 : Nov  3 00:04:02 : ceph : a password is required ; PWD=/ ; USER=root ; COMMAND=nvme dell smart-log-add --json /dev/nvmedrive
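
To check whether the message really stops after installing nvme-cli, something like the following sketch can count the matching sudo entries in a log file. The function name `count_nvme_sudo_denials` is hypothetical, and which file sudo logs to depends on your setup (the path in the usage comment is an assumption).

```shell
#!/bin/sh
# Sketch: count sudo log lines where an nvme command was refused because
# a password was required. Prints the number of matching lines.
count_nvme_sudo_denials() {
    logfile="$1"
    grep 'a password is required' "$logfile" | grep -c 'COMMAND=nvme'
}

# Possible usage as root (log path is an assumption, adjust as needed):
#   apt install nvme-cli
#   count_nvme_sudo_denials /var/log/auth.log
```

If the count stops growing after the package is installed, the log spam is gone.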
 
Apologies to resurrect an old thread...

What was the exact process to install the Ubuntu 21 OFED drivers on Proxmox 7?

Many thanks
 
Hello,

Just take a look at this thread; I documented it there. As far as I remember, you replace the apt repo with Mellanox's Ubuntu one.
 