Hello,
I just did an in-place upgrade from PVE 6.4-13 to PVE 7 with the latest Mellanox OFED drivers (Debian 10.8). The Mellanox ConnectX-6 cards are used for a Ceph Nautilus cluster (latest version). The Mellanox cards are running in Ethernet mode with RoCEv2.
Beforehand, I tested a virtual PVE cluster to check whether different versions of PVE work together. I can confirm that this works fine.
So I started by migrating the first of the three physical nodes, and everything seemed to work. Then I migrated node two. After about 20 minutes I got a lot of slow queries, PGs were under committed, and the virtual machines stopped working. A few minutes later the entire Ceph cluster crashed and became unreachable, so I had to restore the nodes from backup.
Any ideas or suggestions?