Supermicro NVMe hot-swap issue

Hi,

Reaching out to hear if anyone else has bumped into this issue. In our Supermicro 5-node Ceph setup we mainly use the NVMe model Micron_9300_MTFDHAL7T6TDP. The issue is that when we add a new disk, the disk in the slot next to it dies briefly. We do not yet know whether it is related to the newer disk model, Micron_7450_MTFDKCC7T6TFR, that we are adding.

The slot placement is as below; in this case, when I insert a drive into slot 7, slot 5 drops out briefly.
1 3 5 7 9
0 2 4 6 8

Checking the logs, we see lots of I/O errors from the failing disk, and /dev/nvme8n1 changes to /dev/nvme8n2. This is really annoying and causes a lot of problems for us, since we now have to migrate everything away from a node and shut it down before touching any drives.
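
In case it helps with reproducing, this is roughly how I match a /dev/nvme device name to a PCIe address and physical slot (assuming the standard sysfs layout; the slot numbers come from ACPI and may not match the labels on the chassis):

Code:
# NVMe controller -> PCIe address
grep . /sys/class/nvme/nvme*/address
# physical slot number -> PCIe address (compare with the output above, minus the ".0" function suffix)
grep . /sys/bus/pci/slots/*/address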

Some info about the node, which runs PVE 7.4-3 on kernel Linux 5.15.104-1-pve #1 SMP PVE 5.15.104-2 (2023-04-12T11:23Z):

Code:
Firmware Revision: 03.10.30
Firmware Build Time: 11/18/2022
BIOS Version: 2.5
BIOS Build Time: 09/14/2022
Redfish Version: 1.8.0
CPLD Version: A2.C5.08
Manufacturer: South Pole AB
Product Name: AS -1124US-TNRP
Serial No.:

-FRU Information
FRU Device ID: 0
Chassis Info:
Chassis Type: Other
Chassis Part Number: CSE-119UHTS-R1K22HP-A
Chassis Serial Number: 
-Board Info:
Language: English
Board Manufacturer: Supermicro
Board Product Name: H12DSU-iN
Board Serial Num: 
Board Part Num: H12DSU-iN
-Product Info:
Language: English
Manufacturer Name: South Pole AB
Product Name:
Product PartNum: AS -1124US-TNRP
Product Version:
Product SerialNum: 
AssetTag:

It would be really interesting to hear whether we're alone with this issue.

--Mats
 
A good starting point would be to provide `cat /var/log/syslog | grep nvme` and the output of `nvme list -vvv`. Have you checked whether there's a new BIOS available and new firmware for the NVMe drives?
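
If it helps, something along these lines should show the firmware currently on the drives (nvme-cli; assuming a reasonably recent version):

Code:
# firmware revision per drive is in the "FW Rev" column
nvme list
# firmware slot log for a single controller
nvme fw-log /dev/nvme8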
 
Hello @matsnlc ,

The 9300s and 7450s are excellent drives. We've not seen any issues mixing them in different systems. One notable difference is that the 7450s are PCIe Gen4 while the 9300s are Gen3.

This behavior sounds similar to issues we've debugged with different drives on an older Dell platform. Do you have the `dmesg` output from the time of the insertion?


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Here's the output from dmesg at the time of insertion:

Code:
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 4542425264 op 0x1:(WRITE) flags 0xc800 phys_seg 32 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1678889400 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 5429265672 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1681537808 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1686265312 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1710101600 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1710101632 op 0x1:(WRITE) flags 0x8800 phys_seg 3 prio class 0
[Thu Jul  6 12:02:03 2023] nvme8n1: detected capacity change from 15002931888 to 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 4542425520 op 0x1:(WRITE) flags 0xc800 phys_seg 32 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 4542425776 op 0x1:(WRITE) flags 0x8800 phys_seg 24 prio class 0
[Thu Jul  6 12:02:04 2023] nvme nvme9: pci function 0000:22:00.0
[Thu Jul  6 12:02:04 2023] nvme 0000:22:00.0: enabling device (0000 -> 0002)
[Thu Jul  6 12:02:05 2023] nvme nvme10: pci function 0000:24:00.0
[Thu Jul  6 12:02:05 2023] nvme 0000:24:00.0: enabling device (0000 -> 0002)
[Thu Jul  6 12:02:06 2023] nvme nvme9: 96/0/0 default/read/poll queues
[Thu Jul  6 12:02:10 2023] nvme nvme10: Shutdown timeout set to 10 seconds
[Thu Jul  6 12:02:10 2023] nvme nvme10: 96/0/0 default/read/poll queues

What's interesting is that nvme8n1 dies briefly and comes back as nvme8n2; after a reboot everything is normal again and it is back as nvme8n1. It's not like we swap disks very often, but according to the tech who did this before, he noticed the exact same problem, though not on all nodes. All nodes are identical in hardware and software. We had only 9300s to begin with, and the issue was first noticed when we added one additional 7450 to each node.

We have not yet checked whether we see this issue with a 9300 as well; testing it is a bit challenging, since we need to migrate everything away from the node first. It seems like a drive fails if it sits under the same PCIe root device as the newly inserted one.
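
To check that, the plan is to compare the PCIe topology of the affected controllers, roughly like this (lspci from pciutils; 0000:22:00.0 is simply the address dmesg reported above, substitute the relevant controllers):

Code:
# show the PCIe tree to see whether two controllers hang off the same root port / bridge
lspci -tv
# the full sysfs path of one controller includes its upstream bridges
readlink -f /sys/bus/pci/devices/0000:22:00.0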

Also read this:
Linux NVMe hot plug requires the kernel boot argument "pci=pcie_bus_perf" be set in order to get proper MPS (MaxPayloadSize) and MRR (MaxReadRequest). Fatal errors will occur without this argument.
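
If we end up trying that, my understanding is that on a GRUB-booted PVE node the argument would go into the kernel command line roughly like this (untested on our setup; ZFS-root nodes booted via systemd-boot use /etc/kernel/cmdline and proxmox-boot-tool instead):

Code:
# /etc/default/grub -- append the argument to the default command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=pcie_bus_perf"

# apply and reboot
update-grub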

--Mats
 
