Supermicro NVMe hot-swap issue

Hi,

Reaching out to hear if anyone else has bumped into this issue. In our Supermicro 5-node Ceph setup we mainly use the NVMe model Micron_9300_MTFDHAL7T6TDP. The issue is that when we add a new disk, the disk in the slot next to it dies briefly. We do not yet know whether it is related to the newer disk model, Micron_7450_MTFDKCC7T6TFR, that we are adding.

The slot placement is as below; in this case, when I insert a drive into slot 7, slot 5 drops out briefly.
1 3 5 7 9
0 2 4 6 8

Checking the logs, we see lots of I/O errors from the failing disk, and /dev/nvme8n1 changes to /dev/nvme8n2. This is really annoying and causes a lot of problems for us, since we now have to migrate everything away from a node and shut it down before touching any drives.
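
In case it helps with reproducing, this is roughly how I match a /dev/nvme device name to a PCIe address and physical slot (assuming the standard sysfs layout; the slot numbers come from ACPI and may not match the labels on the chassis):

Code:
# NVMe controller -> PCIe address
grep . /sys/class/nvme/nvme*/address
# physical slot number -> PCIe address (compare with the output above, minus the ".0" function suffix)
grep . /sys/bus/pci/slots/*/address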

Some info about the node, which runs PVE 7.4-3 on kernel Linux 5.15.104-1-pve #1 SMP PVE 5.15.104-2 (2023-04-12T11:23Z):

Code:
Firmware Revision: 03.10.30
Firmware Build Time: 11/18/2022
BIOS Version: 2.5
BIOS Build Time: 09/14/2022
Redfish Version: 1.8.0
CPLD Version: A2.C5.08
Manufacturer: South Pole AB
Product Name: AS -1124US-TNRP
Serial No.:

-FRU Information
FRU Device ID: 0
Chassis Info:
Chassis Type: Other
Chassis Part Number: CSE-119UHTS-R1K22HP-A
Chassis Serial Number: 
-Board Info:
Language: English
Board Manufacturer: Supermicro
Board Product Name: H12DSU-iN
Board Serial Num: 
Board Part Num: H12DSU-iN
-Product Info:
Language: English
Manufacturer Name: South Pole AB
Product Name:
Product PartNum: AS -1124US-TNRP
Product Version:
Product SerialNum: 
AssetTag:

It would be really interesting to hear whether we're alone with this issue.

--Mats
 
A good starting point would be to provide `cat /var/log/syslog | grep nvme` and the output of `nvme list -vvv`. Have you checked whether there's a new BIOS available and new firmware for the NVMe drives?
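
If it helps, something along these lines should show the firmware currently on the drives (nvme-cli; assuming a reasonably recent version):

Code:
# firmware revision per drive is in the "FW Rev" column
nvme list
# firmware slot log for a single controller
nvme fw-log /dev/nvme8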
 
Hello @matsnlc ,

The 9300s and 7450s are excellent drives. We've not seen any issues mixing them in different systems. One notable difference is that the 7450s are PCIe Gen4 while the 9300s are Gen3.

This behavior sounds similar to issues we've debugged with different drives on an older Dell platform. Do you have the `dmesg` output from the time of the insertion?


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Here's the output from dmesg at the time of insertion:

Code:
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 4542425264 op 0x1:(WRITE) flags 0xc800 phys_seg 32 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1678889400 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 5429265672 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1681537808 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1686265312 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1710101600 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 1710101632 op 0x1:(WRITE) flags 0x8800 phys_seg 3 prio class 0
[Thu Jul  6 12:02:03 2023] nvme8n1: detected capacity change from 15002931888 to 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 4542425520 op 0x1:(WRITE) flags 0xc800 phys_seg 32 prio class 0
[Thu Jul  6 12:02:03 2023] blk_update_request: I/O error, dev nvme8n1, sector 4542425776 op 0x1:(WRITE) flags 0x8800 phys_seg 24 prio class 0
[Thu Jul  6 12:02:04 2023] nvme nvme9: pci function 0000:22:00.0
[Thu Jul  6 12:02:04 2023] nvme 0000:22:00.0: enabling device (0000 -> 0002)
[Thu Jul  6 12:02:05 2023] nvme nvme10: pci function 0000:24:00.0
[Thu Jul  6 12:02:05 2023] nvme 0000:24:00.0: enabling device (0000 -> 0002)
[Thu Jul  6 12:02:06 2023] nvme nvme9: 96/0/0 default/read/poll queues
[Thu Jul  6 12:02:10 2023] nvme nvme10: Shutdown timeout set to 10 seconds
[Thu Jul  6 12:02:10 2023] nvme nvme10: 96/0/0 default/read/poll queues

What's interesting is that nvme8n1 dies briefly and comes back as nvme8n2; after a reboot everything is normal again and it is back as nvme8n1. It's not like we swap disks very often, but according to the tech who did this before, he noticed the exact same problem, though not on all nodes. All nodes are identical in hardware and software. We had only 9300s to begin with, and the issue was first noticed when we added one additional 7450 to each node.

We have not yet checked whether we see this issue with a 9300 as well; testing it is a bit challenging, since we need to migrate everything away from the node first. It seems like a drive fails if it sits under the same PCIe root device as the newly inserted one.
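
To check that, the plan is to compare the PCIe topology of the affected controllers, roughly like this (lspci from pciutils; 0000:22:00.0 is simply the address dmesg reported above, substitute the relevant controllers):

Code:
# show the PCIe tree to see whether two controllers hang off the same root port / bridge
lspci -tv
# the full sysfs path of one controller includes its upstream bridges
readlink -f /sys/bus/pci/devices/0000:22:00.0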

Also read this:
Linux NVMe hot plug requires the kernel boot argument "pci=pcie_bus_perf" be set in order to get proper MPS (MaxPayloadSize) and MRR (MaxReadRequest). Fatal errors will occur without this argument.
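
If we end up trying that, my understanding is that on a GRUB-booted PVE node the argument would go into the kernel command line roughly like this (untested on our setup; ZFS-root nodes booted via systemd-boot use /etc/kernel/cmdline and proxmox-boot-tool instead):

Code:
# /etc/default/grub -- append the argument to the default command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=pcie_bus_perf"

# apply and reboot
update-grub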

--Mats
 
