Samsung pm9a3 MZQL21T9HCJR-00A07 hotplug not working with Supermicro server

mailinglists

Hi guys,

just wanted to let you know that with Proxmox VE 7, when you plug in a brand new Samsung PM9A3 MZQL21T9HCJR-00A07, one of the existing disks disappears (and so on with each disk you add). If you notice before your ZFS RAID falls apart, you can get old devices back with rescan of pci: echo 1 > /sys/bus/pci/rescan and the zpool can be healthy again.

However, newly hot-plugged disks never show up. They do show up if we reboot the host.

We will check for firmware updates on the disks first and see how it goes. I do not think this is a Linux kernel issue, although a fix could probably be introduced there as well.
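To catch a disappearing disk before the pool degrades, the check described above could be scripted roughly as follows. This is only a sketch, not something from the original post: the function names and the `sysfs_root` parameter are made up for illustration, and it assumes the kernel exposes `model`, `serial` and `firmware_rev` under /sys/class/nvme/nvmeN, as current Linux NVMe drivers do.

```python
from pathlib import Path


def list_nvme_controllers(sysfs_root="/sys"):
    """Return {controller: {"model", "serial", "firmware_rev"}} read from sysfs."""
    base = Path(sysfs_root) / "class" / "nvme"
    controllers = {}
    if not base.is_dir():
        return controllers
    for ctrl in sorted(base.iterdir()):
        info = {}
        for attr in ("model", "serial", "firmware_rev"):
            try:
                info[attr] = (ctrl / attr).read_text().strip()
            except OSError:
                # Attribute missing or unreadable: record it as unknown.
                info[attr] = None
        controllers[ctrl.name] = info
    return controllers


def missing_serials(before, after):
    """Serial numbers present in the `before` snapshot but absent from `after`."""
    seen = {c["serial"] for c in after.values()}
    return sorted(
        c["serial"] for c in before.values()
        if c["serial"] and c["serial"] not in seen
    )


def rescan_pci(sysfs_root="/sys"):
    """Same effect as `echo 1 > /sys/bus/pci/rescan`; needs root on a real host."""
    (Path(sysfs_root) / "bus" / "pci" / "rescan").write_text("1")
```

On a real host you would take one snapshot with list_nvme_controllers() before plugging in the new disk and another afterwards, then call rescan_pci() as root if missing_serials() reports anything. The firmware_rev field is also a quick way to compare drive firmware across the disks.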
 
Also check for firmware and BIOS updates for the server and for the disk/HBA controller itself :)
 
We have no way of getting firmware updates from Samsung for this Samsung disk, while Dell and others selling these rebranded disks, have working fixes / firmware out. We already have the latest BIOS and firmware installed on the motherboard and the NVMe (LSI 9400) controller, and there is nothing left to update.

Currently there is no solution that we are aware of. Please reply if you have ideas.
 
We have no way of getting firmware updates from Samsung for this Samsung disk, while Dell and others selling these rebranded disks, have working fixes / firmware out.
I did notice that myself recently when I was looking for firmware updates to some PM883 SSDs that I own. That will definitely make me think twice about getting some Samsung enterprise SSDs in the future...

If you got everything else updated in the server, you could try a newer kernel. The default for Proxmox VE 7 is still 5.15 but we do have newer ones available that can be manually installed (opt-in). Currently this would be 6.1, see this thread.
 
This isn't a difficult issue to troubleshoot, just a PITA. You have four possible failure points: the drive, the backplane, the HBA, and the machine itself (NVMe cable). Does the problem persist in a different slot? If yes, look into your drive firmware, HBA firmware revision, kernel module version, and kernel version for conflicts. It's NOT something that will necessarily be fixed by a kernel update alone.

you can get old devices back with rescan of pci: echo 1 > /sys/bus/pci/rescan and zpool can be healthy again.
This doesn't work with NVMe, at least not in passthrough mode. How does the drive show up to the system (/dev/sdX or /dev/nvmeX)?
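A quick way to answer the last question and record the kernel version in one go might look like the sketch below. It is an illustration only: the function names and the `sysfs_root` parameter are hypothetical, and it simply groups the entries in /sys/block by name prefix (nvme* versus sd*) rather than querying the HBA.

```python
import platform
from pathlib import Path


def classify_block_devices(sysfs_root="/sys"):
    """Group block device names by how the kernel exposes them."""
    base = Path(sysfs_root) / "block"
    result = {"nvme": [], "scsi": [], "other": []}
    if not base.is_dir():
        return result
    for dev in sorted(p.name for p in base.iterdir()):
        if dev.startswith("nvme"):
            result["nvme"].append(dev)    # native NVMe namespace, e.g. nvme0n1
        elif dev.startswith("sd"):
            result["scsi"].append(dev)    # presented through the SCSI layer, e.g. sda
        else:
            result["other"].append(dev)   # loop, dm, zd, ...
    return result


def kernel_release():
    """Running kernel release string, as `uname -r` would print it."""
    return platform.release()
```

If the new drives land in the "scsi" group, the system is seeing them through the HBA's SCSI layer rather than as native NVMe devices, which changes which driver and firmware are worth looking at first.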
 
I did notice that myself recently when I was looking for firmware updates to some PM883 SSDs that I own. That will definitely make me think twice about getting some Samsung enterprise SSDs in the future...

If you got everything else updated in the server, you could try a newer kernel. The default for Proxmox VE 7 is still 5.15 but we do have newer ones available that can be manually installed (opt-in). Currently this would be 6.1, see this thread.
We are currently on 6.1.* so that live migration works from newer to older CPUs. It did not help.
 
Do you have a mixed population of disks? Or, are they all the same make/model/vendor?


All the same disks. The RAID controller is: 86:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3416 Fusion-MPT Tri-Mode I/O Controller Chip (IOC) (rev 01). It might be a HW RAID related issue, but the controller is also already on the latest firmware.
 
This isn't a difficult issue to troubleshoot, just a PITA. You have four possible failure points: the drive, the backplane, the HBA, and the machine itself (NVMe cable). Does the problem persist in a different slot? If yes, look into your drive firmware, HBA firmware revision, kernel module version, and kernel version for conflicts. It's NOT something that will necessarily be fixed by a kernel update alone.


This doesn't work with NVMe, at least not in passthrough mode. How does the drive show up to the system (/dev/sdX or /dev/nvmeX)?
Well it worked for us. We got /dev/nvme* back after doing it.
 
