2 identical systems - one reports duplicate IDs, the other does not

geronimobb

Hi all,

I have 2 identical systems; both have 2x CT500P2SSD8 NVMe drives in a ZFS mirror for booting and for storing VMs and LXCs. All drives have the same firmware (P2CR012). Both systems have the same motherboard (X470D4U), processor and memory.

After opting in to the 6.2 kernel on both systems, one of them shows only 1 NVMe drive (the BIOS sees both), which degrades the rpool pool. I need the 6.2 kernel, since I'm adding an Intel Arc A380 card. In the logs I find a 'duplicate IDs' message.

I thought the issue with duplicate IDs and only 1 NVMe drive showing up would affect either both systems or neither (since it is manufacturer related).
This behavior makes me think it is worth just trying to reinstall Proxmox.
Any advice?

Kind regards & thanks in advance.
 
Hi,
Thanks.
I did notice that info too.
But I'm asking myself why it would be a problem on only 1 of 2 identical systems... I wouldn't mind reinstalling too much, if it could help.

Kind regards.
 
But I'm asking myself why it would be a problem on only 1 of 2 identical systems... I wouldn't mind reinstalling too much, if it could help.
Do you have the message nvme nvme0: globally duplicate IDs for nsid 1 or similar in your dmesg/system log? I'd first try booting with kernel 5.15 rather than reinstalling. Maybe the ID is only duplicate for one pair of disks but not the other?
 
Yes, I do have that message. For the past 2 days I have been running kernel 5.15 again.

What I'm trying to understand is why the other identical system (same hardware & software) can run kernel 6.2 without problems (and without this error message), while the first one cannot run 6.2 without problems. The one with problems (globally duplicate IDs) needs to have an Intel Arc A380 installed.
For this Intel card I need kernel 6.2, which then breaks the rpool storage...

Kind regards
 
What is the output of nvme id-ns /dev/nvmeXnY for all four disks?
 
And indeed, on system 1 both eui64 fields are the same; on system 2 they are not.
What I'm not sure about is whether these are generated by the system or defined by the NVMe drives themselves.

nvme.png
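
For anyone who wants to run the same comparison without reading the dumps by eye, here is a minimal sketch in Python. It assumes the output of nvme id-ns has been saved as text per device and that it contains lines of the form "eui64   : <hex>" and "nguid   : <hex>". The function names and the sample values are made up for illustration; they are not the values from the screenshot.

```python
import re
from collections import defaultdict

# Matches the identifier lines of an `nvme id-ns` text dump.
FIELD_RE = re.compile(r"^(eui64|nguid)\s*:\s*([0-9a-fA-F]+)\s*$", re.MULTILINE)


def parse_ids(id_ns_output):
    """Extract the eui64 and nguid fields from one `nvme id-ns` text dump."""
    return {name: value.lower() for name, value in FIELD_RE.findall(id_ns_output)}


def find_shared_ids(outputs_by_device):
    """Map each (field, value) seen on more than one device to those devices."""
    seen = defaultdict(list)
    for device, text in outputs_by_device.items():
        for field, value in parse_ids(text).items():
            # An all-zero value means the field is not populated, so skip it.
            if set(value) != {"0"}:
                seen[(field, value)].append(device)
    return {key: devices for key, devices in seen.items() if len(devices) > 1}


# Hypothetical dumps: both disks report the same eui64, and nguid is unset.
SAMPLE_DUMPS = {
    "/dev/nvme0n1": "nguid   : 00000000000000000000000000000000\neui64   : 0123456789abcdef\n",
    "/dev/nvme1n1": "nguid   : 00000000000000000000000000000000\neui64   : 0123456789abcdef\n",
}

if __name__ == "__main__":
    for (field, value), devices in find_shared_ids(SAMPLE_DUMPS).items():
        print(f"{field} {value} is shared by {', '.join(devices)}")
```

An empty result would mean no populated identifier is shared between the disks that were checked.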
 
If you just want the issue to go away: It might work if you can swap one of the disks in system 1 with a disk in system 2 (swapping the data/replacing in zfs too of course).

If you are up to debugging the issue further, you might want to check if it's present in upstream kernels (I'd build my own with and without the commit adding the quirk) and report the issue upstream by mailing the relevant people on the kernel mailing list.
 
Thanks for your assistance!
I liked the first suggestion, but I didn't want to involve a working system.
So I bought 2 new NVMe drives. I will keep the old ones as spares.
 
Hello,

Sorry for hijacking your thread. Since upgrading one of my hosts to PVE 8 I have a problem with 6 NVMe disks (SAMSUNG MZ1WV480HCGL-000MV).

This problem didn't occur with pve 7 (Kernel 5.15).

dmesg gives me the following output:

Bash:
[    5.575577] nvme nvme0: pci function 0000:04:00.0
[    5.575601] nvme nvme1: pci function 0000:05:00.0
[    5.575702] nvme nvme2: pci function 0000:06:00.0
[    5.575920] nvme nvme3: pci function 0000:07:00.0
[    5.575950] nvme nvme4: pci function 0000:0a:00.0
[    5.575999] nvme nvme5: pci function 0000:0b:00.0
[    5.580316] nvme nvme0: 8/0/0 default/read/poll queues
[    5.581593] nvme nvme1: 8/0/0 default/read/poll queues
[    5.581616] nvme nvme2: 8/0/0 default/read/poll queues
[    5.581945] nvme nvme3: 8/0/0 default/read/poll queues
[    5.581979] nvme nvme4: 8/0/0 default/read/poll queues
[    5.582251] nvme nvme5: 8/0/0 default/read/poll queues
[    5.583348] nvme nvme1: globally duplicate IDs for nsid 1
[    5.583400] nvme nvme1: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q
[    5.583507] nvme nvme2: globally duplicate IDs for nsid 1
[    5.583560] nvme nvme2: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q
[    5.583813] nvme nvme4: globally duplicate IDs for nsid 1
[    5.583860] nvme nvme3: globally duplicate IDs for nsid 1
[    5.583867] nvme nvme4: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q
[    5.583913] nvme nvme3: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q
[    5.584014] nvme nvme5: globally duplicate IDs for nsid 1
[    5.584067] nvme nvme5: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q
[    5.584523]  nvme0n1: p1 p9

Only one drive out of 6 is usable.
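
To list the affected controllers from a longer log, the dmesg text can be filtered with a short script. This is only a sketch in Python: it assumes the message format shown above ("nvme nvmeN: globally duplicate IDs for nsid M"), and the function name is made up for illustration.

```python
import re

# Matches kernel log lines such as:
# [    5.583348] nvme nvme1: globally duplicate IDs for nsid 1
DUPLICATE_RE = re.compile(r"nvme (nvme\d+): globally duplicate IDs for nsid (\d+)")


def find_duplicate_id_controllers(dmesg_text):
    """Return a sorted list of (controller, nsid) pairs flagged in the log."""
    hits = set()
    for line in dmesg_text.splitlines():
        match = DUPLICATE_RE.search(line)
        if match:
            hits.add((match.group(1), int(match.group(2))))
    return sorted(hits)


# A few lines copied from the dmesg output above.
SAMPLE = """\
[    5.583348] nvme nvme1: globally duplicate IDs for nsid 1
[    5.583400] nvme nvme1: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q
[    5.583507] nvme nvme2: globally duplicate IDs for nsid 1
[    5.584523]  nvme0n1: p1 p9
"""

if __name__ == "__main__":
    for controller, nsid in find_duplicate_id_controllers(SAMPLE):
        print(f"{controller}: namespace {nsid} flagged as duplicate")
```

Controllers are sorted as plain strings, so with ten or more controllers nvme10 would sort before nvme2.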

nvme list output:

Bash:
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            S1Y0NY0HB01727       SAMSUNG MZ1WV480HCGL-000MV               1         355,34  GB / 480,10  GB    512   B +  0 B   BXU87M9Q

These drives are rather common datacenter drives (Samsung SM-953).

So do I need to patch the installed kernel to disable the check for duplicate namespace IDs? Or do I have to wait for the kernel developers to extend the current quirk list for my drive? Maybe we could get a pve-kernel with this check globally disabled, labeled "use at your own risk"?

Or has anybody managed to get drives with "globally duplicate IDs" working with the current 6.2 kernel?

Thanks in advance.
 
Or do I have to wait for the kernel developers to extend the current quirk list for my drive?
Yes, that would be the best. I'd write the relevant kernel maintainers a quick mail with the output of nvme id-ns /dev/nvmeXnY; I'm not sure if they are aware of the issue for your specific model yet.
 
Hi fiona,

I have already submitted a kernel.org bug report -> https://bugzilla.kernel.org/show_bug.cgi?id=217593

A Samsung employee has looked into the problem and got a reply from his team internally that there is no plan to release new firmware for this model and the related models with the same VID/DID pair.
So he will submit a patch to the kernel developers that adds the quirk.

In order to get my system running again, could I ask for this quirk to be added to a pve-test-kernel?
If that's unreasonable, I'm looking for a guide to compile my own pve-kernel with a modified pci.c source.

Kind regards
 
Hi fiona,

I have already submitted a kernel.org bug report -> https://bugzilla.kernel.org/show_bug.cgi?id=217593

A Samsung employee has looked into the problem and got a reply from his team internally that there is no plan to release new firmware for this model and the related models with the same VID/DID pair.
So he will submit a patch to the kernel developers that adds the quirk.
Great!

In order to get my system running again, could I ask for this quirk to be added to a pve-test-kernel?
Once it's tested and on the way into upstream, we can also incorporate it into our kernels.

If that's unreasonable, I'm looking for a guide to compile my own pve-kernel with a modified pci.c source.
Please see (should still be correct enough): https://forum.proxmox.com/threads/building-the-pve-kernel-on-proxmox-ve-6-x.76137/ and please also read Thomas' responses.

You have to place your patch (with a .patch extension) into patches/kernel before the build, and you should see it being applied early in the build output, e.g.
Code:
applying patch '../../patches/kernel/0008-however-the-patch-title-is.patch'
 
Alright, I have news on this case.

After a brief discussion among the kernel developers, there is now a patch which addresses the problem for all NVMe drives whose namespace IDs are not globally unique.

https://git.kernel.org/pub/scm/linu.../?id=90b4622954d59078fa0cecad7e7baa48efd006e7

With this patch the kernel no longer errors out, as all kernels since 5.18 do. It logs a warning and proceeds.

This is lined up for the Linux 6.5 kernel. Is there a chance that we can include this patch in a pve-test kernel? I had no luck patching and compiling my own pve-kernel.

Kind regards.
 
Thank you for the effort! The relevant workaround has been backported in our git now: https://git.proxmox.com/?p=pve-kernel.git;a=commit;h=069e83e4621ed08c2bdb61e9b0c57958cf55491b but it might still take a bit until it is packaged and reaches the repositories.
 
Same trouble with an Intel SSD DC P4608 Series 6.4 TB. It is expected to work as two separate 3.2 TB disks, but one of the disks does not work on either 6.2.16-3-pve or 6.2.16-4-pve, with "nvme nvme1: globally duplicate IDs for nsid 1".
 
Hi idk,

We have to wait for 6.2.16-5-pve. It's already lined up, but hasn't hit the repo so far.
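
To check whether a host is already on a new enough kernel, the release string can be compared against that version. A minimal sketch in Python, assuming (as stated above, not verified here) that 6.2.16-5-pve is the first packaged kernel carrying the workaround; the function names are made up for illustration, and the check only understands release strings of the form X.Y.Z-N-pve.

```python
import re

# Assumption taken from the post above: first packaged kernel with the workaround.
FIXED_IN = "6.2.16-5-pve"


def parse_pve_release(release):
    """Turn a string like '6.2.16-4-pve' into a comparable tuple (6, 2, 16, 4)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)-pve", release)
    if match is None:
        raise ValueError(f"unexpected kernel release string: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_workaround(release, fixed_in=FIXED_IN):
    """True if the given release is at least the assumed first fixed release."""
    return parse_pve_release(release) >= parse_pve_release(fixed_in)


if __name__ == "__main__":
    import platform

    # platform.release() returns the same string as `uname -r`.
    running = platform.release()
    try:
        print(running, "has the workaround:", has_workaround(running))
    except ValueError as err:
        print(err)
```

A 5.15 kernel is reported as not having the workaround, which is harmless, since according to this thread the check only errors out on kernels from 5.18 onwards.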
 
