PVE 8 Samsung NVMe issues

dusatvoj

Hello,
I upgraded to PVE 8, and since then I have been getting errors on all NVMe drives in my cluster (3 nodes).

Here is the information for my NVMe drives (6 identical disks, 2 per node):
Code:
# smartctl -a /dev/nvme0n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.16-4-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZQLB1T9HAJR-00007
Serial Number:                      S439NC0R703862
Firmware Version:                   EDA5402Q

Every entry has the same status, `0x4004`, but a different command ID:
Code:
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        966     0  0x000f  0x4004      -            0     1     -
  1        965     0  0xc019  0x4004      -            0     1     -
  2        964     0  0xb008  0x4004      -            0     1     -
  3        963     0  0x000e  0x4004      -            0     1     -
  4        962     0  0x400f  0x4004      -            0     1     -
  5        961     0  0xa012  0x4004      -            0     1     -
  6        960     0  0x700f  0x4004      -            0     1     -
  7        959     0  0x600f  0x4004      -            0     1     -
  8        958     0  0x101c  0x4004      -            0     1     -
  9        957     0  0x2011  0x4004      -            0     1     -
 10        956     0  0xb01b  0x4004      -            0     1     -
 11        955     0  0x100b  0x4004      -            0     1     -
 12        954     0  0x500f  0x4004      -            0     1     -
 13        953     0  0xd00c  0x4004      -            0     1     -
 14        952     0  0xd017  0x4004      -            0     1     -
 15        951     0  0x100e  0x4004      -            0     1     -
... (48 entries not read)

Can you help me debug what's wrong?
Thank you.
 

Attachments

  • 2023-07-25_17-04-57.png
Can you post the output of `zpool status`?

Unfortunately, you probably won't get much help, because everyone I know tries to avoid Samsung enterprise drives.
Not because they are bad (they are probably really great), but because Samsung doesn't provide any firmware updates for them.

In the worst case you're on your own. I just wanted to prepare you for that.
 
I'm not using ZFS. The NVMe drives are used by Ceph (the OS is on different drives).
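
For reference, the `0x4004` status from the error log above can be decoded by hand. This is only a sketch, and it rests on one assumption: that smartctl 7.3 prints the raw 16-bit status field of the error log entry, with bit 0 being the phase tag, bits 8:1 the status code, bits 11:9 the status code type, bit 14 "More" and bit 15 "Do Not Retry". The helper names (`decode_status`, `NvmeStatus`) are made up for this example, and the lookup table is only a partial list of the generic status codes.

```python
from dataclasses import dataclass

# Partial list of Generic Command Status values (status code type 0).
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x03: "Command ID Conflict",
    0x04: "Data Transfer Error",
    0x06: "Internal Error",
    0x07: "Command Abort Requested",
}


@dataclass
class NvmeStatus:
    phase_tag: int
    status_code: int
    status_code_type: int
    more: bool
    do_not_retry: bool

    def describe(self) -> str:
        # Only generic status codes (type 0) are looked up in this sketch.
        if self.status_code_type == 0:
            name = GENERIC_STATUS.get(self.status_code, "unknown generic status")
        else:
            name = "non-generic status (not decoded here)"
        return (
            f"SCT={self.status_code_type} SC=0x{self.status_code:02x} ({name}), "
            f"More={int(self.more)}, DNR={int(self.do_not_retry)}"
        )


def decode_status(raw: int) -> NvmeStatus:
    """Split a raw 16-bit status field (assumed layout, see above) into its parts."""
    return NvmeStatus(
        phase_tag=raw & 0x1,
        status_code=(raw >> 1) & 0xFF,
        status_code_type=(raw >> 9) & 0x7,
        more=bool((raw >> 14) & 0x1),
        do_not_retry=bool((raw >> 15) & 0x1),
    )


if __name__ == "__main__":
    print(decode_status(0x4004).describe())
```

If that bit layout is right, `0x4004` is a generic status with status code `0x02`, "Invalid Field in Command", and the More bit set. That would point to the drive rejecting a field in a command on the admin queue (SQId 0) rather than to a media or hardware failure, but it does not tell you which command is being sent or by what.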
 