Degraded ZFS removed

simrsta

Oct 25, 2022
I want to ask why one of my NVMe drives suddenly disappeared and its ZFS status changed to REMOVED, even though we didn't remove it.
When I shut down the server and turned it on again, the status went back to normal.
Before that, the status showed as failed, and sometimes as degraded.
But the problem always comes back.
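To see which device ZFS has flagged, the config table of zpool status prints a STATE for every vdev. Below is a minimal sketch that reports every row not in the ONLINE state. It assumes the usual NAME / STATE / READ / WRITE / CKSUM layout, and the parsing is heuristic rather than an official interface.

```python
# Minimal sketch: report vdevs that `zpool status` shows in a state other
# than ONLINE. Assumes the usual config table layout (NAME STATE READ WRITE
# CKSUM); pool and mirror rows are reported too, since they carry a state.
import shutil
import subprocess

# States ZFS can print in the STATE column of the config table.
VDEV_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "REMOVED", "UNAVAIL"}


def unhealthy_vdevs(status_text: str) -> list[tuple[str, str]]:
    """Return (name, state) for every config-table row that is not ONLINE."""
    problems = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME" and "STATE" in fields:
            in_config = True  # header row of a config table
        elif fields[0].endswith(":"):
            in_config = False  # a new section such as "errors:" or "pool:"
        elif in_config and len(fields) >= 2 and fields[1] in VDEV_STATES:
            if fields[1] != "ONLINE":
                problems.append((fields[0], fields[1]))
    return problems


def main() -> None:
    if shutil.which("zpool") is None:
        print("zpool not found; run this on the Proxmox host")
        return
    out = subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout
    for name, state in unhealthy_vdevs(out):
        print(f"{name}: {state}")


if __name__ == "__main__":
    main()
```

A drive that has dropped out typically shows up together with its parent mirror and the pool itself, because all three carry a non-ONLINE state.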
 
Does journalctl -k -b -1 (or -2, -3, ... depending on how many boots ago it happened) show anything related to the drive?
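The same check can be scripted by running journalctl for an earlier boot and keeping only drive-related lines. Below is a minimal sketch that wraps journalctl --dmesg --boot=-1 (the long forms of -k -b -1). The keyword list is an assumption, so widen it if your drive's messages use other names.

```python
# Minimal sketch: fetch the kernel log of an earlier boot and keep only the
# lines that look drive-related. Roughly `journalctl -k -b -1` plus a grep;
# the keyword list is an assumption, widen it as needed.
import shutil
import subprocess

# Assumed keywords: substrings common in NVMe, PCIe port, VMD and ZFS messages.
KEYWORDS = ("nvme", "pcieport", "vmd", "zfs")


def relevant_lines(log_text: str, keywords: tuple[str, ...] = KEYWORDS) -> list[str]:
    """Return the lines that contain any keyword, ignoring case."""
    return [line for line in log_text.splitlines()
            if any(k in line.lower() for k in keywords)]


def kernel_log(boot_offset: int = -1) -> str:
    """Kernel messages of an earlier boot: -1 is the previous boot, -2 the one before."""
    result = subprocess.run(
        ["journalctl", "--dmesg", f"--boot={boot_offset}", "--no-pager"],
        capture_output=True, text=True,
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("journalctl") is None:
        print("journalctl not found; run this on the Proxmox host")
    else:
        for line in relevant_lines(kernel_log(-1)):
            print(line)
```

Change the argument to kernel_log(-2), kernel_log(-3) and so on to step further back, matching the boots in which the drive disappeared.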
 
A) the disk is likely failing and needs to be replaced, and

B) It's a 990 'pro'; these models have known firmware flaws and should be replaced with something more reliable that has a high TBW rating.
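TBW ratings are easier to compare across capacities when they are converted to drive writes per day (DWPD) over the warranty period: DWPD = TBW / (capacity × 365 × warranty years). Below is a small worked sketch of that arithmetic. The drive figures in it are hypothetical, and 1 TB is taken as 1000 GB.

```python
# Worked formula: convert a TBW endurance rating into drive writes per day
# (DWPD) over the warranty period, so drives of different capacities can be
# compared. The example figures below are hypothetical.


def dwpd(tbw_tb: float, capacity_tb: float, warranty_years: float) -> float:
    """DWPD = TBW / (capacity * 365 * warranty years), all sizes in TB."""
    return tbw_tb / (capacity_tb * 365 * warranty_years)


def daily_write_budget_gb(tbw_tb: float, warranty_years: float) -> float:
    """Average GB per day that can be written before reaching the TBW rating (1 TB = 1000 GB)."""
    return tbw_tb * 1000 / (365 * warranty_years)


if __name__ == "__main__":
    # Hypothetical 1 TB drive rated for 600 TBW with a 5-year warranty.
    print(f"{dwpd(600, 1.0, 5):.2f} DWPD")                # 0.33 DWPD
    print(f"{daily_write_budget_gb(600, 5):.0f} GB/day")  # 329 GB/day
```

On a write-heavy ZFS host, the daily budget is the more practical number. You can compare it with the actual writes per day reported by the drive's SMART data.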
 

Can you be more specific? I thought the problems you mention started with the 980 and have since been fixed with new firmware. Presumably the 990 would no longer need that fix. If it's the same issue, are you implying there are no Samsung NVMe drives left to recommend?
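Whether a drive already runs a particular firmware can be checked locally. The Linux nvme driver exposes model and firmware_rev attributes for each controller under /sys/class/nvme/. Below is a minimal sketch that prints both. It does not decide which revision counts as fixed; that still has to come from the vendor's release notes.

```python
# Minimal sketch: print model and firmware revision of every NVMe controller,
# read from the sysfs attributes the Linux nvme driver exposes.
from pathlib import Path


def nvme_firmware(sysfs_root: str = "/sys/class/nvme") -> dict[str, tuple[str, str]]:
    """Map controller name (e.g. nvme0) to (model, firmware revision)."""
    info = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return info  # no NVMe driver loaded, or not running on Linux
    for ctrl in sorted(root.iterdir()):
        model_file, fw_file = ctrl / "model", ctrl / "firmware_rev"
        if model_file.is_file() and fw_file.is_file():
            # sysfs pads these strings with spaces, so strip them
            info[ctrl.name] = (model_file.read_text().strip(),
                               fw_file.read_text().strip())
    return info


if __name__ == "__main__":
    for name, (model, fw) in nvme_firmware().items():
        print(f"{name}: {model}, firmware {fw}")
```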

I ask because, as far as I know, there are virtually no NVMe SSDs with PLP (i.e. not SATA- or U.2/U.3-connected) from Samsung that I could actually recommend.

EDIT: The only M.2 drive with PLP I actually know of is the Micron 7400 Pro.
 
Yeah, the M.2 2280 form factor will (severely) limit PLP SSD choices. I don't know of many options for larger-capacity drives either.

A U.2 drive on a PCIe riser card can be an option, but that's not necessarily an ideal solution.
 
I have the exact same problem. Even after a reboot, the ZFS pool stays degraded.

lsblk -l only shows one NVMe disk, and I'm out of ideas now.
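If a drive is missing from lsblk as well, the kernel no longer sees the block device at all. The problem then sits below ZFS: in the drive, the slot, or the controller. Below is a minimal sketch that counts whole NVMe disks from lsblk -l -n -o NAME,TYPE (list format, no header, two columns), so you can compare the result with the number of drives physically installed.

```python
# Minimal sketch: count the whole NVMe disks the kernel currently exposes,
# using `lsblk -l -n -o NAME,TYPE` (list format, no header, two columns).
import shutil
import subprocess


def nvme_disks(lsblk_output: str) -> list[str]:
    """Names of rows whose TYPE is 'disk' and whose NAME starts with 'nvme'."""
    disks = []
    for line in lsblk_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "disk" and fields[0].startswith("nvme"):
            disks.append(fields[0])
    return disks


if __name__ == "__main__":
    if shutil.which("lsblk") is None:
        print("lsblk not found")
    else:
        out = subprocess.run(["lsblk", "-l", "-n", "-o", "NAME,TYPE"],
                             capture_output=True, text=True).stdout
        found = nvme_disks(out)
        print(f"{len(found)} NVMe disk(s): {', '.join(found) or 'none'}")
```

Partitions such as nvme0n1p1 have TYPE "part" and are skipped, so each physical drive is counted once.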

The last thing I did was try out GPU (Intel iGPU) passthrough to an unprivileged Jellyfin container using /etc/subgid.
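Each line of /etc/subgid has the form name:first_gid:count (see subgid(5)). To double-check what was delegated, or that a passthrough entry is really gone, a small parser can list the ranges and test whether a given host GID is covered. Below is a minimal sketch; the GIDs in the usage note are hypothetical.

```python
# Minimal sketch: parse /etc/subgid (one "name:first_id:count" entry per line,
# see subgid(5)) and check whether a host GID is delegated to a user.
from pathlib import Path
from typing import NamedTuple


class SubIdRange(NamedTuple):
    name: str
    start: int
    count: int

    def covers(self, gid: int) -> bool:
        """True if gid falls in [start, start + count)."""
        return self.start <= gid < self.start + self.count


def parse_subid(text: str) -> list[SubIdRange]:
    """Turn subgid-style text into ranges, skipping blank or malformed lines."""
    ranges = []
    for line in text.splitlines():
        parts = line.strip().split(":")
        if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        ranges.append(SubIdRange(parts[0], int(parts[1]), int(parts[2])))
    return ranges


def is_delegated(ranges: list[SubIdRange], user: str, gid: int) -> bool:
    """Whether any range owned by user covers gid."""
    return any(r.name == user and r.covers(gid) for r in ranges)


if __name__ == "__main__":
    subgid = Path("/etc/subgid")
    if subgid.is_file():
        for r in parse_subid(subgid.read_text()):
            print(f"{r.name}: GIDs {r.start}..{r.start + r.count - 1}")
```

For example, an entry such as root:44:1 allows root to map host GID 44 into a container. The mapping itself is configured separately through the container's lxc.idmap entries.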
 
Have you turned passthrough off again?

What does lspci look like?

I removed the settings from /etc/subgid and rebooted the Proxmox host, but the problem persists.

lspci output:
Code:
0000:00:00.0 Host bridge: Intel Corporation 12th Gen Core Processor Host Bridge (rev 05)
0000:00:02.0 VGA compatible controller: Intel Corporation Alder Lake-S GT1 [UHD Graphics 770] (rev 0c)
0000:00:0e.0 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller
0000:00:14.0 USB controller: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller (rev 11)
0000:00:14.2 RAM memory: Intel Corporation Alder Lake-S PCH Shared SRAM (rev 11)
0000:00:14.3 Network controller: Intel Corporation Alder Lake-S PCH CNVi WiFi (rev 11)
0000:00:16.0 Communication controller: Intel Corporation Alder Lake-S PCH HECI Controller #1 (rev 11)
0000:00:16.3 Serial controller: Intel Corporation Device 7aeb (rev 11)
0000:00:17.0 System peripheral: Intel Corporation RST VMD Managed Controller
0000:00:1d.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 (rev 11)
0000:00:1f.0 ISA bridge: Intel Corporation Device 7a83 (rev 11)
0000:00:1f.3 Audio device: Intel Corporation Alder Lake-S HD Audio Controller (rev 11)
0000:00:1f.4 SMBus: Intel Corporation Alder Lake-S PCH SMBus Controller (rev 11)
0000:00:1f.5 Serial bus controller: Intel Corporation Alder Lake-S PCH SPI Controller (rev 11)
0000:00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (17) I219-LM (rev 11)
0000:01:00.0 Ethernet controller: Aquantia Corp. AQC113C NBase-T/IEEE 802.3an Ethernet Controller [Marvell Scalable mGig] (rev 03)
10000:e0:17.0 SATA controller: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] (rev 11)
10000:e0:1b.0 System peripheral: Intel Corporation RST VMD Managed Controller
10000:e0:1b.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port (rev 11)
10000:e1:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller S4LV008[Pascal]
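In the lspci output above, only one NVMe controller is listed (10000:e1:00.0). It sits in PCI domain 10000 alongside the "RST VMD Managed Controller" entries, i.e. behind Intel VMD. With two NVMe drives installed, each would normally appear as its own Non-Volatile memory controller line. The missing second line suggests the drive dropped off the PCIe bus rather than only out of the pool. Below is a minimal sketch that extracts those lines. It assumes lspci -D is used so the domain is always printed.

```python
# Minimal sketch: list the NVMe controllers in lspci output. Each line is
# "domain:bus:device.function class: description"; run `lspci -D` so the
# domain is always printed.
import re
import shutil
import subprocess

LSPCI_LINE = re.compile(
    r"^(?P<addr>[0-9a-f]+:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (?P<cls>.+?): (?P<desc>.*)$"
)


def nvme_controllers(lspci_text: str) -> list[tuple[str, str]]:
    """Return (address, description) for each 'Non-Volatile memory controller'."""
    found = []
    for line in lspci_text.splitlines():
        m = LSPCI_LINE.match(line.strip())
        if m and m.group("cls") == "Non-Volatile memory controller":
            found.append((m.group("addr"), m.group("desc")))
    return found


if __name__ == "__main__":
    if shutil.which("lspci") is None:
        print("lspci not found (package pciutils)")
    else:
        out = subprocess.run(["lspci", "-D"], capture_output=True, text=True).stdout
        for addr, desc in nvme_controllers(out):
            print(addr, desc)
```

Running this once while the pool is healthy and again after the drive disappears shows whether the controller itself vanished from the bus.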
 
After shutting down my Proxmox host and turning it back on again, the second NVMe drive came back online as well.
May I ask what hardware you are running your Proxmox host on?
 
