[SOLVED] NVME Passthrough - Write Protected

critterfluffy

Nov 15, 2023
HW Info:
Manager: pve-manager/7.4-17/513c62be
Kernel: Linux 5.15.126-1-pve #1 SMP PVE 5.15.126-1 (2023-10-03T17:24Z)
CPU: AMD Ryzen 9 5900X 12-Core Processor
Mobo: MSI Meg X570 Unify

My goal is to have a virtual gaming machine running in Proxmox with a dedicated GPU and NVMe drive to ensure performance. I understand the limitations and view this mostly as a learning project.

My GPU is currently passed through to the VM. It seems to be working fine and even survives a full restart of the node, which was previously an issue.

I previously had a machine working with both GPU and NVMe passthrough, but it broke about a year ago, so I am starting the build over now that I have some time.

Current issue:
I can see the drive in my Windows 11 install, but it appears to be write-protected. I can copy files from it, but I can't create, edit, or delete anything. I have tested this with live CDs and attempted to image an OS onto the disk (both Windows and Linux). There is already a Windows 10 install on the drive from before that should be intact, but it can't boot (probably because it can't write to the page file or something similar). I haven't attempted to image the drive outside of the VM environment, so I acknowledge that the issue could be with the NVMe itself (I really hope not).

If anyone has tests or configuration changes I haven't tried in the last week, that would be appreciated. If anything is missing from the above, or if any other info would be helpful, please let me know. I tried to include what I thought was relevant, but I likely missed something I am less familiar with.

Thanks in advance.


SOLVED:
Unfortunately, it does look like a HW failure on the drive. I will need to put in for a warranty replacement.
I found the issue by running the command:
smartctl -a /dev/nvme0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode
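For anyone landing here with the same symptom: smartctl also prints a "Critical Warning" byte in the NVMe health section, and the two FAILED reasons above correspond to bits in that byte. Below is a minimal sketch of decoding it in shell. The function name decode_critical_warning is made up for illustration, and the 0x09 in the usage comment is an assumed value (bits 0 and 3 set), not something copied from my output. The bit meanings follow my reading of the NVMe SMART / Health log definition.

```shell
# Decode the NVMe "Critical Warning" byte (as printed by smartctl, e.g. 0x09)
# into human-readable conditions. Accepts hex (0x09) or decimal (9).
decode_critical_warning() {
    value=$(( $1 ))
    if [ "$value" -eq 0 ]; then
        echo "no critical warnings"
        return 0
    fi
    if [ $(( value & 0x01 )) -ne 0 ]; then echo "available spare has fallen below threshold"; fi
    if [ $(( value & 0x02 )) -ne 0 ]; then echo "temperature is outside its threshold"; fi
    if [ $(( value & 0x04 )) -ne 0 ]; then echo "reliability is degraded"; fi
    if [ $(( value & 0x08 )) -ne 0 ]; then echo "media has been placed in read only mode"; fi
    if [ $(( value & 0x10 )) -ne 0 ]; then echo "volatile memory backup device has failed"; fi
    return 0
}

# Usage (0x09 is an assumed example value, see above):
# decode_critical_warning 0x09
```

If bit 3 (read only mode) is set, no passthrough or VM setting will make the drive writable, which matches what I saw.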


Important files:
Code:
cat /etc/modprobe.d/blacklist.conf
blacklist nouveau
blacklist nvidia
blacklist snd_hda_intel

cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0

cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nvidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb

cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1e84,10de:10f8,10de:1ad8,10de:1ad9 disable_vga=1

cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pci=noats initcall_blacklist=sysfb_init"
GRUB_CMDLINE_LINUX=""

The vendor:device IDs in vfio.conf belong to the four functions of my GPU (0000:2d:00.0 through 0000:2d:00.3).
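Since edits to /etc/default/grub only take effect once the boot configuration is regenerated and the node is rebooted, it can be worth confirming that the running kernel actually received the parameters from the GRUB_CMDLINE_LINUX_DEFAULT line above. A rough sketch follows; missing_cmdline_params is a hypothetical helper name, not a Proxmox tool.

```shell
# Print every expected parameter that is NOT present in a kernel command line.
# $1: the command line string (e.g. the contents of /proc/cmdline)
# remaining arguments: the parameters to look for
missing_cmdline_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;          # present as a whole word
            *) echo "$param" ;;       # missing
        esac
    done
}

# Usage on the host with the parameters from my grub file:
# missing_cmdline_params "$(cat /proc/cmdline)" \
#     amd_iommu=on iommu=pt pci=noats initcall_blacklist=sysfb_init
```

Empty output means every parameter is active. The whole-word match is deliberate so that iommu=pt is not mistaken for part of amd_iommu=on.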

Additional commands:
Code:
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[    0.147022] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0, rdevid:160
[    0.147023] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1, rdevid:160
[    0.147024] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2, rdevid:160
[    0.147025] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3, rdevid:160
[    0.708148] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.709312] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.709313] AMD-Vi: Extended features (0x58f77ef22294a5a): PPR NX GT IA PC GA_vAPIC
[    0.709315] AMD-Vi: Interrupt remapping enabled
[    0.721412] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/17/devices/0000:21:06.0
/sys/kernel/iommu_groups/7/devices/0000:00:07.0
/sys/kernel/iommu_groups/25/devices/0000:2d:00.2
/sys/kernel/iommu_groups/25/devices/0000:2d:00.0
/sys/kernel/iommu_groups/25/devices/0000:2d:00.3
/sys/kernel/iommu_groups/25/devices/0000:2d:00.1
/sys/kernel/iommu_groups/15/devices/0000:21:01.0
/sys/kernel/iommu_groups/5/devices/0000:00:04.0
/sys/kernel/iommu_groups/23/devices/0000:27:00.0
/sys/kernel/iommu_groups/13/devices/0000:20:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/21/devices/0000:22:00.0
/sys/kernel/iommu_groups/11/devices/0000:00:14.3
/sys/kernel/iommu_groups/11/devices/0000:00:14.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.2
/sys/kernel/iommu_groups/28/devices/0000:2f:00.3
/sys/kernel/iommu_groups/18/devices/0000:2a:00.3
/sys/kernel/iommu_groups/18/devices/0000:2a:00.1
/sys/kernel/iommu_groups/18/devices/0000:21:08.0
/sys/kernel/iommu_groups/18/devices/0000:2a:00.0
/sys/kernel/iommu_groups/8/devices/0000:00:07.1
/sys/kernel/iommu_groups/26/devices/0000:2e:00.0
/sys/kernel/iommu_groups/16/devices/0000:21:05.0
/sys/kernel/iommu_groups/6/devices/0000:00:05.0
/sys/kernel/iommu_groups/24/devices/0000:28:00.0
/sys/kernel/iommu_groups/14/devices/0000:21:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:03.1
/sys/kernel/iommu_groups/22/devices/0000:23:00.0
/sys/kernel/iommu_groups/12/devices/0000:00:18.3
/sys/kernel/iommu_groups/12/devices/0000:00:18.1
/sys/kernel/iommu_groups/12/devices/0000:00:18.6
/sys/kernel/iommu_groups/12/devices/0000:00:18.4
/sys/kernel/iommu_groups/12/devices/0000:00:18.2
/sys/kernel/iommu_groups/12/devices/0000:00:18.0
/sys/kernel/iommu_groups/12/devices/0000:00:18.7
/sys/kernel/iommu_groups/12/devices/0000:00:18.5
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/20/devices/0000:21:0a.0
/sys/kernel/iommu_groups/20/devices/0000:2c:00.0
/sys/kernel/iommu_groups/10/devices/0000:00:08.1
/sys/kernel/iommu_groups/29/devices/0000:2f:00.4
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/19/devices/0000:21:09.0
/sys/kernel/iommu_groups/19/devices/0000:2b:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:08.0
/sys/kernel/iommu_groups/27/devices/0000:2f:00.0

lspci -nnk
23:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
        Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a801]
        Kernel driver in use: vfio-pci
        Kernel modules: nvme
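The raw find output is hard to read because the groups come out unordered. Here is a small sketch that sorts it and lists the devices sharing a group with a given PCI address. Both function names are hypothetical, and they only reformat the path list shown above.

```shell
# Read the path list from `find /sys/kernel/iommu_groups/ -type l` on stdin
# and print "group device" pairs sorted by group number.
sort_iommu_groups() {
    awk -F/ '/iommu_groups\/[0-9]+\/devices\// { print $5, $7 }' | sort -k1,1n -k2,2
}

# Read the same path list on stdin and print every device that shares
# an IOMMU group with the PCI address given as $1 (in input order).
group_members() {
    awk -F/ -v dev="$1" '
        /iommu_groups\/[0-9]+\/devices\// {
            grp[$7] = $5                      # device -> group
            devs[$5] = devs[$5] " " $7        # group  -> device list
        }
        END { if (dev in grp) print substr(devs[grp[dev]], 2) }'
}

# Usage:
# find /sys/kernel/iommu_groups/ -type l | sort_iommu_groups
# find /sys/kernel/iommu_groups/ -type l | group_members 0000:23:00.0
```

In the listing above the NVMe (0000:23:00.0) sits alone in group 22, and the four GPU functions share group 25, so the grouping itself should not be a problem for passthrough.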

I previously added a few additional options to the above configuration files but have since rolled them back. Those configs were:
Code:
cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1e84,10de:10f8,10de:1ad8,10de:1ad9,144d:a801,144d:a808 disable_vga=1

cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0
options vfio_iommu_type1 allow_unsafe_interrupts=1

Current VM config (I only recently added the CPU flags; they neither help nor hurt):
Code:
cat /etc/pve/nodes/homelan/qemu-server/105.conf
acpi: 1
agent: 1
balloon: 0
bios: ovmf
boot: order=ide2;sata0
cores: 6
cpu: host,flags=+ibpb;+virt-ssbd;+amd-ssbd;+amd-no-ssb;+pdpe1gb;+aes
efidisk0: local-lvm:vm-105-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:2d:00.0,pcie=1,x-vga=1
hostpci1: 0000:2d:00.1,pcie=1
hostpci2: 0000:2d:00.2,pcie=1
hostpci3: 0000:2d:00.3,pcie=1
hostpci4: 0000:23:00,pcie=1
hotplug: disk,network,usb,memory,cpu
ide2: none,media=cdrom
kvm: 1
machine: pc-q35-7.2
memory: 16384
meta: creation-qemu=7.2.0,ctime=1699598302
name: GamePC
net0: e1000=AA:D3:43:F6:ED:1D,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: win11
sata0: LANCache:vm-105-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=d6773b8c-5995-49a1-8c96-5423bf9a15ca
sockets: 1
startup: up=30
tags: gaming
tpmstate0: local-lvm:vm-105-disk-1,size=4M,version=v2.0
vga: none
vmgenid: f360b48a-1fdd-4333-893d-4274228ab6a9
vmstatestorage: LANCache

All of the above was captured while the NVMe was attached to VM 105, which was running.
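To cross-check the VM config against the lspci and IOMMU output, it helps to pull out just the passed-through addresses. A minimal sketch, assuming the hostpciN line format shown in my config; list_hostpci is a made-up name.

```shell
# Read a Proxmox VM config on stdin and print the PCI address
# of every hostpciN entry, one per line.
list_hostpci() {
    sed -n 's/^hostpci[0-9][0-9]*: *\([^,]*\).*/\1/p'
}

# Usage:
# list_hostpci < /etc/pve/nodes/homelan/qemu-server/105.conf
```

Note that an entry without a function number, like 0000:23:00 in my config, is printed as written.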

EDIT:
Further testing: I mounted the drive on the Proxmox host using:
mount /dev/nvme0n1p3 /mnt/temp/

and this also comes up as read-only. So the problem may be at a lower level than just the passthrough. Hmm.
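One way to confirm the read-only state without relying on error messages is to look at the mount options the kernel reports. This is a rough sketch; is_mounted_ro is a hypothetical helper, and the mount point must be given without a trailing slash so it matches the entry in /proc/mounts.

```shell
# Exit status 0 if the mount point is mounted read-only, 1 if it is
# mounted read-write, 2 if it is not mounted at all.
# $1: mount point, $2: mounts table to read (defaults to /proc/mounts)
is_mounted_ro() {
    awk -v mp="$1" '
        $2 == mp { opts = $4; found = 1 }      # the last matching entry wins
        END {
            if (!found) exit 2
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "${2:-/proc/mounts}"
}

# Usage:
# is_mounted_ro /mnt/temp && echo "mounted read-only"
```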

I checked the temperature, and the NVMe is sitting at a comfortable 50 °C ± 10, so that isn't the issue.
 