NVMe Disappearing

ikecomp

Member
Apr 9, 2020
Hi Guys -

I'm running into a weird issue: every week, or sometimes every two weeks, my NVMe drive disappears from Proxmox, and only a reboot brings it back until the next time it drops out. The drive is mounted as a regular directory on an ext4 filesystem. The only thing on it is my Windows 10 disk image (qcow2 format); nothing else uses the drive. I previously had a smaller, cheaper Sabrent drive in this machine that didn't have this issue, so I'm wondering whether it's a compatibility issue with Proxmox or whether the drive might just be bad.

SYSLOG
Code:
Sep 27 02:54:01 proxmox systemd[1]: Started Proxmox VE replication runner.
Sep 27 02:54:04 proxmox kernel: [2908335.387198] nvme nvme0: Removing after probe failure status: -19
Sep 27 02:54:04 proxmox kernel: [2908335.407127] blk_update_request: I/O error, dev nvme0n1, sector 262865288 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Sep 27 02:54:04 proxmox kernel: [2908335.407142] blk_update_request: I/O error, dev nvme0n1, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Sep 27 02:54:04 proxmox kernel: [2908335.407145] Aborting journal on device nvme0n1p1-8.
Sep 27 02:54:04 proxmox kernel: [2908335.407154] JBD2: Error -5 detected when updating journal superblock for nvme0n1p1-8.
Sep 27 02:54:04 proxmox systemd[1]: Stopped target Local File Systems.
Sep 27 02:54:04 proxmox systemd[1]: Unmounting /mnt/NVMEDISK...
Sep 27 02:54:04 proxmox umount[24549]: umount: /mnt/NVMEDISK: target is busy.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Mount process exited, code=exited, status=32/n/a
Sep 27 02:54:04 proxmox systemd[1]: Failed unmounting /mnt/NVMEDISK.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-d31acd69\x2df784\x2d4ba9\x2dbfef\x2d40036196c03e.device. Stopping, too.
Sep 27 02:54:04 proxmox systemd[1]: Unmounting /mnt/NVMEDISK...
Sep 27 02:54:04 proxmox umount[24550]: umount: /mnt/NVMEDISK: target is busy.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Mount process exited, code=exited, status=32/n/a
Sep 27 02:54:04 proxmox systemd[1]: Failed unmounting /mnt/NVMEDISK.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-d31acd69\x2df784\x2d4ba9\x2dbfef\x2d40036196c03e.device. Stopping, too.
Sep 27 02:54:04 proxmox systemd[1]: Unmounting /mnt/NVMEDISK...
Sep 27 02:54:04 proxmox umount[24551]: umount: /mnt/NVMEDISK: target is busy.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Mount process exited, code=exited, status=32/n/a
Sep 27 02:54:04 proxmox systemd[1]: Failed unmounting /mnt/NVMEDISK.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-d31acd69\x2df784\x2d4ba9\x2dbfef\x2d40036196c03e.device. Stopping, too.
Sep 27 02:54:04 proxmox systemd[1]: Unmounting /mnt/NVMEDISK...
Sep 27 02:54:04 proxmox umount[24552]: umount: /mnt/NVMEDISK: target is busy.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Mount process exited, code=exited, status=32/n/a
Sep 27 02:54:04 proxmox systemd[1]: Failed unmounting /mnt/NVMEDISK.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-d31acd69\x2df784\x2d4ba9\x2dbfef\x2d40036196c03e.device. Stopping, too.
Sep 27 02:54:04 proxmox systemd[1]: Unmounting /mnt/NVMEDISK...
Sep 27 02:54:04 proxmox umount[24553]: umount: /mnt/NVMEDISK: target is busy.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Mount process exited, code=exited, status=32/n/a
Sep 27 02:54:04 proxmox systemd[1]: Failed unmounting /mnt/NVMEDISK.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-d31acd69\x2df784\x2d4ba9\x2dbfef\x2d40036196c03e.device. Stopping, too.
Sep 27 02:54:04 proxmox systemd[1]: Unmounting /mnt/NVMEDISK...
Sep 27 02:54:04 proxmox umount[24554]: umount: /mnt/NVMEDISK: target is busy.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Mount process exited, code=exited, status=32/n/a
Sep 27 02:54:04 proxmox systemd[1]: Failed unmounting /mnt/NVMEDISK.
Sep 27 02:54:04 proxmox systemd[1]: mnt-NVMEDISK.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-d31acd69\x2df784\x2d4ba9\x2dbfef\x2d40036196c03e.device. Stopping, too.
Sep 27 02:54:04 proxmox systemd[1]: Unmounting /mnt/NVMEDISK...
Sep 27 02:54:04 proxmox umount[24555]: umount: /mnt/NVMEDISK: target is busy.

DMESG
Code:
root@proxmox:/var/log# dmesg
[3031575.451463] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031575.451595] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031575.451695] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031584.723394] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031584.723533] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031584.723635] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031595.220270] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031595.220400] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031595.220500] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031605.439413] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031605.439548] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031605.439650] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031614.767457] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031614.767589] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031614.767689] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031625.222601] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031625.222733] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031625.222834] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031635.269611] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031635.269746] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031635.269848] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031644.784342] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
[3031644.784472] EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1532: inode #2: comm pvestatd: reading directory lblock 0
 
Additional information

NVMe Smart Info
Code:
root@proxmox:~# smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.44-2-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WDS500G3X0C-00SJG0
Serial Number:                      XXXXXXXXX
Firmware Version:                   111110WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 500,107,862,016 [500 GB]
Unallocated NVM Capacity:           0
Controller ID:                      8215
Number of Namespaces:               1
Namespace 1 Size/Capacity:          500,107,862,016 [500 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 8b460a9bc2
Local Time is:                      Mon Sep 28 15:09:55 2020 EDT
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     88 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.50W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0       0
 2 +     3.00W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3     4000   10000
 4 -   0.0025W       -        -    4  4  4  4     4000   40000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        38 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    8,674,316 [4.44 TB]
Data Units Written:                 3,636,363 [1.86 TB]
Host Read Commands:                 74,444,755
Host Write Commands:                75,063,273
Controller Busy Time:               179
Power Cycles:                       23
Power On Hours:                     3,172
Unsafe Shutdowns:                   12
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged

Below are the mount options in use and the host build info.

--fstab--
UUID=d31acd69-f784-4ba9-bfef-40036196c03e /mnt/NVMEDISK ext4 defaults 0 0

--Build--
CPU(s) 16 x AMD Ryzen 7 3700X 8-Core Processor (1 Socket)
Kernel Version Linux 5.4.44-2-pve #1 SMP PVE 5.4.44-2 (Wed, 01 Jul 2020 16:37:57 +0200)
PVE Manager Version pve-manager/6.2-10/a20769ed
NVME Drive: WD Black 500 GB

Hopefully it's just something I'm doing wrong.
 
Sep 27 02:54:04 proxmox kernel: [2908335.407127] blk_update_request: I/O error, dev nvme0n1, sector 262865288 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Sep 27 02:54:04 proxmox kernel: [2908335.407142] blk_update_request: I/O error, dev nvme0n1, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
It seems the disk has some problems.
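
For anyone reading along: the negative numbers in the syslog excerpt ("status: -19" and "JBD2: Error -5") follow the usual Linux kernel convention of negated errno values. Below is a minimal Python sketch for decoding them. It is not from the original post, and `decode_kernel_status` is a made-up helper name; the regular expression only covers the two message shapes seen in this thread.

```python
import errno
import os
import re

# Matches the two negative status codes seen in the syslog excerpt:
#   "Removing after probe failure status: -19"
#   "JBD2: Error -5 detected when updating journal superblock ..."
# Other kernel messages use other wordings and are not covered here.
STATUS_RE = re.compile(r"(?:status:|Error)\s+-(\d+)")


def decode_kernel_status(line):
    """Return (number, errno name, description) for the first negated
    errno found in a log line, or None if the line has no such code."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    samples = [
        "nvme nvme0: Removing after probe failure status: -19",
        "JBD2: Error -5 detected when updating journal superblock for nvme0n1p1-8.",
    ]
    for sample in samples:
        print(decode_kernel_status(sample))
```

On Linux, 19 is ENODEV and 5 is EIO. That reading is consistent with the controller dropping off the bus first and the ext4 journal failing as a consequence, but it is only an interpretation of the log, not a diagnosis.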

Error Information Log Entries: 1
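
That counter comes from the SMART/Health section of the smartctl output. Here is a small Python sketch for pulling out the counters that are nonzero, assuming the `smartctl -a` text is already in a string. The function name and the choice of watched counters are assumptions made for this illustration, not an official list.

```python
import re

# Counters from "SMART/Health Information" that are worth a second look
# when nonzero. This selection is an assumption for the sketch.
WATCHED_COUNTERS = (
    "Unsafe Shutdowns",
    "Media and Data Integrity Errors",
    "Error Information Log Entries",
)


def nonzero_counters(smartctl_text, watched=WATCHED_COUNTERS):
    """Return {counter name: value} for watched counters with a nonzero value."""
    found = {}
    for line in smartctl_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in watched:
            continue
        # Values may contain thousands separators, e.g. "3,172".
        match = re.match(r"\s*(\d[\d,]*)", value)
        if match:
            count = int(match.group(1).replace(",", ""))
            if count:
                found[key] = count
    return found
```

Fed the SMART output from the first post, this reports Unsafe Shutdowns = 12 and Error Information Log Entries = 1. Media and Data Integrity Errors is 0 there, so it is left out.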

I'd try running a SMART self-test to see if it unearths anything.

Also check the output of 'dmesg' for anything related to your NVMe drive.
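
A rough way to do the dmesg check, sketched in Python on the assumption that the dmesg text has already been captured in a string. `summarize_nvme_messages` is a hypothetical helper, not an existing tool. It keeps only lines that mention nvme, strips the kernel timestamp, and counts repeats, so floods like the EXT4-fs errors in the first post collapse to a single entry.

```python
import re
from collections import Counter

# Kernel timestamp prefix as printed by dmesg, e.g. "[3031575.451463] ".
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def summarize_nvme_messages(dmesg_text):
    """Return [(message, count), ...] for lines mentioning nvme,
    most frequent first, with timestamps removed."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        if "nvme" not in line.lower():
            continue
        counts[TIMESTAMP_RE.sub("", line)] += 1
    return counts.most_common()
```

The input could come from something like `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, run as root on the host.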
 
I also had some problems with an NVMe drive, an A-DATA XPG.
It caused complete freezes at random times without anything in the log; a reboot fixed it temporarily (it also disappeared from the BIOS a few times).

Switched to Samsung and all problems went away.
 
