LVM-Thin with grey question mark

synthetichug

Member
Apr 22, 2022
[Two screenshots attached]

LVM-thin drives are showing a grey question mark. I had this happen with a single drive before and thought the drive may have failed. It was a recent purchase, so I returned it and also got a SATA card, just in case there was an issue with the SATA ports on the motherboard. All the drives except the NVMe were created through the web GUI under LVM-thin, each as a single device; they were not pooled together. The drives were working and the SMART values were good, with 0% to 1% wearout and two of them being brand new. These are not RAID. I'm just trying to figure out how to fix and troubleshoot this. Below are some commands I've seen used for similar issues, and I've uploaded a chunk of the system log from roughly around when the issue started.

Code:
root@Echoes:~# lsblk
NAME                            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                               8:0    0   1.8T  0 disk 
├─Tablet-Tablet_tmeta           252:8    0  15.9G  0 lvm  
│ └─Tablet-Tablet-tpool         252:10   0   1.8T  0 lvm  
│   ├─Tablet-Tablet             252:23   0   1.8T  1 lvm  
│   └─Tablet-vm--101--disk--0   252:24   0   1.9T  0 lvm  
└─Tablet-Tablet_tdata           252:9    0   1.8T  0 lvm  
  └─Tablet-Tablet-tpool         252:10   0   1.8T  0 lvm  
    ├─Tablet-Tablet             252:23   0   1.8T  1 lvm  
    └─Tablet-vm--101--disk--0   252:24   0   1.9T  0 lvm  
sdb                               8:16   0   1.8T  0 disk 
├─Slate-Slate_tmeta             252:11   0  15.9G  0 lvm  
│ └─Slate-Slate-tpool           252:13   0   1.8T  0 lvm  
│   ├─Slate-Slate               252:25   0   1.8T  1 lvm  
│   └─Slate-vm--101--disk--0    252:26   0   1.9T  0 lvm  
└─Slate-Slate_tdata             252:12   0   1.8T  0 lvm  
  └─Slate-Slate-tpool           252:13   0   1.8T  0 lvm  
    ├─Slate-Slate               252:25   0   1.8T  1 lvm  
    └─Slate-vm--101--disk--0    252:26   0   1.9T  0 lvm  
sdc                               8:32   0   1.8T  0 disk 
├─Shard-Shard_tmeta             252:14   0  15.9G  0 lvm  
│ └─Shard-Shard-tpool           252:16   0   1.8T  0 lvm  
│   ├─Shard-Shard               252:27   0   1.8T  1 lvm  
│   └─Shard-vm--101--disk--0    252:28   0   1.9T  0 lvm  
└─Shard-Shard_tdata             252:15   0   1.8T  0 lvm  
  └─Shard-Shard-tpool           252:16   0   1.8T  0 lvm  
    ├─Shard-Shard               252:27   0   1.8T  1 lvm  
    └─Shard-vm--101--disk--0    252:28   0   1.9T  0 lvm  
sdd                               8:48   0   1.8T  0 disk 
├─Fragment-Fragment_tmeta       252:17   0  15.9G  0 lvm  
│ └─Fragment-Fragment-tpool     252:19   0   1.8T  0 lvm  
│   ├─Fragment-Fragment         252:29   0   1.8T  1 lvm  
│   └─Fragment-vm--101--disk--0 252:30   0   1.9T  0 lvm  
└─Fragment-Fragment_tdata       252:18   0   1.8T  0 lvm  
  └─Fragment-Fragment-tpool     252:19   0   1.8T  0 lvm  
    ├─Fragment-Fragment         252:29   0   1.8T  1 lvm  
    └─Fragment-vm--101--disk--0 252:30   0   1.9T  0 lvm  
sde                               8:64   0   1.8T  0 disk 
├─Glyph-Glyph_tmeta             252:20   0  15.9G  0 lvm  
│ └─Glyph-Glyph-tpool           252:22   0   1.8T  0 lvm  
│   ├─Glyph-Glyph               252:31   0   1.8T  1 lvm  
│   └─Glyph-vm--101--disk--0    252:32   0   1.9T  0 lvm  
└─Glyph-Glyph_tdata             252:21   0   1.8T  0 lvm  
  └─Glyph-Glyph-tpool           252:22   0   1.8T  0 lvm  
    ├─Glyph-Glyph               252:31   0   1.8T  1 lvm  
    └─Glyph-vm--101--disk--0    252:32   0   1.9T  0 lvm  
nvme0n1                         259:0    0 953.9G  0 disk 
├─nvme0n1p1                     259:1    0  1007K  0 part 
├─nvme0n1p2                     259:2    0     1G  0 part 
└─nvme0n1p3                     259:3    0 952.9G  0 part 
  ├─pve-swap                    252:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                    252:1    0    96G  0 lvm  /
  ├─pve-data_tmeta              252:2    0   8.3G  0 lvm  
  │ └─pve-data-tpool            252:4    0 816.2G  0 lvm  
  │   ├─pve-data                252:5    0 816.2G  1 lvm  
  │   ├─pve-vm--101--disk--0    252:6    0     4M  0 lvm  
  │   └─pve-vm--101--disk--1    252:7    0   500G  0 lvm  
  └─pve-data_tdata              252:3    0 816.2G  0 lvm  
    └─pve-data-tpool            252:4    0 816.2G  0 lvm  
      ├─pve-data                252:5    0 816.2G  1 lvm  
      ├─pve-vm--101--disk--0    252:6    0     4M  0 lvm  
      └─pve-vm--101--disk--1    252:7    0   500G  0 lvm

Code:
root@Echoes:~# pvesm status
  Command failed with status code 5.
command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
no such logical volume Shard/Shard
no such logical volume Slate/Slate
no such logical volume Fragment/Fragment
no such logical volume Glyph/Glyph
Name             Type     Status           Total            Used       Available        %
Fragment      lvmthin   inactive               0               0               0    0.00%
Glyph         lvmthin   inactive               0               0               0    0.00%
Shard         lvmthin   inactive               0               0               0    0.00%
Slate         lvmthin   inactive               0               0               0    0.00%
Tablet        lvmthin     active      1919827968          959913      1918868054    0.05%
local             dir     active        98497780         6711384        86736848    6.81%
local-lvm     lvmthin     active       855855104        67441382       788413721    7.88%

Code:
root@Echoes:~# pvscan
  PV /dev/nvme0n1p3   VG pve             lvm2 [<952.87 GiB / 16.00 GiB free]
  PV /dev/sda         VG Tablet          lvm2 [<1.82 TiB / 376.00 MiB free]
  Total: 2 [<2.75 TiB] / in use: 2 [<2.75 TiB] / in no VG: 0 [0   ]

Code:
root@Echoes:~# cat /etc/pve/storage.cfg 
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvmthin: Tablet
        thinpool Tablet
        vgname Tablet
        content images,rootdir
        nodes Echoes

lvmthin: Slate
        thinpool Slate
        vgname Slate
        content rootdir,images
        nodes Echoes

lvmthin: Shard
        thinpool Shard
        vgname Shard
        content images,rootdir
        nodes Echoes

lvmthin: Fragment
        thinpool Fragment
        vgname Fragment
        content images,rootdir
        nodes Echoes

lvmthin: Glyph
        thinpool Glyph
        vgname Glyph
        content images,rootdir
        nodes Echoes


Code:
root@Echoes:~# vgscan
  Found volume group "pve" using metadata type lvm2
  Found volume group "Tablet" using metadata type lvm2
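
One thing I'm not sure about is whether an LVM device filter could be hiding the other disks, so I was also going to check something like this (just what I've seen suggested elsewhere, not sure it applies here):

Code:
# show any filter/global_filter lines that might exclude the SATA disks
grep -E 'filter|global_filter' /etc/lvm/lvm.conf
# force LVM to rescan all block devices
pvscan --cache
vgscan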

Code:
root@Echoes:~# lvscan
  ACTIVE            '/dev/pve/data' [<816.21 GiB] inherit
  ACTIVE            '/dev/pve/swap' [8.00 GiB] inherit
  ACTIVE            '/dev/pve/root' [96.00 GiB] inherit
  ACTIVE            '/dev/pve/vm-101-disk-0' [4.00 MiB] inherit
  ACTIVE            '/dev/pve/vm-101-disk-1' [500.00 GiB] inherit
  ACTIVE            '/dev/Tablet/Tablet' [<1.79 TiB] inherit
  ACTIVE            '/dev/Tablet/vm-101-disk-0' [<1.86 TiB] inherit

Code:
root@Echoes:~# pvs
  PV             VG     Fmt  Attr PSize    PFree  
  /dev/nvme0n1p3 pve    lvm2 a--  <952.87g  16.00g
  /dev/sda       Tablet lvm2 a--    <1.82t 376.00m

Code:
root@Echoes:~# vgs
  VG     #PV #LV #SN Attr   VSize    VFree  
  Tablet   1   2   0 wz--n-   <1.82t 376.00m
  pve      1   5   0 wz--n- <952.87g  16.00g

Code:
root@Echoes:~# lvs
  LV            VG     Attr       LSize    Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  Tablet        Tablet twi-aotz--   <1.79t               0.05   0.15                            
  vm-101-disk-0 Tablet Vwi-aotz--   <1.86t Tablet        0.05                                   
  data          pve    twi-aotz-- <816.21g               7.88   0.49                            
  root          pve    -wi-ao----   96.00g                                                      
  swap          pve    -wi-ao----    8.00g                                                      
  vm-101-disk-0 pve    Vwi-aotz--    4.00m data          14.06                                  
  vm-101-disk-1 pve    Vwi-aotz--  500.00g data          12.86
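
For what it's worth, these are the reactivation commands I've seen suggested in similar threads. I haven't run them yet since I'm not sure they're safe to use here (the pool name is just one of mine as an example):

Code:
# rescan for volume groups, then try to activate everything
vgscan
vgchange -ay
# or try activating a single thin pool explicitly
lvchange -ay Slate/Slate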
 

Waiting on the PSU, but after a reboot Proxmox drops me into "WELCOME TO GRUB!" and I can't do anything.
If I leave it long enough it will give me "error: failure reading sector", but it is never the same disk or sector.
Last time it did this I had to boot from an installation USB, choose debug mode, and run some kind of repair function.
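
For reference, the thin-pool metadata repair I've seen mentioned for this kind of problem (I'm not certain it's the same thing the installer's debug mode ran) looks roughly like this, using one of my pools as an example:

Code:
# the pool has to be inactive before repairing
lvchange -an Tablet/Tablet
# rebuild the thin-pool metadata from the data device
lvconvert --repair Tablet/Tablet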
 
[Screenshot attached]

After a recent reboot I am able to get back into the web GUI, and it looks like the drives are still connected, but they are no longer recognized as the LVM-thin storage I provisioned them as earlier.

Code:
root@Echoes:~# lsblk
NAME                          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                             8:0    0   1.8T  0 disk 
├─Tablet-Tablet_tmeta         252:0    0  15.9G  0 lvm  
│ └─Tablet-Tablet-tpool       252:9    0   1.8T  0 lvm  
│   ├─Tablet-Tablet           252:11   0   1.8T  1 lvm  
│   └─Tablet-vm--101--disk--0 252:12   0   1.9T  0 lvm  
└─Tablet-Tablet_tdata         252:1    0   1.8T  0 lvm  
  └─Tablet-Tablet-tpool       252:9    0   1.8T  0 lvm  
    ├─Tablet-Tablet           252:11   0   1.8T  1 lvm  
    └─Tablet-vm--101--disk--0 252:12   0   1.9T  0 lvm  
sdb                             8:16   0   1.8T  0 disk 
sdc                             8:32   0   1.8T  0 disk 
sdd                             8:48   0   1.8T  0 disk 
sde                             8:64   0   1.8T  0 disk 
sdf                             8:80   0   1.8T  0 disk 
sdg                             8:96   0   1.8T  0 disk 
nvme0n1                       259:0    0 953.9G  0 disk 
├─nvme0n1p1                   259:1    0  1007K  0 part 
├─nvme0n1p2                   259:2    0     1G  0 part 
└─nvme0n1p3                   259:3    0 952.9G  0 part 
  ├─pve-swap                  252:2    0     8G  0 lvm  [SWAP]
  ├─pve-root                  252:3    0    96G  0 lvm  /
  ├─pve-data_tmeta            252:4    0   8.3G  0 lvm  
  │ └─pve-data-tpool          252:6    0 816.2G  0 lvm  
  │   ├─pve-data              252:7    0 816.2G  1 lvm  
  │   ├─pve-vm--101--disk--0  252:8    0     4M  0 lvm  
  │   └─pve-vm--101--disk--1  252:10   0   500G  0 lvm  
  └─pve-data_tdata            252:5    0 816.2G  0 lvm  
    └─pve-data-tpool          252:6    0 816.2G  0 lvm  
      ├─pve-data              252:7    0 816.2G  1 lvm  
      ├─pve-vm--101--disk--0  252:8    0     4M  0 lvm  
      └─pve-vm--101--disk--1  252:10   0   500G  0 lvm
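
Before replacing anything else I want to confirm whether the LVM signatures are even still on those disks, so I'm planning to check something like this (sdb is just one of the now-blank-looking disks as an example):

Code:
# list every block device LVM can see, including ones that are not PVs
pvs -a
# does the disk still carry an LVM2_member signature?
blkid /dev/sdb
# sanity-check the LVM metadata area on the device
pvck /dev/sdb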
 
smartctl -a /dev/[device]

Should output something. You seem to either have Samsung SSD 870 drives with SMART disabled, or they are broken in some fashion. Your log looks the same way a broken Samsung SSD did for me.

Code:
SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 71 to 70

A bit too hot in my opinion.

Rgds, ..
 
Unfortunately that command doesn't get me anything, and it can't be the drives at this point; I've replaced them too many times for it to just be the drives. I'll likely replace the PSU today and see if that's the issue. Additionally, if heat is the issue, I can space out the drives a bit more physically and maybe verify the fans are running?

Code:
root@Echoes:~# smartctl -a /dev/sdf -T permissive
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.4-2-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

Short INQUIRY response, skip product id

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Read defect list: asked for grown list but didn't get it
Error Counter logging not supported

Device does not support Self Test logging
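
Given the "Short INQUIRY response" above, maybe the new SATA card is presenting the disks as generic SCSI devices. I might try forcing the ATA pass-through to see if the real SMART data shows up (just a guess on my part):

Code:
# force SAT (SCSI-to-ATA translation) pass-through
smartctl -a -d sat /dev/sdf
# or force plain ATA handling
smartctl -a -d ata /dev/sdf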
 
If a drive is too hot it will stop doing things, including no longer being queryable by SMART. Your log snippet included the line about SMART attribute 190 right before the drive started failing I/O commands.

Did you update the host recently?

There were some mentions of failing SATA drives in another thread after updating Proxmox to a newer kernel version, if I am not mistaken.
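
If it does turn out to be kernel related, booting the previous kernel would be a quick way to test it. Something like this should work (the version number here is only a placeholder, use whatever proxmox-boot-tool lists on your system):

Code:
# show the kernels that are installed
proxmox-boot-tool kernel list
# pin an older one for the next boots, then reboot
proxmox-boot-tool kernel pin 6.5.13-5-pve
reboot
# remove the pin again once done testing
proxmox-boot-tool kernel unpin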
 
I did not conduct a manual update; both times were a "fresh" install from USB. But I created the USB a while ago, so maybe an update was applied automatically? I am currently running kernel 6.8.4-2-pve, so it may be possible.
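
To double-check what actually got installed and when, I'll look at something like:

Code:
# kernel currently running
uname -r
# kernels installed on the system
dpkg -l | grep -E 'pve-kernel|proxmox-kernel'
# any recent apt activity touching the kernel
grep -i kernel /var/log/apt/history.log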

I am looking for that thread now. Was the one below the one you mean?
 
