device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -5

sender

Apr 9, 2021
Hello, and welcome me to the forums :). I am a new Proxmox user and just joined the forum.
I would like some help.

I am relatively new to this, so don't shoot me if I give too little information :). I have been running this machine for about 5-6 weeks and most of the time it runs fine. But lately (about 8 times now) it just becomes totally unresponsive.

All VMs are down, the LXCs are down, the Proxmox management interface (https://ip:8006) is down, SSH is down, and I get a load of messages on the screen.

It is just a basic out-of-the-box setup, nothing special (as far as I am aware), and I just want to run some VMs/LXCs.

I tried googling the error, but found no concrete hint on how to solve my issue.

Setup:
Intel NUC8i5
Samsung 980 PRO NVMe SSD
16 GB RAM
1 VM (4 vCPU, 8 GB RAM, 64 GB disk) with 2 USB sticks passed through
2 LXCs (2 vCPU, 512 MB RAM, 8 GB disk)

pve-manager/6.3-6/2184247e (running kernel: 5.4.106-1-pve)

Host LVM screenshot: [image not available]

Host LVM-Thin screenshot: [image not available]

The issue: [screenshot not available]
 
Can you post the output of the lvs and lsblk CLI commands?
 
Code:
root@proxmox01:~# lvs
  LV            VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- <338.36g             9.92   0.95                           
  root          pve -wi-ao----   96.00g                                                   
  swap          pve -wi-ao----    8.00g                                                   
  vm-100-disk-0 pve Vwi-aotz--    4.00m data        0.00                                   
  vm-100-disk-1 pve Vwi-aotz--   64.00g data        24.80                                 
  vm-101-disk-0 pve Vwi-a-tz--   32.00g data        44.26                                 
  vm-102-disk-0 pve Vwi-aotz--    8.00g data        25.45                                 
  vm-103-disk-0 pve Vwi-aotz--    8.00g data        18.50

Code:
root@proxmox01:~# lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1                      259:0    0 465.8G  0 disk
├─nvme0n1p1                  259:1    0  1007K  0 part
├─nvme0n1p2                  259:2    0   512M  0 part /boot/efi
└─nvme0n1p3                  259:3    0 465.3G  0 part
  ├─pve-swap                 253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0    96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0   3.5G  0 lvm 
  │ └─pve-data-tpool         253:4    0 338.4G  0 lvm 
  │   ├─pve-data             253:5    0 338.4G  0 lvm 
  │   ├─pve-vm--101--disk--0 253:6    0    32G  0 lvm 
  │   ├─pve-vm--100--disk--0 253:7    0     4M  0 lvm 
  │   ├─pve-vm--100--disk--1 253:8    0    64G  0 lvm 
  │   ├─pve-vm--102--disk--0 253:9    0     8G  0 lvm 
  │   └─pve-vm--103--disk--0 253:10   0     8G  0 lvm 
  └─pve-data_tdata           253:3    0 338.4G  0 lvm 
    └─pve-data-tpool         253:4    0 338.4G  0 lvm 
      ├─pve-data             253:5    0 338.4G  0 lvm 
      ├─pve-vm--101--disk--0 253:6    0    32G  0 lvm 
      ├─pve-vm--100--disk--0 253:7    0     4M  0 lvm 
      ├─pve-vm--100--disk--1 253:8    0    64G  0 lvm 
      ├─pve-vm--102--disk--0 253:9    0     8G  0 lvm 
      └─pve-vm--103--disk--0 253:10   0     8G  0 lvm
 
FWIW... it has now been working again for over 24h... but like the last times, it will crash again; I just don't know when. Please keep helping me :)
 
Ok, it has crashed again. This time there is no image on the HDMI output at all. I can still ping the device on the management interface.

All other communication is lost.

PLEASE HELP ME troubleshoot this!
 
Do you see these messages in /var/log/syslog? Are there other messages that could point to the root cause?
In another thread, the root cause of a similar error was problematic RAID controller firmware.
 
How do I get these?
Either via the CLI or, if you prefer, the GUI: Node -> System -> Syslog
In the GUI you can limit the time span you are interested in.
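For the CLI route, a minimal sketch assuming the /var/log/syslog file mentioned above and GNU grep (for the -B context option); the function name show_thin_errors is hypothetical, not a Proxmox tool. It prints every line that mentions the dm-thin error together with the lines logged just before it, which is where an earlier hardware-related message would most likely appear. If the journal is stored persistently, journalctl -k -b -1 (kernel messages of the previous boot) is an alternative.

```shell
#!/usr/bin/env bash
# Sketch: find the dm-thin I/O error in a syslog file and show the lines
# logged right before it. Assumes GNU grep (for -B); the function name is
# hypothetical, not a Proxmox tool.
show_thin_errors() {
    local log="${1:-/var/log/syslog}"  # log file to scan
    local before="${2:-20}"            # context lines printed before each hit
    # -n prefixes line numbers (":" marks a matching line, "-" a context line)
    grep -n -B "$before" -e 'dm_thin_find_block' -e 'device-mapper: thin' "$log"
}

# Usage (as root): show_thin_errors /var/log/syslog 50
```

No output means the error never made it into that file, which can happen when the system can no longer write to its disk.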

Are you saying this could be a hardware issue?
It's possible. That's why it would be good to check if these errors are in the syslog, and if there are other errors shortly before that might indicate a hardware problem.
 
I do not see these messages in the logs...

At least not now. I can send you the entire log in a PM if you wish. And should any of these errors still show up in the log after a restart (power off/on)?

I did not have the issue this morning, but it is just a matter of time. It seems to happen when there is a lot of "IO", or "usage without IO"; I'm not sure what makes it crash...

I'll do anything that helps.
 
Please help me, I am desperate. It worked for around 2 days and now it's down again. A port scan still shows port 111 (RPC) responding. Ping works, everything else is down... help!
 
You can run smartctl -a /dev/<the disk> to see if anything is reported there. If nothing is reported, that does not mean everything is fine.
Other than that, we need more information. If the problem happens again, find where that LVM thin error occurred in the syslog, go further back in time, and check whether other errors were reported before it. If you use the GUI (<node> -> System -> Syslog), you can specify a time frame to narrow down the list.
 
Hi, thank you. These errors are not in the syslog; they only appear on the screen. I cannot log in to the GUI or anything else, so I cannot see any logs. Also, after power-cycling I do not see any of these messages in the GUI log.

I do not know how to use the command. I tried these without luck:
Code:
smartctl -- /dev/nvme0n1p3
smartctl - /dev/nvme0n1p3
smartctl - /nvme0n1p3
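For reference, smartctl needs an option telling it what to print (-a prints everything), and it is usually pointed at the whole device (/dev/nvme0n1 in the lsblk output above) rather than a partition such as /dev/nvme0n1p3. A minimal sketch, assuming smartmontools is installed (it normally is on Proxmox VE) and the commands run as root; the function names nvme_disk_of and smart_report are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: full SMART report for an NVMe drive. Assumes smartmontools is
# installed and the script runs as root; function names are hypothetical.

# Map an NVMe partition to its whole device: /dev/nvme0n1p3 -> /dev/nvme0n1.
# Anything that does not look like an NVMe partition is returned unchanged.
nvme_disk_of() {
    case $1 in
        /dev/nvme*n*p[0-9]*) printf '%s\n' "${1%p[0-9]*}" ;;
        *) printf '%s\n' "$1" ;;
    esac
}

# -a prints identity (model, firmware version), overall health, the
# NVMe SMART/health log (critical warnings, media errors) and the error log.
smart_report() {
    smartctl -a "$(nvme_disk_of "${1:-/dev/nvme0n1}")"
}

# Usage (as root): smart_report /dev/nvme0n1p3   # runs smartctl -a /dev/nvme0n1
```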
 
I think I may have the same issue. Is it still working for you? I have a few questions if you get a chance.

1) Is it still working? (Mine seems to die randomly every few days.)
2) I'm guessing it's going to wipe the drive? (How did you back up your VMs?)
3) I have a Samsung 870 Pro, but I don't see it in the list of available firmware. Was yours in the list?
[screenshot not available]
 
1) Is it still working? (Mine seems to die randomly every few days.)
Yes, that works well.

2) I'm guessing it's going to wipe the drive? (How did you back up your VMs?)
No, it only updates the drive's firmware. However, Samsung advises backing up the data :)

3) I have a Samsung 870 Pro, but I don't see it in the list of available firmware. Was yours in the list?
I have a 980 PRO; it was listed here:
https://www.samsung.com/semiconductor/minisite/ssd/download/tools/

I don't see a 970 pro but I do see a 970 plus...?

sorry I can't be of any more help here.
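To compare the firmware a drive is currently running with the version listed on Samsung's download page (before and after an update), smartctl -i prints a "Firmware Version:" line in its identity section. A small sketch, assuming that line format; the function name firmware_version is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract the firmware version from "smartctl -i" output on stdin.
# Assumes smartctl's "Firmware Version:   <value>" line format; the
# function name is hypothetical.
firmware_version() {
    # Split on ":" plus any following spaces; print the value of the
    # first "Firmware Version:" line and stop.
    awk -F': *' '/^Firmware Version:/ { print $2; exit }'
}

# Usage (as root): smartctl -i /dev/nvme0n1 | firmware_version
```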
 
Thanks. So the firmware update did not wipe your drive? Did you have to download the firmware, or did the tool detect your device and tell you which firmware it was going to install? The 870 Pro isn't listed, but I thought maybe if I hook it up and it sees it, it will have an update for it :(
 
Yes, it did not wipe it.


Yes.

The firmware update was very cumbersome. I used an external USB stick and had to try formatting it in various ways before it worked.
Thanks. I guess since there is no firmware update for my SSD, I may be out of luck. I'm going to have to figure out how to migrate to a new boot drive.
 