Cannot access thin pool anymore

Faberix

Jul 5, 2024
Hi

I have a problem: I cannot access anything inside a thin pool anymore. For some time now, the combined size of the volumes has exceeded the pool capacity because of snapshots, and I have had to deactivate data_tmeta and data_tdata and then activate data manually in order to use it, as described in
https://forum.proxmox.com/threads/t...gical-volume-pve-data_tdata-is-active.106225/.
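
The overcommitment described here can be quantified by adding up the virtual sizes of all thin volumes in the pool (thin snapshots included) and comparing the total with the pool's own size. A minimal sketch, assuming the input comes from `lvs --noheadings --units b --nosuffix --separator , -o lv_name,lv_size,pool_lv vmdataHDD`, where the pool's own row has an empty `pool_lv` column and every thin volume names its pool there; `thin_overcommit` is a hypothetical helper, not an LVM command. Because these sizes are stored in the VG metadata, the report should presumably work even while the pool cannot be activated, but that is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sums the virtual sizes of the thin volumes in one pool.
# Expects "lv_name,lv_size,pool_lv" lines in bytes on stdin, e.g. from
#   lvs --noheadings --units b --nosuffix --separator , -o lv_name,lv_size,pool_lv vmdataHDD
thin_overcommit() {
    local pool=$1 name size owner pool_size=0 virt=0
    # Splitting on commas and blanks also strips the indentation lvs adds.
    while IFS=$', \t' read -r name size owner; do
        if [[ $name == "$pool" ]]; then
            pool_size=$size                 # the pool's own data size
        elif [[ $owner == "$pool" ]]; then
            (( virt += size ))              # thin volume or thin snapshot in this pool
        fi
    done
    if (( pool_size == 0 )); then
        echo "pool '$pool' not found in input" >&2
        return 1
    fi
    echo "pool=$pool_size virtual=$virt percent=$(( virt * 100 / pool_size ))"
}

# Example:
#   lvs --noheadings --units b --nosuffix --separator , \
#       -o lv_name,lv_size,pool_lv vmdataHDD | thin_overcommit data
```

A percentage above 100 only means the pool is overprovisioned; how full it actually is (the `data_percent` field of `lvs`) is normally only reported while the pool is active.
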
Now, I can still deactivate data_tmeta and data_tdata, but running
Bash:
lvchange -ay vmdataHDD/data
first hangs for a while (and I can hear that the HDD is busy during that time) and then it fails with the following error:
Code:
Check of pool vmdataHDD/data failed (status:1). Manual repair required!

I suspect this is because the pool is now finally full, or because I misconfigured the sizes when I set it up (i.e. the pool might be larger than the space actually available on the partition), but I no longer remember what exactly I did back then.
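
LVM only allocates extents inside a physical volume, so one way for the pool to end up bigger than the partition is a PV whose recorded size exceeds the partition beneath it (for example, if the partition was shrunk after `pvcreate`). `pvs` reports both values as `pv_size` and `dev_size`, so this theory can be checked without activating anything. A minimal sketch, assuming the input comes from `pvs --noheadings --units b --nosuffix --separator , -o pv_name,dev_size,pv_size`; `check_pv_fits` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: flags PVs whose recorded size exceeds the device below them.
# Expects "pv_name,dev_size,pv_size" lines in bytes on stdin, e.g. from
#   pvs --noheadings --units b --nosuffix --separator , -o pv_name,dev_size,pv_size
# Returns 1 if at least one PV does not fit on its device.
check_pv_fits() {
    local name dev pv rc=0
    while IFS=$', \t' read -r name dev pv; do
        [[ -z $name ]] && continue          # skip blank lines
        if (( pv > dev )); then
            echo "$name: PV size $pv B > device size $dev B"
            rc=1
        fi
    done
    return "$rc"
}
```

If a PV is flagged, part of it (and possibly the pool's tail) maps to sectors the partition does not contain.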

I also checked the logs, and from the time the problems started there are a lot of messages like these:
Code:
Jun 30 13:09:31 pve kernel: ata14.00: exception Emask 0x0 SAct 0x80780007 SErr 0x0 action 0x0
Jun 30 13:09:31 pve kernel: ata14.00: irq_stat 0x40000008
Jun 30 13:09:31 pve kernel: ata14.00: failed command: WRITE FPDMA QUEUED
Jun 30 13:09:31 pve kernel: ata14.00: cmd 61/08:98:c0:f6:18/00:00:39:01:00/40 tag 19 ncq dma 4096 out
                                     res 41/10:00:c0:f6:18/00:00:39:01:00/00 Emask 0x481 (invalid argument) <F>
Jun 30 13:09:31 pve kernel: ata14.00: status: { DRDY ERR }
Jun 30 13:09:31 pve kernel: ata14.00: error: { IDNF }
Jun 30 20:42:22 pve kernel: ata14.00: exception Emask 0x0 SAct 0xe000 SErr 0x0 action 0x0
Jun 30 20:42:22 pve kernel: ata14.00: irq_stat 0x40000008
Jun 30 20:42:22 pve kernel: ata14.00: failed command: WRITE FPDMA QUEUED
Jun 30 20:42:22 pve kernel: ata14.00: cmd 61/00:68:00:78:59/04:00:39:01:00/40 tag 13 ncq dma 524288 out
                                     res 41/10:00:00:78:59/00:00:39:01:00/00 Emask 0x481 (invalid argument) <F>
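
The `cmd` field of these messages encodes the rejected request, so each failing write can be located on the disk and compared with the disk's size (`blockdev --getsz /dev/sdX` prints it in 512-byte sectors, where `/dev/sdX` stands for whichever disk sits behind `ata14`). The sketch below assumes libata's field order `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, and that for `WRITE FPDMA QUEUED` the sector count is carried in the two feature bytes and the NCQ tag in bits 7..3 of `nsect`; `decode_ata_cmd` and `beyond_end` are hypothetical helpers, not existing tools. As a cross-check, decoding the two `cmd` lines above reproduces the `tag 19`/`dma 4096` and `tag 13`/`dma 524288` values the kernel printed next to them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decodes a libata "cmd" taskfile dump, assuming the order
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# For WRITE FPDMA QUEUED (0x61) the sector count is in feature/hob_feature and
# the NCQ tag in bits 7..3 of nsect. Byte sizes assume 512-byte logical sectors.
decode_ata_cmd() {
    local tf="${1#cmd }"                # accept the field with or without "cmd "
    local -a f
    IFS='/:' read -r -a f <<< "$tf"     # expect 12 hex bytes
    if (( ${#f[@]} != 12 )); then
        echo "unexpected taskfile: $1" >&2
        return 1
    fi
    local lo=$(( (0x${f[5]} << 16) | (0x${f[4]} << 8) | 0x${f[3]} ))   # LBA bits 0..23
    local hi=$(( (0x${f[10]} << 16) | (0x${f[9]} << 8) | 0x${f[8]} ))  # LBA bits 24..47
    local lba=$(( (hi << 24) | lo ))
    local count=$(( (0x${f[6]} << 8) | 0x${f[1]} ))
    local tag=$(( 0x${f[2]} >> 3 ))
    echo "lba=$lba sectors=$count bytes=$(( count * 512 )) tag=$tag"
}

# Hypothetical helper: succeeds if a request of $2 sectors at LBA $1 ends past
# a disk of $3 sectors (512-byte units, as blockdev --getsz reports).
beyond_end() {
    (( $1 + $2 > $3 ))
}

# Example:
#   decode_ata_cmd 'cmd 61/08:98:c0:f6:18/00:00:39:01:00/40'
#   # → lba=5252904640 sectors=8 bytes=4096 tag=19
#   beyond_end 5252904640 8 "$(blockdev --getsz /dev/sdX)" && echo "past the end of the disk"
```

Both failing writes land around LBA 5.25 billion, about 2.69 TB into the disk. IDNF is the error bit an ATA drive sets when the requested address cannot be found, so if these LBAs turn out to lie beyond the end of the disk, something is addressing space that does not exist, which would fit the misconfigured-size theory.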

I also ran a long SMART self-test with smartctl, and the hardware seems to be fine. Another partition on the same drive is still usable, too.

What should I do now? I have backups of all the important data, but I hope I don't need to use them.
 
