Data loss when using `lvm-thin`

ayufan

Aug 26, 2012
I was migrating data between hosts. One of the lvm-thin pools was simply too small, and I only noticed that after the fact, when trying to start the machine on the target node:

Code:
2020-04-01 11:56:24 starting migration of VM 300 to node 'home-PC' (192.168.88.176)
2020-04-01 11:56:24 found local disk 'nvme-thin:vm-300-disk-0' (in current VM config)
2020-04-01 11:56:25 copying local disk images
WARNING: Thin volume nvme/vm-102-disk-3 maps 25977421824 while the size is only 12884901888.
WARNING: Thin volume nvme/vm-102-disk-3 maps 25977421824 while the size is only 12884901888.
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
Logical volume "vm-300-disk-0" created.
WARNING: Sum of all thin volume sizes (552.01 GiB) exceeds the size of thin pool nvme/nvme-thin and the size of whole volume group (475.70 GiB).
524288+0 records in
524288+0 records out
34359738368 bytes (34 GB, 32 GiB) copied, 392.481 s, 87.5 MB/s
3268+1675423 records in
3268+1675423 records out
34359738368 bytes (34 GB, 32 GiB) copied, 392.842 s, 87.5 MB/s
Logical volume "vm-300-disk-0" successfully removed
2020-04-01 12:03:00 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=home-PC' root@192.168.88.176 pvesr set-state 300 \''{}'\'
2020-04-01 12:03:01 migration finished successfully (duration 00:06:37)
TASK OK

This resulted in the following on the target host:

Code:
[172610.719854] device-mapper: thin: 253:4: reached low water mark for data device: sending event.
[172610.751461] device-mapper: thin: 253:4: switching pool to out-of-data-space (queue IO) mode
[172672.737369] device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
[172672.738755] Buffer I/O error on dev dm-34, logical block 5996032, lost async page write
[172672.740161] Buffer I/O error on dev dm-34, logical block 5996033, lost async page write
...

After the lack of space was fixed:

Code:
[176160.552390] device-mapper: thin: 253:4: switching pool to write mode

It seems that the lvm-thin pool effectively became read-only (it switched to error IO mode) once it ran out of space to store more data,
but Proxmox still considered the operation finished.
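The mode changes in the kernel log are what separate "slow" from "failing" here: in `out-of-data-space (error IO)` mode, writes that need new space in the pool fail instead of being queued. As a minimal sketch (the helper names `thin_mode_changes` and `pools_failing_writes` are hypothetical, not part of Proxmox or LVM), a post-copy check could scan kernel log lines for dm-thin mode transitions and refuse to treat a copy as finished while any pool is not back in plain `write` mode:

```python
import re
from typing import Iterable, Iterator, NamedTuple, Set

# Matches lines in the format seen in this thread, e.g.
# [172610.751461] device-mapper: thin: 253:4: switching pool to out-of-data-space (queue IO) mode
# Assumption: other kernel versions may word this message differently.
_MODE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+device-mapper: thin: (?P<dev>\d+:\d+): "
    r"switching pool to (?P<mode>.+?) mode$"
)


class ModeChange(NamedTuple):
    seconds_since_boot: float  # dmesg's default bracketed timestamp
    pool_dev: str              # major:minor of the thin pool, e.g. "253:4"
    mode: str                  # e.g. "write", "out-of-data-space (error IO)"


def thin_mode_changes(lines: Iterable[str]) -> Iterator[ModeChange]:
    """Yield every dm-thin pool mode transition found in kernel log lines."""
    for line in lines:
        m = _MODE_RE.match(line.strip())
        if m:
            yield ModeChange(float(m["ts"]), m["dev"], m["mode"])


def pools_failing_writes(lines: Iterable[str]) -> Set[str]:
    """Return the pools whose most recent reported mode is not plain 'write'."""
    last = {}
    for change in thin_mode_changes(lines):
        last[change.pool_dev] = change.mode
    return {dev for dev, mode in last.items() if mode != "write"}
```

Fed the output of `dmesg`, this would flag pool `253:4` from the `queue IO` line onward and clear it again only at the later `switching pool to write mode` line.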

Normally it would not be a big deal to recover the data, but since both thin pools are on an
NVMe drive with TRIM, there is no possibility to recover it through the `lvm metadata` :(

Is this a problem with `dd` continuing to write data even though the writes clearly fail?
Is it because the writes are `async`?
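The `lost async page write` lines suggest the failures happened during background writeback of the page cache, after the writing process had already been told its `write()` succeeded. A minimal Python sketch of a copy that cannot report success in that situation, assuming plain file or block-device paths (`copy_durably` is a hypothetical name), ends with an explicit `fsync`, which on Linux waits for writeback and raises `OSError` if it failed:

```python
import os


def copy_durably(src_path: str, dst_path: str, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Copy src to dst and return the number of bytes copied.

    Raises OSError if any write or the final fsync fails, so the caller
    never treats the copy as finished while data sits only in the page cache.
    """
    copied = 0
    # "wb" truncates a regular file; for a block device O_TRUNC has no effect.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # This can "succeed" while the data is only in the page cache.
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()             # hand Python's userspace buffer to the kernel
        os.fsync(dst.fileno())  # wait for writeback; a failed writeback raises OSError (e.g. EIO)
    return copied
```

For `dd` itself, GNU dd's `conv=fsync` (fsync the output before exiting) or `oflag=direct` (bypass the page cache) would be the equivalent safeguard, making such failures show up as a non-zero exit status.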

So, the outcome was:
- Proxmox considered the data migrated and removed the source disk
- Due to TRIM, it is no longer recoverable
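The first outcome is the dangerous one: the source was removed on the strength of a copy that only looked complete. A hypothetical verify-before-delete gate (a sketch, not how Proxmox decides; `device_size`, `sha256_prefix` and `safe_to_remove_source` are illustrative names) would read the target back and compare it with the source before anything is removed:

```python
import hashlib
import os


def device_size(path: str) -> int:
    # Seeking to the end works for regular files and Linux block devices alike,
    # unlike os.path.getsize(), which reports 0 for a block device node.
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def sha256_prefix(path: str, length: int, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hash the first `length` bytes of `path`; raise ValueError if it is shorter."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError(f"{path} is shorter than {length} bytes")
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def safe_to_remove_source(src_path: str, dst_path: str) -> bool:
    """Allow deleting the source only if the target starts with identical bytes.

    The target may be larger (e.g. rounded up to an extent size), so only the
    first len(source) bytes of it are compared.
    """
    size = device_size(src_path)
    try:
        return sha256_prefix(src_path, size) == sha256_prefix(dst_path, size)
    except ValueError:
        return False
```

On a thin volume, blocks that were never provisioned read back as zeros, so a copy cut short by a full pool should fail this comparison. Reading through the page cache does not by itself prove the bytes reached the device, so the check is most meaningful after an `fsync` or with caches dropped.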
 
Yeah, it could have been bad timing. What timezone is the server running in? I need it to convert the epoch timestamps.
 
I forgot to mention: please also provide the log snippet with the timestamps. Thanks.
 
