I was migrating data between hosts. One of the LVM-thin pools was simply too small, and I only noticed that after the fact, when trying to start the machine on the external node:
Code:
2020-04-01 11:56:24 starting migration of VM 300 to node 'home-PC' (192.168.88.176)
2020-04-01 11:56:24 found local disk 'nvme-thin:vm-300-disk-0' (in current VM config)
2020-04-01 11:56:25 copying local disk images
WARNING: Thin volume nvme/vm-102-disk-3 maps 25977421824 while the size is only 12884901888.
WARNING: Thin volume nvme/vm-102-disk-3 maps 25977421824 while the size is only 12884901888.
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
Logical volume "vm-300-disk-0" created.
WARNING: Sum of all thin volume sizes (552.01 GiB) exceeds the size of thin pool nvme/nvme-thin and the size of whole volume group (475.70 GiB).
524288+0 records in
524288+0 records out
34359738368 bytes (34 GB, 32 GiB) copied, 392.481 s, 87.5 MB/s
3268+1675423 records in
3268+1675423 records out
34359738368 bytes (34 GB, 32 GiB) copied, 392.842 s, 87.5 MB/s
Logical volume "vm-300-disk-0" successfully removed
2020-04-01 12:03:00 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=home-PC' root@192.168.88.176 pvesr set-state 300 \''{}'\'
2020-04-01 12:03:01 migration finished successfully (duration 00:06:37)
TASK OK
This resulted in the following on the target host:
Code:
[172610.719854] device-mapper: thin: 253:4: reached low water mark for data device: sending event.
[172610.751461] device-mapper: thin: 253:4: switching pool to out-of-data-space (queue IO) mode
[172672.737369] device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
[172672.738755] Buffer I/O error on dev dm-34, logical block 5996032, lost async page write
[172672.740161] Buffer I/O error on dev dm-34, logical block 5996033, lost async page write
...
[/code]
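Side note: the roughly 60-second gap between the "queue IO" and "error IO" lines above matches dm-thin's default no_space_timeout, i.e. the window you get to extend the pool before the queued writes start erroring out. For what it's worth, it can be inspected (and, accepting the risk of indefinitely hung I/O, disabled) via a module parameter; the values below are just illustrative:
[code]
# seconds a full thin pool queues I/O before switching to error mode (default 60)
cat /sys/module/dm_thin_pool/parameters/no_space_timeout

# 0 = queue forever instead of erroring out (use with care)
echo 0 > /sys/module/dm_thin_pool/parameters/no_space_timeout
[/code]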
After fixing the lack of space:
[code]
[176160.552390] device-mapper: thin: 253:4: switching pool to write mode
[/code]
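For anyone hitting the same thing: the fix boils down to giving the pool more room, and the warning in the migration log already points at autoextension as the preventive measure. A rough sketch (pool name taken from the log; the size and threshold values are just examples):
[code]
# grow the thin pool itself (requires free extents in the VG)
lvextend -L +20G nvme/nvme-thin

# and/or let LVM grow it automatically before it fills up;
# in /etc/lvm/lvm.conf:
#   activation {
#       thin_pool_autoextend_threshold = 80
#       thin_pool_autoextend_percent   = 20
#   }
[/code]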
It seems that the LVM-thin pool was switched to read-only (out-of-data-space, error IO) mode when it ran out of space to store more data, but even so, Proxmox still considered the operation finished.
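You can see that state directly on the pool; a quick check (the device-mapper name below is my guess based on the VG/LV names in the log):
[code]
# data/metadata fill level of the thin pool and its volumes
lvs -a -o lv_name,lv_size,data_percent,metadata_percent nvme

# low-level pool status; the mode field reads "out_of_data_space" / "ro" when full
dmsetup status nvme-nvme--thin-tpool
[/code]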
Now, it would not be a big deal to recover the data, but since both thin pools sit on NVMe storage with TRIM, there is no possibility to recover the LVM metadata.
Is this a problem with `dd` continuing to write data even though the writes clearly fail?
Is it due to the writes being `async`?
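To illustrate why I suspect the async path: a plain buffered `dd` can exit 0 even though the dirtied pages are later dropped with "lost async page write", because nothing in that path waits for the flush. Something like the following (hypothetical target, not the exact command Proxmox runs) does surface the failure:
[code]
# conv=fsync forces a flush before dd exits, so an out-of-space pool
# turns into an I/O error and a non-zero exit code instead of silent data loss
dd if=/dev/zero of=/dev/nvme/test-disk bs=1M count=1024 conv=fsync status=progress
echo "dd exit code: $?"
[/code]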
So, the outcome was:
- Proxmox considered the data migrated and removed the source disk
- Due to TRIM, the data is not recoverable anymore