[SOLVED] VMs on lvm-thin storage get corrupted

RodinM

Hello,
I have a strange situation with my single-node installation.
I have free space on the Proxmox install disk /dev/nvme0n1:
Bash:
lsblk
...
nvme0n1                            259:0    0 476.9G  0 disk
|-nvme0n1p1                        259:1    0  1007K  0 part
|-nvme0n1p2                        259:2    0     1G  0 part /boot/efi
|-nvme0n1p3                        259:3    0    14G  0 part
| |-pve-swap                       252:0    0     2G  0 lvm  [SWAP]
| `-pve-root                       252:1    0    12G  0 lvm  /
...

Step 1
I decide to use the free space for lvm-thin storage, so I create a new partition nvme0n1p4 with parted (fdisk, whatever...).
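For completeness, one possible way to create it with parted (the 15GiB start offset is just a placeholder for wherever nvme0n1p3 ends on your disk):
Bash:
# create nvme0n1p4 from the remaining free space; adjust the start offset to your layout
parted -s /dev/nvme0n1 mkpart primary 15GiB 100%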
Then:
Bash:
# pvcreate /dev/nvme0n1p4
# vgcreate vg_local /dev/nvme0n1p4
# lvcreate -L 400G -n lv_thin_local vg_local
# lvconvert --type thin-pool vg_local/lv_thin_local
# lvresize -l +100%FREE /dev/vg_local/lv_thin_local
Everything is just as the official Proxmox docs recommend: https://pve.proxmox.com/wiki/Storage:_LVM_Thin
And that is fine: I add this as lvm-thin storage to Proxmox and use it as VM storage with no problems.
I restore some of my machines to this storage from a backup and they start successfully.
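For reference, a rough sketch of how such a pool can be registered from the CLI (the storage name local-thin is only an example; the GUI under Datacenter -> Storage does the same):
Bash:
# register the thin pool as a Proxmox lvm-thin storage; 'local-thin' is an example name
pvesm add lvmthin local-thin --vgname vg_local --thinpool lv_thin_local --content rootdir,images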

Step 2
Then I ask myself why I shouldn't create the lvm-thin pool with a single command, so instead of the previous command list I do this (after deleting the LV, VG and PV from /dev/nvme0n1p4):
Bash:
# pvcreate /dev/nvme0n1p4
# vgcreate vg_local /dev/nvme0n1p4
# lvcreate -Zn -l100%FREE --thinpool lv_thin_local vg_local

And again I add this as lvm-thin storage to Proxmox and use it as VM storage. But after restoring some VMs to this storage from a backup, they either start with many problems, as if their virtual disks were heavily damaged, or cannot start at all because of a damaged bootloader, be it Linux or Windows.

Step 3
I decide to do another experiment and attach an iSCSI volume shared from an OpenMediaVault box, which is seen on Proxmox as /dev/sda.
I create a single partition /dev/sda1 and then again make an lvm-thin pool with one command:
Bash:
# pvcreate /dev/sda1
# vgcreate vg_iscsi /dev/sda1
# lvcreate -Zn -l100%FREE --thinpool lv_thin_iscsi vg_iscsi
I add this as another lvm-thin storage to Proxmox and use it as VM storage.
I restore some of my machines to this storage from a backup and they start successfully. No evidence of virtual disk corruption on any VM.

What is wrong with the local disk and the lv_thin_local made on it with one command?
If it were a problem with the physical storage, it would show up with either way of creating the lvm-thin storage, be it Step 1 or Step 2.
What is the critical difference between creating the lvm-thin pool in one step and creating a plain LV first and then converting it to a thin pool?
 
Let me go on a tangent and ask why your setup does not have the default data thin pool as shown here: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_lvm
In your case I'd increase nvme0n1p3 and create the original data thin pool there in the existing pve group with some free space left over in the VG.
This is sort of a test lab, so during the Proxmox installation I chose hdsize 15GB to try to limit the amount of disk space Proxmox takes for itself. IIRC, during the install Proxmox said that it would not create the data volume because the hdsize was too small.
According to wiki:
hdsize
Defines the total hard disk size to be used. This way you can reserve free space on the hard disk for further partitioning (for example for an additional PV and VG on the same hard disk that can be used for LVM storage).

Because I did not plan to use data for anything, that did not seem like a problem to me.
I do not pretend to have done everything correctly, but if I made a critical mistake, please explain why the data pool is so important.
 
Please don't quote messages without good reason.
The data pool is where the virtual disks for your guests are supposed to be stored. You should have modified maxroot instead. With ZFS as the file system you have access to the whole disk space from every storage; with LVM(-Thin) it is split up like that.
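For context, a rough sketch of what that would look like, assuming nvme0n1p3 has already been grown (e.g. with parted resizepart) and that roughly 400G should go to the pool (both values are just examples):
Bash:
# let LVM see the enlarged partition
pvresize /dev/nvme0n1p3
# recreate the default 'data' thin pool inside the existing pve VG
lvcreate -L 400G --thinpool data pve
# a default install references it in /etc/pve/storage.cfg as 'lvmthin: local-lvm' with vgname pve and thinpool data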
 
you could compare the created LVs, in particular the metadata ones - maybe the sizing is off in your case #2, and the corruption is the result of running out of metadata space?
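A possible way to do that comparison, since lvs -a also lists the hidden _tmeta/_tdata sub-LVs of each pool:
Bash:
# compare pool, data and metadata LV sizes and usage across both VGs
lvs -a -o lv_name,lv_size,data_percent,metadata_percent vg_local vg_iscsi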
 
Thank you.
But I still do not understand why limiting hdsize is a bad option when it does exactly what I need.
 
you could compare the created LVs, in particular the metadata ones - maybe the sizing is off in your case #2, and the corruption is the result of running out of metadata space?
Thank you for the hint! I will repeat step 2 and check metadata usage.
 
you could compare the created LVs, in particular the metadata ones - maybe the sizing is off in your case #2, and the corruption is the result of running out of metadata space?
So, after repeating all the steps from Step 2 and restoring a VM from backup to the new lvm-thin storage (where the VM gets corrupted again), I see the following:
Bash:
~# lvdisplay '/dev/vg_local/lv_thin_local'
  --- Logical volume ---
  LV Name                lv_thin_local
  VG Name                vg_local
  LV UUID                qtfszg-j442-B3YM-aw7B-0h51-7INH-RK05b3
  LV Write Access        read/write (activated read only)
  LV Creation host, time hv01, 2025-08-06 13:27:20 +0300
  LV Pool metadata       lv_thin_local_tmeta
  LV Pool data           lv_thin_local_tdata
  LV Status              available
  # open                 0
  LV Size                438.62 GiB
  Allocated pool data    0.18%
  Allocated metadata     10.47%
  Current LE             112287
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           252:5

I compare this to the lvm-thin pool made on the iSCSI target in Step 3 (which also has VMs restored from the same backup):
Bash:
# lvdisplay '/dev/vg_iscsi/lv_thin_iscsi'
  --- Logical volume ---
  LV Name                lv_thin_iscsi
  VG Name                vg_iscsi
  LV UUID                NWkHDx-Jo6g-VYBC-bbX6-ok14-VJx6-J2Xq1S
  LV Write Access        read/write (activated read only)
  LV Creation host, time hv01, 2025-08-03 11:36:14 +0300
  LV Pool metadata       lv_thin_iscsi_tmeta
  LV Pool data           lv_thin_iscsi_tdata
  LV Status              available
  # open                 0
  LV Size                <1.37 TiB
  Allocated pool data    6.18%
  Allocated metadata     12.07%
  Current LE             358355
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           252:14

And I do not see any meaningful difference between the two cases.
Should I check anything else?
 
sorry I missed something rather fundamental in your initial post:

"# lvcreate -L 400G -n lv_thin_local vg_local"

vs

"lvcreate -Zn -l100%FREE --thinpool lv_thin_local vg_local"

could you try leaving out the "-Zn" and repeat your experiment?
 
sorry I missed something rather fundamental in your initial post:

"# lvcreate -L 400G -n lv_thin_local vg_local"

vs

"lvcreate -Zn -l100%FREE --thinpool lv_thin_local vg_local"

could you try leaving out the "-Zn" and repeat your experiment?
Yes, after leaving out the -Zn option the problem seems to disappear. I remember that I added this option because of advice somewhere that it can speed up provisioning of a big pool, which now seems a bit odd to me when I re-read the man page about this option.

As the man page says:
Code:
Controls zeroing of the first 4KiB of data in the new LV. Default is y. Snapshot COW volumes are always zeroed. For thin pools, this controls zeroing of provisioned blocks. LV is not zeroed if the read only flag is set. Warning: trying to mount an unzeroed LV can cause the system to hang.

This info did not make me very suspicious about the consequences... Maybe I misread "zeroing of the first 4KiB" as "zero out the whole thin pool", which would have sounded reasonable.
Thank you very much for your help!
Is it really so critical to zero the first 4KiB of data in the new thin pool?
 
the important part is

"For thin pools, this controls zeroing of provisioned blocks."

without it, unless the disk is full of zeroes to begin with (or the underlying storage returns all-zeroes for unwritten blocks anyway), you will read back garbage, and depending on what is doing the reading, that might be interpreted as or cause corruption.
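A possible way to check this on an existing pool and to turn zeroing back on without recreating it (the 'z' in the eighth position of the lv_attr field means newly provisioned blocks get zeroed):
Bash:
# inspect the pool's attributes; a thin pool created with -Zn lacks the 'z' flag
lvs -o lv_name,lv_attr vg_local
# enable zeroing of newly provisioned blocks on the existing pool
lvchange --zero y vg_local/lv_thin_local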
 
the important part is

"For thin pools, this controls zeroing of provisioned blocks."

without it, unless the disk is full of zeroes to begin with (or the underlying storage returns all-zeroes for unwritten blocks anyway), you will read back garbage, and depending on what is doing the reading, that might be interpreted as or cause corruption.
I see. Thank you very much again for your comprehensive explanations!