zeroizable LVM blocks on shared storage

LnxBil

Hi,

I optimize all my VMs to fill their free space with zeros before backup so that the backup files are as small as possible. I do this semi-manually (scripted and cron-jobbed), but it isn't really automatic. I know you can use discard, but only if the underlying storage supports it; I have used it with qcow and even with LVM on SSDs, but here I have a cluster and an FC-based SAN for my VMs.

The KVM manpage states:

discard=discard
discard is one of "ignore" (or "off") or "unmap" (or "on") and controls whether discard (also known as trim or unmap) requests are ignored or passed to the filesystem. Some machine types may not support discard requests.
...

detect-zeroes=detect-zeroes
detect-zeroes is "off", "on" or "unmap" and enables the automatic conversion of plain zero writes by the OS to driver specific optimized zero write commands. You may even choose "unmap" if discard is set to "unmap" to allow a zero write to be converted to an UNMAP operation.

What else is there? These options do not solve my problem: I want the blocks on LVM to be cleared/zeroized when a file is discarded in the VM.
 
detect-zeroes is enabled by default by Proxmox.

discard is enabled if you check the discard option on the disk. You also need to use the virtio-scsi controller (not virtio-blk).
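
For example (VM ID, storage and volume names here are only placeholders):

Code:
# switch the VM to virtio-scsi and enable discard on its first SCSI disk
qm set 100 --scsihw virtio-scsi-pci
qm set 100 --scsi0 mysan:vm-100-disk-1,discard=on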

But I'm not sure that discard works on LVM.
 
Hi spirit,

Thank you for your answer, but this is not what I asked. I know that discard does not work on LVM; that is why I asked the question.

I dug further and found information on BLKZEROOUT, which should do this, but I did not find any use case, tutorial or even a manpage entry for it. I know it's in the qemu source.

I'll keep digging.
 
There's also issue_discards in /etc/lvm/lvm.conf.
Other than that I don't know. The thing is, you need this to happen on every level. E.g. if you delete a file and the OS only issues a discard, then that's all qemu does, too. Only if the operating system also zeroes out the blocks will they be zeroed out on the host (and deleting a file usually doesn't do that...).
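
For reference, that is this setting:

Code:
# /etc/lvm/lvm.conf
devices {
    # send discards to the underlying PVs when LVM itself frees space
    issue_discards = 1
}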
 
Hi Wolfgang,

Yes, but issue_discards relies on physical support of TRIM, which can be verified via hdparm. The option also only applies to LVM operations like reduce and remove.

QEMU is able to perform an unmap, and newer versions also support BLKZEROOUT (at least in the source code, but I haven't had time to dig into it), which is exactly what I want. I think this was introduced for backend storage that optimizes zero writes to unmap.

BTW: I use LVM on MDADM with TRIM support inside a KVM VM. This works admirably.
 
In the end it boils down to the fact that your guest has to actually zero out the deleted files; discard and qemu's BLKZEROOUT don't have much to do with that (they're just optimizations).
Maybe zerofree or sfill from the secure-delete package are of interest to you. (Both are available on Debian...)
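
Rough usage of both, for reference (device and mountpoint are just examples; zerofree needs the ext filesystem unmounted or mounted read-only, sfill can run on a mounted one, and the exact flags are worth double-checking in the manpages):

Code:
# zerofree: zero the unused blocks of an unmounted (or read-only) ext2/3/4 filesystem
zerofree -v /dev/vg0/vm-105-disk-1

# sfill (secure-delete): fill the free space of a mounted filesystem;
# -l -l reduces it to a single pass, -z writes zeros on that last pass,
# -f is the fast/insecure mode
sfill -f -l -l -z -v /mnt/data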
 
Hi Wolfgang,

I use dd for that because I need to clear out the data while the filesystem is in use; I cannot unmount it.
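
The dd variant is basically just this (a minimal sketch, the mountpoint is an example):

Code:
# fill the free space of a mounted filesystem with zeros, then remove the file again
# (dd exits with "No space left on device" once the filesystem is full -- expected)
dd if=/dev/zero of=/mnt/data/zerofill bs=1M
sync
rm -f /mnt/data/zerofill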

Maybe I'll look into adding such functionality to qemu. I really like the idea of mounting the ext4 filesystems with discard and letting QEMU do the magic: QEMU provides TRIM to the virtual block layer and zeroes the blocks out itself. That shouldn't be too complicated, should it?

Best,
LnxBil
 
If you're willing to work on the qemu code, I suppose you could make qemu zero-on-discard for those block drivers which don't already do so; it would probably have to be added to each block backend separately. AFAIK qemu already keeps track of whether a device can discard and whether that zeroes out blocks, in order to provide that information to the guest, so it might be possible to add an option to force discard to be enabled and turn it into explicit zeroing where necessary.
It's probably best to take this to the qemu-devel mailing list.
 
Hi, and sorry to hijack this thread a bit, but I feel my question is about approximately the same thing: I'm trying to take advantage of detect-zeroes on an LVM-thin volume, with no success. In most VMs I can issue discards inside the VM and free some space (this works), but I have one Windows VM and one very old Debian (Debian 3, yeah I know, but I have no choice there) that cannot issue discards.

Running fstrim in a VM works perfectly when that VM can issue discards (fstrim on a recent Linux, for instance): space is freed and given back to LVM-thin. But I cannot manage to get my space back by writing zeroes, and reading this thread I thought it might work. Am I missing something?
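
For checking, inside a guest you can see whether the virtual disk advertises discard at all, e.g.:

Code:
# non-zero DISC-GRAN/DISC-MAX means the device accepts discards
lsblk --discard
# manual trim of a mounted filesystem
fstrim -v /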

Regards
 
I have never used thin-provisioned LVM, but I can write zeros to ZFS and ZFS takes care of the unmapping. Maybe there is something similar for thin-provisioned LVM; maybe when LVM extents are free, they can be reclaimed.
 
If that's the case, it means that KVM sends discards to ZFS when it detects zeroes, but not to LVM-thin. That would be strange, wouldn't it? Or does ZFS detect the zeroes by itself?
 
No, if you write zeros to ZFS, ZFS detects this and unmaps them automatically. This does not involve trim/discard at all. I do not know whether thin LVM does something similar.

ZFS normally uses a 128K recordsize/blocksize; you can set it to 8K to match your guest (if aligned properly) and then get a 1:1 mapping of blocks, so every freed block can be unmapped by ZFS once its reference count drops to 0.

LVM internally uses extents, and I do not know whether thin-provisioned volumes can also be configured with a small extent (chunk) size. You have to look this up in the thin LVM documentation or try it with a thin-provisioned test volume.
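
Rough examples of both knobs (pool, dataset and VG names are made up; the ZFS volblocksize and the thin-pool chunk size both have to be set at creation time):

Code:
# ZFS: 8K record size for a filesystem dataset, or an 8K-block zvol
zfs set recordsize=8K tank/vmdata
zfs create -V 32G -o volblocksize=8K tank/vm-test-disk-1

# LVM thin pool with a small chunk size (64K is the minimum), checked with lvs
lvcreate -L 100G --chunksize 64K --thinpool data vg0
lvs -o lv_name,chunk_size vg0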
 
I thought that if detect-zeroes=unmap was set in kvm's options, it would send discards when it detects zeroes. This would be sufficient for LVM-Thin to do the discards… as it works when the VM's system is really doing discards.
 
I stated that the filesystem can do the unmapping itself. If KVM does it via trim, that is of course faster, but you do not need KVM for that. If you write a big file such as an uncompressed VM disk, which contains a lot of zeros, the file is written to ZFS, but ZFS unmaps the zeroed blocks automatically, yielding a very slim VM disk image.
 
lvm-thin in the kernel source marks the device as not returning zeroes for discarded blocks.
In practice it will depend on the discarded size: if you discard a portion large enough for LVM to deallocate it, subsequent reads do seem to return zero, but this is not guaranteed at the block level, so `detect-zeroes=unmap` won't use discard here.

Generally `detect-zeroes=unmap` makes qemu query the kernel about whether the underlying storage reports that a discard causes subsequent reads from that location to return zero. So usually it will fall back to a BLKZEROOUT request, and then it depends on what else the host kernel knows about the underlying storage: it can again be a discard (unlikely at this point, otherwise qemu would already have used one), a single zero-block write followed by "write same" requests (the most likely case here), or, if the former doesn't zero out and the latter isn't supported, actual writes of zeroes. The last two will not issue discards at all.
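
To see what the host kernel reports for a given device, and to trigger an explicit zero-out by hand, something like this (device names are examples; blkdiscard -z destroys data on the given range, so only point it at a device you mean to wipe):

Code:
# does a discard guarantee zeroed reads afterwards? (newer kernels always report 0 here)
cat /sys/block/dm-34/queue/discard_zeroes_data
# does the device support "write same"?
cat /sys/block/dm-34/queue/write_same_max_bytes
# issue a BLKZEROOUT for the first 1 MiB of a test LV -- destructive!
blkdiscard -z -o 0 -l 1M /dev/vg0/testlv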
 
/etc/lvm/lvm.conf appears to indicate that 'thin_pool_zero' is enabled by default. Running 'lvs' shows the 'z' attribute for thinly provisioned logical volumes, but e.g. 'cat /sys/block/dm-34/queue/discard_zeroes_data' indicates that the kernel doesn't honour or know about this.

I can't find any posts relating to this being discussed anywhere. Is this by design (perhaps due to LVM-thin not zeroing data atomically) or a case that the required kernel code hasn't been written yet?

We use the following relatively simple shell script to update block devices (VM disks) on servers in a disaster recovery site. The script essentially reads and compares MD5 hashes of 1 KB blocks and then either skips to the next block or transmits that block. It works very well with snapshots at both the source (consistent source image) and the destination (the last snapshot is a consistent image whilst the origin is being patched).

We were hoping to find a way of getting LVM-thin to discard blocks of zeros to further save storage at the DR site, but this doesn't appear to exist. Guests use discard mount options or run disk defragmentation with TRIM (Windows) on the source server, but our replication scheme ends up writing zeros at the destination if a block previously contained data, so the destination images slowly grow over time.

It would be ideal if LVM-thin could detect extents containing only zeros and unmap them automatically, or if we could periodically process the block devices to achieve this. We currently launch the VMs in the DR environment periodically and run fstrim or disk defragmentation manually, but would love this to be automatic.
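
A rough, untested sketch of such a periodic pass over a destination thin LV (device path and chunk size are made up; whether LVM-thin actually frees anything depends on the discards covering whole, aligned pool chunks):

Code:
#!/bin/sh
# scan a thin LV in 4 MiB chunks and discard every chunk that reads back as all zeros
dev=/dev/lvm0/lair-nt01-backup          # example device
chunk=$((4 * 1024 * 1024))
size=$(blockdev --getsize64 "$dev")
off=0
while [ "$off" -lt "$size" ]; do
    if dd if="$dev" bs="$chunk" skip=$((off / chunk)) count=1 2>/dev/null \
         | cmp -s -n "$chunk" - /dev/zero; then
        blkdiscard -o "$off" -l "$chunk" "$dev"
    fi
    off=$((off + chunk))
done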

PS: Reading zeros from unallocated space with LVM-thin is also very quick as it doesn't actually need to read any data.

Herewith the script we run nightly on a Proxmox system in a remote office; hope it's useful to someone (it uses 'lzop' to compress hashes and data in transit):

PS: The script below is from another legacy host which still uses standard LVM (no thin volumes). Altering it for thin volumes is trivial though, as it only requires a slight adjustment to the lvcreate command.
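
For a thin origin, that lvcreate line would shrink to roughly this (untested; thin snapshots need no size or stripes, and depending on the LVM version may need explicit activation):

Code:
# thin snapshot: no -L/-i needed, space is allocated from the pool as used
lvcreate -s -n "$src_lvm-snap1" "$src_vg/$src_lvm" > /dev/null;
# some LVM versions skip auto-activation of thin snapshots:
# lvchange -ay -K "$src_vg/$src_lvm-snap1"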

Code:
#!/bin/sh
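# network_kvm_backup <src_vg> <src_lvm> <snapsize> <dst_host> <dst_vg>
# Snapshots /dev/<src_vg>/<src_lvm> locally and syncs it to
# /dev/<dst_vg>/<src_lvm>-backup on <dst_host>, transferring only the
# 1024-byte blocks whose MD5 hashes differ.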

network_kvm_backup () {
  src_vg=$1;
  src_lvm=$2;
  snapsize=$3;
  dst_host=$4;
  dst_vg=$5;
  export dev1="/dev/$src_vg/$src_lvm-snap1";
  export dev2="/dev/$dst_vg/$src_lvm-backup";
  export remote="root@$dst_host";

  logger "Starting to update $dev1 to $dst_host as $dev2";
  [ "$src_vg" = "vg_kvm" ] && stripes=2 || stripes=1;
  lvcreate -i $stripes -L $snapsize /dev/$src_vg/$src_lvm -s -n $dev1 > /dev/null;
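  # the pipeline below: the remote side streams per-1024-byte-block MD5s of $dev2,
  # the local side compares them against $dev1 and answers "s" (skip) or
  # "c" + the changed block, and the remote side seeks/patches $dev2 in place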
  ssh -i /root/.ssh/rsync_rsa $remote "
    perl -'MDigest::MD5 md5' -ne 'BEGIN{\$/=\1024};print md5(\$_)' $dev2 | lzop -c" |
    lzop -dc | perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\1024};$b=md5($_);
      read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 | lzop -c |
    ssh -i /root/.ssh/rsync_rsa $remote "lzop -dc |
     perl -ne 'BEGIN{\$/=\1} if (\$_ eq\"s\") {\$s++} else {if (\$s) {
      seek STDOUT,\$s*1024,1; \$s=0}; read ARGV,\$buf,1024; print \$buf}' 1<> $dev2"
  logger "Finished updating $dev1 to $dst_host as $dev2";
  lvremove -f $dev1 > /dev/null;
}

#                  src_vg src_lvm             snapsize dst_host           dst_vg
network_kvm_backup vg_kvm lair-nt01             50G    dr2.lair.co.za lvm0
network_kvm_backup vg_kvm lair-eppdns            5G    dr2.lair.co.za lvm0
network_kvm_backup vg_kvm lair-webapp            5G    dr3.lair.co.za lvm0

 