What about LVM+qcow2

carragom

Member
Apr 2, 2022
Hi,
This is an idea that makes a lot of sense to me, yet I can't find anyone talking about it, so it's probably a bad idea. Still, I would love to know why it's bad. Before anyone asks: no, I'm not planning on using this in production.

According to the docs, PVE supports shared LVM storage when connected over iSCSI or FC, which is great. But there is no snapshot support on it, which makes sense since clustered LVM does not support snapshots. So far so good, but what if we used an LV as a file in qcow2 format and let the format handle the snapshots? AFAIK XenServer/XCP-ng does something similar with the VHD format.

So I decided to do some testing in the lab with a single PVE node and a local (not shared) LVM. To my surprise, I was able to do the following:
  • Boot the VM with a manually modified lv+qcow2
  • Partition the disk into 2 partitions
  • mkfs.ext4 each partition
  • Write data to each with multiple files and directories
  • Take a snapshot
  • Add new files and dirs
  • Change existing file content
  • Revert to the snapshot previously taken
How did I do this?
  1. Create a VM with a disk in the LVM storage without starting it
  2. Enable the LV if needed with lvchange -ay /dev/vg1/vm-100-disk-0
  3. Format the LV with qemu-img create -f qcow2 /dev/vg1/vm-100-disk-0 32G
  4. Use qm showcmd 100 to see the command that Proxmox would use to start the VM
  5. Change the value from format=raw to format=qcow2 in the -drive parameter
  6. Manually start the VM with the modified command
At this point the web UI will show the VM booting, hopefully with a live CD where you can start changing things on the disk. Every time the VM is shut down, the LV gets deactivated, so use step 2 in the previous list to activate it again before you start the VM or play with snapshots.
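The steps above can be sketched as a small script. This is only a rough sketch under assumptions, not what Proxmox does internally: the `run` helper, the `DRY_RUN` switch and the `to_qcow2` function are made-up names, the LV path assumes the volume group from step 2 (`vg1`), and the `sed` rewrite assumes the `-drive` argument printed by `qm showcmd` contains `file=<LV path>` followed later by `format=raw`. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the manual workflow; the values mirror the example in the post.
VMID=100
LV=/dev/vg1/vm-100-disk-0
SIZE=32G
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Read a command line on stdin and switch format=raw to format=qcow2,
# but only inside the -drive argument that points at $LV.
to_qcow2() {
    sed "s|\(file=${LV}[^ ']*\)format=raw|\1format=qcow2|"
}

run lvchange -ay "$LV"                       # step 2: activate the LV
run qemu-img create -f qcow2 "$LV" "$SIZE"   # step 3: write a qcow2 header onto the LV

if [ "$DRY_RUN" = "0" ]; then
    # steps 4-6: take the generated command, patch the format, start the VM
    # (assumes the printed command line is valid shell)
    qm showcmd "$VMID" | to_qcow2 | sh
fi
```

Running it with `DRY_RUN=0` on a test node would actually overwrite the start of the LV, so it is only meant for a throwaway disk.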
To manage the snapshots I used
  1. Create snapshot qemu-img snapshot -c snap1 /dev/vg1/vm-100-disk-0
  2. List the snapshots with qemu-img snapshot -l /dev/vg1/vm-100-disk-0
  3. Revert snapshot qemu-img snapshot -a snap1 /dev/vg1/vm-100-disk-0
  4. Delete a snapshot qemu-img snapshot -d snap1 /dev/vg1/vm-100-disk-0
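The four commands above could be wrapped in a tiny helper. A minimal sketch: `snap_cmd` is a hypothetical name, the LV path assumes the `vg1` volume group from the activation step, and the function only prints the command so it can be reviewed first. `qemu-img snapshot` modifies the image offline, so this assumes the VM is stopped and the LV is active.

```shell
#!/bin/sh
LV=/dev/vg1/vm-100-disk-0

# Map an action name to the matching qemu-img snapshot command line.
snap_cmd() {
    case $1 in
        create) flag=-c ;;
        revert) flag=-a ;;
        delete) flag=-d ;;
        list)   echo "qemu-img snapshot -l $LV"; return 0 ;;
        *)      echo "unknown action: $1" >&2; return 1 ;;
    esac
    echo "qemu-img snapshot $flag $2 $LV"
}

# Print the commands for a create + list cycle.
snap_cmd create snap1
snap_cmd list
```

To execute instead of print, the output could be piped to `sh` after checking it.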
The desired end state here would be multi-path enabled iSCSI with shared LVM and snapshots like XenServer/XCP. Possibly better, because these would be thin snapshots. But again, all this sounds too good to be true, so could anyone tell me why this is a bad idea?
 
I think it is a great idea! It should work and be a perfect solution for SAN environments (an ideal VMware replacement). Actually, I have found this article, which is about implementing a Kubernetes CSI driver on a SAN using qcow2 on LVM. They mention this document about how this is implemented in oVirt/VDSM, which has snapshots on SAN. So this approach is already used in another virtualization solution. VDSM does thin provisioning, but to keep things simple that is not needed; just having snapshots on LVM is enough. Thin provisioning and dedup can be done on the SAN storage side. Can somebody from Proxmox look at this?
 
Thin provisioning [...] can be done on the SAN storage side.
Despite what all vendors advertise ... without proper guest support it's just a plain and simple no; they're lying. Once data has been written and deleted but not unmapped from the guest disk, you end up with an almost thick-provisioned volume. I just had this case with a 6/7-figure 3PAR.
 
I have experience with a larger VMware environment backed by multiple all-flash storage arrays. It works well there, and so does dedup.
 
Glad to see I'm not the only one who thinks there is value behind this. Thanks for the links, good to know others are trying to work on this. To be honest, I had almost forgotten about this; it's been 4 months without a reply, after all. My current setup for multi-path iSCSI uses OCFS2 and I would love to be rid of that unnecessary layer.
 
Off topic, but how is OCFS2 with qcow2 performing? And is it reliable? I can't find any benchmarks or tutorials, and my past experience (10+ years ago) with OCFS2 is only bad (kernel panics).
 
Pretty solid for the past couple of years on a v7 six-node cluster with around 200 VMs. I did encounter a kernel panic once, but it was fixed pretty quickly with an update. Also, there is a problem with v8 where the VM won't start if you use Async IO = io_uring on the hard drive. Maybe it's fixed by now; I have not tested it lately. The native and threads options work fine. As for performance, it feels pretty good, and since NFS is out of the question for me because it lacks multi-path support, I guess the only other option to compare it to would be GFS2. I might try that someday. Management is a bit clunky though; that's why I would love to have shared LVM with snapshots, even if the snapshots are thick.
 
Hi, I have also looked at this.
I'll try it soon and compare it against GFS2 and OCFS2.

The only difference from oVirt is that Proxmox uses internal snapshots, and I don't know whether they play well with qcow2 on a block device. (It also means that we can't shrink the block device after deleting a snapshot.)

https://bugzilla.proxmox.com/show_bug.cgi?id=4160
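Since internal snapshots live inside the same qcow2 image, one thing worth watching in such a test is how much of the fixed-size LV the image has consumed. A rough sketch, assuming the LV path from the first post; `pct_used` is a hypothetical helper, and the JSON field is parsed with a simple `sed` rather than a real JSON parser:

```shell
#!/bin/sh
# Percentage of the LV consumed by the qcow2 image, given two byte counts.
pct_used() {
    echo $(( $1 * 100 / $2 ))
}

LV=/dev/vg1/vm-100-disk-0   # assumed path, only inspected if it exists
if [ -b "$LV" ]; then
    lv_bytes=$(blockdev --getsize64 "$LV")
    # qemu-img check reports the highest offset in use as "image-end-offset"
    end_bytes=$(qemu-img check --output=json "$LV" |
        sed -n 's/.*"image-end-offset": *\([0-9]*\).*/\1/p')
    if [ -n "$end_bytes" ]; then
        echo "qcow2 ends at $end_bytes of $lv_bytes bytes ($(pct_used "$end_bytes" "$lv_bytes")%)"
    fi
fi
```

For example, an image ending at 15 GiB on a 32 GiB LV gives `pct_used 16106127360 34359738368`, which prints 46.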
 
Hi spirit,

Your qcow2 over LVM implementation looks promising, especially the qmeventd daemon approach, which sounds like a smooth solution. I think this is the missing piece for an all-in-one >host based< shared block storage / SAN solution with features similar to ESXi/VMFS. I hope this will also become part of the Proxmox distribution in the future. Thanks!

P.S. Purely out of technical interest (not sure whether this would have made sense if qcow2 over LVM hadn't worked):
- use a (shared) LVM thin pool per virtual disk; since the thin data and metadata/sparse LVs are essentially thick LVs, Proxmox could internally use the same locking mechanism it currently uses for raw LV disks
- initially create e.g. a 1 GB metadata + 1 GB data thin pool with an x GB thin-provisioned volume inside it (lock/assign the corresponding LVs to a node)
- use thin pool automatic extension with dmeventd + thin_pool_autoextend_threshold / thin_pool_autoextend_percent in lvm.conf
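To make the idea a bit more concrete, here is a rough sketch of what the per-disk thin pool could look like. Everything here is an assumption for illustration: the `thin_disk_cmds` helper, the `vm-<id>-pool` naming, and the 70/20 autoextend values. The function only prints the `lvcreate` commands, and whether dmeventd-driven autoextension behaves well on a shared VG is exactly the open question.

```shell
#!/bin/sh
# Print (not run) the LVM commands for one thin pool per virtual disk.
# Arguments: volume group, VM id, virtual disk size.
thin_disk_cmds() {
    vg=$1; vmid=$2; vsize=$3
    pool="vm-${vmid}-pool"
    # small initial pool: 1G data + 1G metadata, meant to grow via autoextend
    echo "lvcreate --type thin-pool -L 1G --poolmetadatasize 1G -n $pool $vg"
    # thin volume with the full virtual size inside that pool
    echo "lvcreate --type thin -V $vsize -n vm-${vmid}-disk-0 --thinpool $pool $vg"
}

thin_disk_cmds vg1 100 32G

# Matching lvm.conf settings (activation section); values are examples only:
#   thin_pool_autoextend_threshold = 70   # start extending at 70% pool usage
#   thin_pool_autoextend_percent   = 20   # grow the pool by 20% each time
```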
 
I'm trying to upstream as much of my work as possible (because I don't want to maintain a fork on my side).
I have a lot of customers coming from VMware (thanks, Broadcom) with existing SANs, and it's really a blocker currently.
Interesting, I hadn't thought about this. I'll try to implement it to compare.
 