last missing enterprise feature (on shared block storage / SAN)

alma21

New Member
May 4, 2024
Hi,

I know that there is already "LVM over iSCSI/FC" for shared block storage (SAN), which carves out LVs from a shared VG/LUN for individual VM disks - but with no snapshot or thin-provisioning support.

My question is: why is there no (extended) storage plugin which provides

1.) "QCOW2 over/on LV(M)" - to format a LV / block device with QCOW2 should not be a big issue - given that the "raw" format is already used with LVM / Proxmox Storage plugins - this would then provide qcow internal snapshots on an (initially) thick provisioned LV

2.) The thin provisioning is maybe the harder part, but oVirt/RHV already did this with some kind of watchdog: if an initially small (e.g. 1 GB) thick QCOW2 LV gets full, extend it by +x GB (rough sketch below).
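A very rough, illustrative sketch of such a watchdog (LV name, sizes and thresholds are just examples; polling with qemu-img is only safe while no VM has the image open - a real implementation would query the running QEMU instead):

# initial layout: small thick LV, qcow2 with a much larger virtual size
lvcreate -L 1G -n vDisk1 vgDATA
qemu-img create -f qcow2 /dev/vgDATA/vDisk1 100G

# polling loop: extend the LV whenever the qcow2 allocation gets close to the LV size
LV=/dev/vgDATA/vDisk1
HEADROOM=$((512*1024*1024))   # extend once less than 512M of the LV is still unused
while true; do
    lv_size=$(blockdev --getsize64 "$LV")
    # "image-end-offset" = highest offset the qcow2 has written so far
    used=$(qemu-img check --output=json "$LV" 2>/dev/null | sed -n 's/.*"image-end-offset": \([0-9]*\).*/\1/p')
    if [ -n "$used" ] && [ $((lv_size - used)) -lt "$HEADROOM" ]; then
        lvextend -L +1G "$LV"
    fi
    sleep 10
done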
 
My question is: why is there no (extended) storage plugin which provides
The short answer is: nobody volunteered their time to write and submit the code. If the PVE developers have looked at it, then given that it has not been delivered yet, it was either not a high enough priority or there were technical challenges.

"QCOW2 over/on LV(M)" - to format a LV / block device with QCOW2 should not be a big issue - given that the "raw" format is already used with LVM / Proxmox Storage plugins - this would then provide qcow internal snapshots on an (initially) thick provisioned LV
While it's possible to place QCOW2 on a raw disk (essentially format the disk as QCOW2), I believe you will lose snapshot functionality if you do that.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Sample:

- the LV is /dev/vgDATA/vDisk1 (~10G)
- vDisk1 is offline / must not be in use by a VM or another process
- all commands run as root
- command outputs are omitted for readability
- qemu-img/qemu-nbd commands are used here, but it should also work via the QEMU (and Proxmox?) monitor/APIs (online snapshots) - will verify that later


# format the LV as qcow2 with a 9G virtual size
qemu-img create -f qcow2 /dev/vgDATA/vDisk1 9G

# show detailed info about the image
qemu-img info /dev/vgDATA/vDisk1

# load the nbd kernel module
modprobe nbd max_part=8

# connect the qcow2 image as nbd device
qemu-nbd --connect=/dev/nbd0 /dev/vgDATA/vDisk1

# format the nbd dev with ext4
mkfs.ext4 /dev/nbd0

# create the mountpoint, mount the device under /mnt/disk and create a testfile on it
mkdir -p /mnt/disk
mount /dev/nbd0 /mnt/disk
touch /mnt/disk/testfile

# unmount it; disconnect nbd device;
umount /mnt/disk; qemu-nbd --disconnect /dev/nbd0

# take an internal snapshot and list it afterwards
qemu-img snapshot -c testsnap /dev/vgDATA/vDisk1
qemu-img snapshot -l /dev/vgDATA/vDisk1 # or qemu-img info /dev/vgDATA/vDisk1

# connect to the image again, mount it and remove the testfile; then unmount and disconnect again
qemu-nbd --connect=/dev/nbd0 /dev/vgDATA/vDisk1
mount /dev/nbd0 /mnt/disk
rm /mnt/disk/testfile
umount /mnt/disk; qemu-nbd --disconnect /dev/nbd0

# restore/apply testsnap
qemu-img snapshot -a testsnap /dev/vgDATA/vDisk1
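
For completeness, reconnecting and mounting the image the same way as above should show the testfile again after the snapshot has been applied:

# reconnect, mount and verify that the testfile is back; then clean up
qemu-nbd --connect=/dev/nbd0 /dev/vgDATA/vDisk1
mount /dev/nbd0 /mnt/disk
ls -l /mnt/disk/testfile
umount /mnt/disk; qemu-nbd --disconnect /dev/nbd0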
 
Wasn't such a setup discussed previously, with the note that this is implemented in libvirt? I cannot find the thread anymore, but the idea was that you have a growing, thick backend LVM volume and a qcow2 frontend on top of the logical volume, which is then effectively thin-provisioned. I have no idea how often you would have to check whether there is still room, yet it would solve the snapshot "problem".

With such a setup you would of course not have the same level of trimming as with e.g. ZFS, yet you cannot have everything (except with an HA-ZFS storage appliance).
 
With such a setup you would of course not have the same level of trimming as with e.g. ZFS, yet you cannot have everything (except with an HA-ZFS storage appliance).
Correct, no trimming - but the other stuff works: snapshots and (some kind of) thin provisioning.
HA-ZFS storage appliance: if this is already your main SAN, yes. But as an additional head node, especially as a ZFS server VM: if you need to reboot the PVE node where it resides, all guest storage (= all VMs) is down - with or without HA.
 
It would be helpful to get a concrete technical explanation of why such a setup (qcow2 on a block device) shouldn't be used, or what the caveats are.
Thanks
 
It would be helpful to get a concrete technical explanation of why such a setup (qcow2 on a block device) shouldn't be used, or what the caveats are.
The first problem that comes to mind is how to find out that you need to grow the underlying LV. If you detect it too late, your qcow2 is bricked.
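
For reference, QEMU itself has a detection mechanism for exactly this: a one-shot write threshold (block-set-write-threshold) that raises a BLOCK_WRITE_THRESHOLD event on the QMP socket, which is, as far as I know, what oVirt builds on. A hedged sketch - the VM ID, the node name "drive-scsi0" and the socket path are only placeholders (on PVE the QMP socket usually lives at /var/run/qemu-server/<vmid>.qmp):

# step 1: discover the real block node names of VM 100
printf '%s\n' '{"execute":"qmp_capabilities"}' '{"execute":"query-named-block-nodes"}' \
  | socat - UNIX-CONNECT:/var/run/qemu-server/100.qmp

# step 2: arm a one-shot write threshold, e.g. at 8 GiB, on the node backing the qcow2
printf '%s\n' '{"execute":"qmp_capabilities"}' \
  '{"execute":"block-set-write-threshold","arguments":{"node-name":"drive-scsi0","write-threshold":8589934592}}' \
  | socat - UNIX-CONNECT:/var/run/qemu-server/100.qmp

# once guest writes pass that offset, QEMU emits BLOCK_WRITE_THRESHOLD on the QMP socket;
# a watchdog listening there would lvextend the LV and re-arm the threshold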
 
But as an additional head node, especially as a ZFS server VM: if you need to reboot the PVE node where it resides, all guest storage (= all VMs) is down - with or without HA.
In case of a failure, yes. Live migration, however, is no problem. You can try to set up an HA-ZFS configuration running on different nodes; I haven't tried it yet, it's still on my TODO list.

I've already mentioned it numerous times, not here yet but in other threads; we use it like this:
FC-based SANs with ordinary thick LVM, plus multiple ZFS-over-iSCSI servers, also running as VMs. We live migrate a disk from the FC-based LVM to the ZFS storage, take snapshots, do all the work we needed the snapshot for, and afterwards just live migrate the VM's disk back to the LVM. Of course it's not ideal, yet it works for us and we don't need it very often. With the advent of PBS, we do multiple backups per day and can now just restore if something went wrong, without having to create snapshots beforehand (on some machines).
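
Roughly like this (the VM ID and the storage names "san-lvm" and "zfs-iscsi" are only examples, and the exact sub-command name varies a bit between PVE versions: move_disk / move-disk / disk move):

# VM 101, disk scsi0 lives on the thick-LVM SAN storage "san-lvm"
qm move-disk 101 scsi0 zfs-iscsi --delete 1    # live migrate the disk to the ZFS-over-iSCSI storage
qm snapshot 101 before-maintenance             # snapshots work on the ZFS backend
# ... do the work; roll back with "qm rollback 101 before-maintenance" if needed ...
qm delsnapshot 101 before-maintenance
qm move-disk 101 scsi0 san-lvm --delete 1      # live migrate the disk back to the thick LVM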
 
