last missing enterprise feature (on shared block storage / SAN)

alma21

New Member
May 4, 2024
Hi,

I know that there is already "LVM over iSCSI/FC" for shared block storage (SAN), which carves out LVs from a shared VG/LUN for individual VM disks - but with no snapshot or thin-provisioning support.

My question is: why is there no (extended) storage plugin which provides

1.) "QCOW2 over/on LV(M)" - to format a LV / block device with QCOW2 should not be a big issue - given that the "raw" format is already used with LVM / Proxmox Storage plugins - this would then provide qcow internal snapshots on an (initially) thick provisioned LV

2.) The thin provisioning is maybe the harder part, but oVirt/RHV already did this with some kind of watchdog: if an initially small (e.g. 1 GB) thick QCOW2 LV gets full, extend it by +x GB (rough sketch below).
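A very rough, illustrative sketch of such a watchdog (LV name, sizes and thresholds are just examples; polling with qemu-img is only safe while no VM has the image open - a real implementation would query the running QEMU instead):

# initial layout: small thick LV, qcow2 with a much larger virtual size
lvcreate -L 1G -n vDisk1 vgDATA
qemu-img create -f qcow2 /dev/vgDATA/vDisk1 100G

# polling loop: extend the LV whenever the qcow2 allocation gets close to the LV size
LV=/dev/vgDATA/vDisk1
HEADROOM=$((512*1024*1024))   # extend once less than 512M of the LV is still unused
while true; do
    lv_size=$(blockdev --getsize64 "$LV")
    # "image-end-offset" = highest offset the qcow2 has written so far
    used=$(qemu-img check --output=json "$LV" 2>/dev/null | sed -n 's/.*"image-end-offset": \([0-9]*\).*/\1/p')
    if [ -n "$used" ] && [ $((lv_size - used)) -lt "$HEADROOM" ]; then
        lvextend -L +1G "$LV"
    fi
    sleep 10
done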
 
My question is: why is there no (extended) storage plugin which provides
The short answer is: nobody volunteered their time to write and submit the code. If the PVE developers have looked at it, then given that it has not been delivered yet, it was either not a high enough priority or there were technical challenges.

"QCOW2 over/on LV(M)" - to format a LV / block device with QCOW2 should not be a big issue - given that the "raw" format is already used with LVM / Proxmox Storage plugins - this would then provide qcow internal snapshots on an (initially) thick provisioned LV
While it's possible to place QCOW2 on a raw disk (essentially format the disk as QCOW2), I believe you will lose snapshot functionality if you do that.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Sample:

- the LV is /dev/vgDATA/vDisk1 (~10G)
- vDisk1 is offline / must not be in use by a VM or another process
- all commands run as root
- command outputs are omitted for readability
- qemu-img/qemu-nbd commands are used here, but it should also work via the QEMU (and Proxmox?) monitor/APIs (online snapshots) - will verify that later


# format the LV as qcow2 with a 9G virtual size
qemu-img create -f qcow2 /dev/vgDATA/vDisk1 9G

# show detailed info about the image
qemu-img info /dev/vgDATA/vDisk1

# load the nbd kernel module
modprobe nbd max_part=8

# connect the qcow2 image as nbd device
qemu-nbd --connect=/dev/nbd0 /dev/vgDATA/vDisk1

# format the nbd dev with ext4
mkfs.ext4 /dev/nbd0

# create the mountpoint, mount the device under /mnt/disk and create a testfile on it
mkdir -p /mnt/disk
mount /dev/nbd0 /mnt/disk
touch /mnt/disk/testfile

# unmount it; disconnect nbd device;
umount /mnt/disk; qemu-nbd --disconnect /dev/nbd0

# take an internal snapshot and list it afterwards
qemu-img snapshot -c testsnap /dev/vgDATA/vDisk1
qemu-img snapshot -l /dev/vgDATA/vDisk1 # or qemu-img info /dev/vgDATA/vDisk1

# connect to the image again, mount it and remove the testfile; then unmount and disconnect again
qemu-nbd --connect=/dev/nbd0 /dev/vgDATA/vDisk1
mount /dev/nbd0 /mnt/disk
rm /mnt/disk/testfile
umount /mnt/disk; qemu-nbd --disconnect /dev/nbd0

# restore/apply testsnap
qemu-img snapshot -a testsnap /dev/vgDATA/vDisk1
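
For completeness, reconnecting and mounting the image the same way as above should show the testfile again after the snapshot has been applied:

# reconnect, mount and verify that the testfile is back; then clean up
qemu-nbd --connect=/dev/nbd0 /dev/vgDATA/vDisk1
mount /dev/nbd0 /mnt/disk
ls -l /mnt/disk/testfile
umount /mnt/disk; qemu-nbd --disconnect /dev/nbd0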
 
Wasn't such a setup discussed previously, with the note that this is implemented in libvirt? I cannot find the thread anymore, but the idea was that you have a growing, thick backend LVM volume and a qcow2 frontend on top of the logical volume, which is then effectively thin-provisioned. I have no idea how often you would have to check whether there is still room, yet it would solve the snapshot "problem".

With such a setup you would of course not have the same level of trimming as with e.g. ZFS, yet you cannot have everything (except with an HA-ZFS storage appliance).
 
With such a setup you would of course not have the same level of trimming as with e.g. ZFS, yet you cannot have everything (except with an HA-ZFS storage appliance).
Correct, no trimming - but the other stuff works: snapshots and (some kind of) thin provisioning.
HA-ZFS storage appliance: if this is already your main SAN, yes. But as an additional head node, especially as a ZFS server VM: if you need to reboot the PVE node where it resides, all guest storage (= all VMs) is down - with or without HA.
 
It would be helpful to get a concrete technical explanation of why such a setup (qcow2 on a block device) shouldn't be used, or what the caveats are.
Thanks
 
It would be helpful to get a concrete technical explanation of why such a setup (qcow2 on a block device) shouldn't be used, or what the caveats are.
The first problem that comes to mind is how to find out that you need to grow the underlying LV. If you detect it too late, your qcow2 is bricked.
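
For reference, QEMU itself has a detection mechanism for exactly this: a one-shot write threshold (block-set-write-threshold) that raises a BLOCK_WRITE_THRESHOLD event on the QMP socket, which is, as far as I know, what oVirt builds on. A hedged sketch - the VM ID, the node name "drive-scsi0" and the socket path are only placeholders (on PVE the QMP socket usually lives at /var/run/qemu-server/<vmid>.qmp):

# step 1: discover the real block node names of VM 100
printf '%s\n' '{"execute":"qmp_capabilities"}' '{"execute":"query-named-block-nodes"}' \
  | socat - UNIX-CONNECT:/var/run/qemu-server/100.qmp

# step 2: arm a one-shot write threshold, e.g. at 8 GiB, on the node backing the qcow2
printf '%s\n' '{"execute":"qmp_capabilities"}' \
  '{"execute":"block-set-write-threshold","arguments":{"node-name":"drive-scsi0","write-threshold":8589934592}}' \
  | socat - UNIX-CONNECT:/var/run/qemu-server/100.qmp

# once guest writes pass that offset, QEMU emits BLOCK_WRITE_THRESHOLD on the QMP socket;
# a watchdog listening there would lvextend the LV and re-arm the threshold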
 
But as an additional head node, especially as a ZFS server VM: if you need to reboot the PVE node where it resides, all guest storage (= all VMs) is down - with or without HA.
In case of a failure, yes. Live migration, however, is no problem. You can try to set up an HA-ZFS configuration running on different nodes; I haven't tried it yet, it's still on my TODO list.

I've already mentioned it numerous times, not here yet but in other threads; we use it like this:
FC-based SANs with ordinary thick LVM, plus multiple ZFS-over-iSCSI servers, also running as VMs. We live migrate a disk from the FC-based LVM to the ZFS storage, take snapshots, do all the work we needed the snapshot for, and afterwards just live migrate the VM's disk back to the LVM. Of course it's not ideal, yet it works for us and we don't need it very often. With the advent of PBS, we do multiple backups per day and can now just restore if something went wrong, without having to create snapshots beforehand (on some machines).
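
Roughly like this (the VM ID and the storage names "san-lvm" and "zfs-iscsi" are only examples, and the exact sub-command name varies a bit between PVE versions: move_disk / move-disk / disk move):

# VM 101, disk scsi0 lives on the thick-LVM SAN storage "san-lvm"
qm move-disk 101 scsi0 zfs-iscsi --delete 1    # live migrate the disk to the ZFS-over-iSCSI storage
qm snapshot 101 before-maintenance             # snapshots work on the ZFS backend
# ... do the work; roll back with "qm rollback 101 before-maintenance" if needed ...
qm delsnapshot 101 before-maintenance
qm move-disk 101 scsi0 san-lvm --delete 1      # live migrate the disk back to the thick LVM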
 
