Shared LVM clone support

smelikov

Member
Jul 26, 2022
Hello,
We've just set up a new Proxmox cluster that uses SCSI storage with multipath and shared LVM.
We understood that it doesn't support snapshots, but we have since noticed that the official documentation says it doesn't support clones either.

From https://pve.proxmox.com/wiki/Storage:_LVM
LVM is a typical block storage, but this backend does not support snapshots and clones.

We have template VMs and actively use clones, and those actually work, but from time to time we run into issues where dmsetup (device mapper) keeps references to nonexistent LVs.
So are clones supported on shared LVM?
If not, what would be a workaround to enable using clones?
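One way to check for the leftover device-mapper references described above is to compare the device-mapper names on a node with the LVs that LVM itself reports. The sketch below is a minimal illustration, not a Proxmox tool: it assumes you have captured the output of `dmsetup info -c --noheadings -o name` and `lvs --noheadings -o vg_name,lv_name` as text, and the VG name `pve_shared` and the sample lines are hypothetical. It relies on the LVM convention that the device-mapper name is `vg-lv` with hyphens inside each name doubled.

```python
def dm_name(vg: str, lv: str) -> str:
    """Device-mapper name LVM uses for vg/lv: hyphens inside each name are doubled."""
    return f"{vg.replace('-', '--')}-{lv.replace('-', '--')}"


def stale_dm_entries(dmsetup_names: str, lvs_output: str, vg: str) -> list[str]:
    """Return dm names that appear to belong to `vg` but match no LV reported by lvs."""
    known = set()
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == vg:
            known.add(dm_name(fields[0], fields[1]))

    prefix = vg.replace("-", "--") + "-"
    stale = []
    for line in dmsetup_names.splitlines():
        name = line.strip()
        # A remainder starting with "-" means the name belongs to a different VG
        # whose own name merely begins with `vg` followed by a hyphen.
        belongs_to_vg = name.startswith(prefix) and not name[len(prefix):].startswith("-")
        if belongs_to_vg and name not in known:
            stale.append(name)
    return stale


# Hypothetical captured output, for illustration only.
sample_dmsetup = "pve_shared-vm--101--disk--0\npve_shared-vm--105--disk--0\nmpatha\n"
sample_lvs = "  pve_shared vm-101-disk-0\n"
print(stale_dm_entries(sample_dmsetup, sample_lvs, "pve_shared"))
```

Each cluster node has its own device-mapper table, so the comparison would have to be run per node. It only compares names: a leftover mapping whose name matches a recreated LV would not be flagged, and in that case comparing `dmsetup table` output between nodes is probably the more useful check.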
 
There are two types of clones - Linked and Full. A Linked clone is only possible with the assistance of snapshot capable storage. A Full clone is simply a byte-for-byte copy, using qemu-img or dd to make a full copy of the original disk. A Full clone is possible from any storage to any storage.

https://pve.proxmox.com/wiki/VM_Templates_and_Clones
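To make "byte-for-byte copy" concrete, here is a minimal Python sketch of what a full copy amounts to conceptually: read the source in chunks and write each chunk unchanged to the destination. It is only an illustration of the idea, not how Proxmox, qemu-img or dd are implemented; the function name `full_copy` and the chunk size are assumptions made for this example.

```python
import hashlib
import os


def full_copy(src_path: str, dst_path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Copy src to dst chunk by chunk and return the SHA-256 of the bytes copied."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of the source
                break
            dst.write(chunk)
            digest.update(chunk)
        # Push buffered data to the underlying device before reporting success.
        dst.flush()
        os.fsync(dst.fileno())
    return digest.hexdigest()
```

Because nothing in this process depends on snapshot support, the source and destination can live on any storage type, which is why a full clone works where a linked clone does not.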


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thank you, we are using full clones. However, we sometimes run into an issue where a VM fails to start. A colleague will post details of the error shortly.
 
Unfortunately, this is a relatively generic error that can have many causes. It could be actual corruption or a BIOS incompatibility. As I am sure you have seen, many people have run into this over the last ten years without a single common solution.
Off the top of my head, it could be caused by the clone operation not considering cached data, virtual BIOS issues, actual disk corruption, and a few more things.
You will need to methodically find a way to reliably reproduce this, find the byte-level differences between a working and a non-working system (if clones are involved), and figure out why those differences exist.

Best of luck.


 
These are all clones. What I noticed is that this happens much more often with a reused VMID, probably due to LV naming. I also noticed that it happens with Windows and Ubuntu VMs only, and it always complains about the same file.
Could you elaborate a bit on the clone not considering cached data?
 
> These are all clones, what I noticed is this is happening much more often on the reused VMid, probably due to lvs naming.
The full cycle of your workflow still escapes me. Fully documenting your steps will be helpful should you decide to file an official bug at https://bugzilla.proxmox.com or open a case with the Support team.

> Could you elaborate a bit about clone not considering cached data?
If you are cloning a running/live VM and there is no coordination between the hypervisor and the VM (via the QEMU Guest Agent, for example), the resulting copy is similar to the disk state after a hard power-off of a VM/PC. As long as the OS/VM can recover from a hard power-down, it should be able to recover from a clone operation.


 
I have a template VM, so it is stopped:
[Screenshot: 1695934217734.png]


I'm running a full clone of this template to the same storage. The pattern is that when it is a Windows VM, the disk very often looks corrupted, and this happens when I clone to the same shared LVM storage.
The key seems to be reusing the VMID, which means the LVM disk names are reused as well. I'm trying to find a relation.
 
Unfortunately, based on the limited information you have provided, I don't have a solution for you. Nor is the forum the best medium for troubleshooting something that could very well be a problem in your template.
My suggestion is to work on creating a fully reproducible scenario that can be taken by the QA team and fed to development. If this is a commercial/production problem, I recommend purchasing a subscription so that you can have some sort of SLA.

Good luck.


 
The scenario is easily reproducible:
1 - Create a Windows VM, which is assigned a new VMID. The VM starts just fine.
2 - Delete the VM.
3 - Create the VM again so that it gets the same VMID as in step 1. It then fails with the error above.

I also have another storage attached, which is NFS, and when cloning to it everything works just fine. This also indicates that the template is good and that the problem is most probably with LVM recreating LVs with the same name.

If I can provide any output that would help, please let me know.
 
You said that this happens with Windows and Ubuntu VMs. As Windows is much more finicky than Ubuntu, do you have a repro scenario for Ubuntu?
Can you reproduce it with any type of storage, i.e. using qcow or local-lvm as backing storage?
Is it 100% reproducible or random?

The bottom line is that you are the only person reporting such an issue. Your setup is not unusual, but it is also not a common one. You have quite a few moving pieces: SCSI, multipath, LVM, cluster, templates, Windows, Ubuntu. A failure could be introduced by one part or by a combination of them.

If it were my environment, I'd be doing byte-level disk compares between a known good clone and a known bad one. If the disks are identical, as they should be, I'd then look at the differences in VM config and whether minor changes there can fix the problem.
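A byte-level compare of this kind can be sketched in a few lines of Python. This is a minimal illustration rather than an established tool: the function name `differing_chunks`, the chunk size and the result limit are assumptions, and the paths you pass in would be the disks (or exported images) of the good and bad clone, read while both VMs are stopped.

```python
def differing_chunks(path_a: str, path_b: str,
                     chunk_size: int = 1024 * 1024, limit: int = 20) -> list[int]:
    """Return the starting offsets of up to `limit` chunks that differ between two files."""
    offsets = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while len(offsets) < limit:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if not chunk_a and not chunk_b:  # both inputs exhausted
                break
            # A shorter input yields an empty or short chunk here,
            # so a size mismatch is reported as a difference too.
            if chunk_a != chunk_b:
                offsets.append(offset)
            offset += chunk_size
    return offsets
```

An empty result means the two disks are identical at the byte level, which would point away from the copy itself and toward the VM configuration. Offsets clustered near the start of the disk would instead suggest looking at the partition table or boot area first.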

Good luck with it.


 
