Bug - share iSCSI storage with volume chain (snapshots)

TimmiORG

Hi all,

I'm currently trying to hunt down an issue with VMs and snapshots on our shared iSCSI storage.
Yes, I know this is a technology preview, but I still think it makes sense to report the issue.

To me it looks like the disks are not detached after a snapshot has been created, and this causes issues if the VM is migrated to a different host within the cluster while powering up after taking a snapshot or a rollback.

No VM is running on the system
Code:
lrwxrwxrwx 1 root root       7 Oct 31 13:25 MSA-Storage03 -> ../dm-7

VM is running
Code:
lrwxrwxrwx 1 root root       7 Oct 31 13:25 MSA-Storage03 -> ../dm-7
lrwxrwxrwx 1 root root       7 Oct 31 13:26 MSA--Storage03-snap_vm--299--disk--0_initial--OS.qcow2 -> ../dm-8
lrwxrwxrwx 1 root root       7 Oct 31 13:26 MSA--Storage03-vm--299--disk--0.qcow2 -> ../dm-9

The mappers for the VM disks are gone again after powering down the VM.

But if you create a snapshot, the mappers are not removed after the task completes.
Code:
lrwxrwxrwx 1 root root       7 Oct 31 13:28 MSA--Storage03-snap_vm--299--disk--0_initial--OS.qcow2 -> ../dm-8
lrwxrwxrwx 1 root root       7 Oct 31 13:28 MSA--Storage03-snap_vm--299--disk--0_Test.qcow2 -> ../dm-9
lrwxrwxrwx 1 root root       8 Oct 31 13:28 MSA--Storage03-vm--299--disk--0.qcow2 -> ../dm-10

This causes issues if the VM is balanced to a different host during power-up.
The mappers are gone again if I start/stop the VM on the same host.

So I assume that the LVM mappers should be removed after the snapshot task.
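For reference, this is roughly how I check which mappers currently exist for a VM: just a grep over /dev/mapper. The helper function is my own and the pattern is based on the naming seen in the listings above, so adjust for your storage:

```shell
#!/bin/sh
# Filter device-mapper entries belonging to a given VMID.
# The double dashes come from LVM's name escaping, as seen in the
# listings above; this matches the disk and its snapshot volumes.
filter_vm_mappers() {
    vmid="$1"
    grep -E "vm--${vmid}--disk" || true
}

# usage: ls /dev/mapper | filter_vm_mappers 299
```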

Hope this helps and regards
 
the snapshot volumes need to be active. could you describe which symptoms you are seeing exactly?

the only times volumes are usually deactivated are
- as part of error handling for freshly allocated volumes
- as part of migration to another node

if you are missing some deactivation when migrating, please clearly describe the state before and after migration, and include "pveversion -v" and the VM and storage configuration. thanks!
 
Hi Fabian,

I'm running a cluster with 4 nodes and shared iSCSI storage.
The VM disks (qcow2) are not registered with the OS while the VM is off.

They are only visible (e.g. in dmsetup) while the VM is running on the host.
Everything works normally during migration or if I power off the VM.

But when I take a snapshot, the volumes stay registered with the OS.

This is the output you requested:
Code:
proxmox-ve: 9.0.0 (running kernel: 6.14.11-4-pve)
pve-manager: 9.0.11 (running version: 9.0.11/3bf5476b8a4699e2)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.14.11-3-pve-signed: 6.14.11-3
proxmox-kernel-6.14.11-2-pve-signed: 6.14.11-2
proxmox-kernel-6.14.11-1-pve-signed: 6.14.11-1
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
intel-microcode: 3.20250512.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.11
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.1.8
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.16-1
proxmox-backup-file-restore: 4.0.16-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.0
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.2
proxmox-widget-toolkit: 5.0.6
pve-cluster: 9.0.6
pve-container: 6.0.13
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.17-2
pve-ha-manager: 5.0.5
pve-i18n: 3.6.1
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.23
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve2
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1
 
Hi,
But when I take a snap shot the volumes are staying registered with the OS.
as @fabian already said, the snapshot volumes need to be active. This is because the current volume uses its parent snapshot as a so-called backing image, and the backing image itself uses its parent snapshot as a backing image and so forth. Each volume in the chain only records the delta to the previous one.
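To make the fall-through concrete, here is a toy model in plain shell (not actual qcow2 code): each "image" is a file holding only the blocks it overrides, and a read walks the chain from the active image down to the base. If any link in the chain were inactive, reads of unmodified blocks would fail.

```shell
#!/bin/sh
# Toy model of a backing chain: each "image" is a file of block=value
# lines holding only that image's delta. A read walks the chain from
# the active image towards the base until the block is found.
resolve_block() {
    block="$1"; shift
    for img in "$@"; do
        # pipeline ends in cut, so a non-matching grep is harmless
        val=$(grep "^${block}=" "$img" | cut -d= -f2)
        if [ -n "$val" ]; then
            echo "$val"
            return 0
        fi
    done
    echo "unallocated"
}

# usage: resolve_block blk7 active.img snap.img base.img
```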
 
Hi Fiona,

thank you for getting back to me. I'm not questioning that the snapshot is active. This is an LV chain, so it has to be active while the VM is running!
The LVs are correctly deactivated when the VM is shut down.

What looks incorrect to me is that the LVs stay active after you create a snapshot or roll one back.
This is not a problem as long as the VM is started on the same node. But if HA moves the VM, it does not know that there are still active LVs on the former node.

I will try to explain one more time.

VM stopped:
- no LV active for the VM on the node

VM running:
- LV active on the node

VM stopped, new snapshot created:
- LVs become active and stay active

The last step only creates a problem if the VM is migrated to a different node (due to resource allocation).
If the VM is started and stopped on the same node, the LVs are correctly deactivated.

Best regards
 
VM stopped, new snapshot created:
- LVs become active and stay active

The last step only creates a problem if the VM is migrated to a different node (due to resource allocation).
I tried reproducing this here: for me during rebalance-on-start, a migration task is created and there, shared volumes are deactivated on the source node at the end:
https://git.proxmox.com/?p=qemu-ser...6d29d4f34f139cc88c0f6306ba5104b;hb=HEAD#l1817
https://git.proxmox.com/?p=qemu-ser...58b13d3bc85bc08b7681f067278a65f;hb=HEAD#l6028

Could you share the full migration task log and/or system logs from around the time of the issue?
 
Hi Fiona,

I have removed HA from the VMs for the moment, as we ran into a couple of unrecoverable situations and missing LVs from the chain.
I will create a test VM to try to reproduce the issue.
Please give me a couple of days to do that.
 
Hi Fiona,

OK, looks like I was on the wrong track. You are right that this is correctly cleaned up during the migration.
I will put this VM into our automated deployment process and check whether I'm able to reproduce the issue again.

Keep you posted
 
Hi Fiona,

I was trying to replicate the issue with a basic shell script which creates tasks for one of my test VMs.
But this was not successful. ;(

Still the automated rollback calls from our Jenkins created the problem.
The best guess I currently have is that there needs to be some time between the end of the "Shutdown" task and the "Rollback".

A Rollback API call issued directly after the Shutdown finishes causes the VM to end up in a locked state. At least once, the LV chain was corrupted.
We will put a sleep between the Shutdown and Rollback tasks for now and check if this makes any difference.

Code:
Nov 10 17:37:49 lxmilgram.example.com pvedaemon[1580981]: <jenkins@example!buildservers> end task UPID:lxmilgram:001BC0B7:05490D44:691214D3:qmshutdown:305:jenkins@example!buildservers: OK
Nov 10 17:37:49 lxmilgram.example.com pvedaemon[1705138]: <jenkins@example!buildservers> starting task UPID:lxmilgram:001BC3DB:05491122:691214DD:qmrollback:305:jenkins@example!buildservers:
Nov 10 17:37:49 lxmilgram.example.com pvedaemon[1819611]: <jenkins@example!buildservers> rollback snapshot VM 305: initial
Nov 10 17:37:49 lxmilgram.example.com qmeventd[1819593]: Starting cleanup for 305
Nov 10 17:37:50 lxmilgram.example.com qmeventd[1819593]: Finished cleanup for 305
Nov 10 17:37:50 lxmilgram.example.com pvedaemon[1819611]: qemu-img: Could not open '/dev/MSA-Storage01/vm-305-disk-0.qcow2': Could not open '/dev/MSA-Storage01/vm-305-disk-0.qcow2': No such file or directory
Nov 10 17:37:50 lxmilgram.example.com pvedaemon[1705138]: <jenkins@example!buildservers> end task UPID:lxmilgram:001BC3DB:05491122:691214DD:qmrollback:305:jenkins@example!buildservers: qemu-img: Could not open '/dev/MSA-Storage01/vm-305-disk-0.qcow2': Could not open '/dev/MSA-Storage01/vm-305-disk-0.qcow2': No such file or directory
Nov 10 17:54:09 lxmilgram.example.com pvedaemon[1705138]: <jenkins@example!buildservers> starting task UPID:lxmilgram:001CFFE6:054A8FC8:691218B1:qmrollback:305:jenkins@example!buildservers:
Nov 10 17:54:09 lxmilgram.example.com pvedaemon[1900518]: <jenkins@example!buildservers> rollback snapshot VM 305: initial
Nov 10 17:54:10 lxmilgram.example.com pvedaemon[1705138]: <jenkins@example!buildservers> end task UPID:lxmilgram:001CFFE6:054A8FC8:691218B1:qmrollback:305:jenkins@example!buildservers: VM is locked (rollback)

Keep you posted.
 
Hi all,

another quick update.
So we have implemented a 10-second sleep between the shutdown and the rollback, and so far we have not run into any issues.
I have also enabled HA on one of the VMs for testing. Next week we are planning to add more VMs to HA. But so far it looks like the sleep is helping.
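In case it helps anyone: instead of a fixed sleep we are considering polling the VM status until it is really stopped before issuing the rollback. A rough sketch; the `qm status`/`qm rollback` calls in the usage line are how I would wire it up, but this exact combination is untested on our side:

```shell
#!/bin/sh
# Poll a status command until it reports "stopped", instead of using
# a fixed sleep. "$@" is any command that prints the VM status
# (e.g. `qm status 305`, which prints "status: stopped").
wait_until_stopped() {
    tries="$1"; shift
    while [ "$tries" -gt 0 ]; do
        if "$@" | grep -q "stopped"; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# usage (hypothetical): wait_until_stopped 30 qm status 305 && qm rollback 305 initial
```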
 
Do you have saferemove active on the storage? I think I'm able to reproduce it now, and it's a race between rollback and shutdown, or more precisely, the cleanup handling after shutdown. Rollback temporarily drops the file lock for modifying the configuration, and if cleanup does a badly timed deactivate for the volume, the issue can happen.

Another workaround should be to drop the shutdown. You don't need to do that before rollback, rollback will already take care of it.
 
Hi Fiona,

thank you for your reply.

"saferemove" is not enable on the storage.
Today I have enabled HA on three VMs to see if it is still working with the sleep as these are getting restored together.

But I'm very happy that you could reproduce the issue. Just to make it clear: typically we run into the locked-VM behaviour, but sometimes we also lost the main disk in the chain.

I hope you will be able to fix the race condition so that this gets a bit more bulletproof.

Best regards
Timmi