VM offline migration from cluster to cluster using NetApp storage

Sep 16, 2025
Hello,

we are planning an offline migration of several hundred VMs from three Proxmox clusters (which will be retired afterwards) to two Proxmox clusters we are keeping. Some difficulties showed up while testing this procedure with a test VM, and we would like your opinion on the steps needed.

Our environment:
  • Proxmox PVE 8.4.12
  • Each cluster uses storage LUNs from one NetApp storage system (with individual LUNs per cluster)
  • Connectivity is via FC (two paths with multipath)
  • Storage is provided to PVE as LVM logical volumes
  • cluster1, cluster2 and cluster3 are to be retired
  • cluster4 and cluster5 will stay and should take over the VMs from the other clusters

We'd like to migrate all VMs from one cluster ("cluster1") to another ("cluster5"). This is the plan (a rough command sketch follows the list):
  • Shut down all VMs on cluster1
  • Copy the VM configuration files from /etc/pve/nodes/node[1-4]/qemu-server to cluster5
  • Remove the storage from Proxmox configuration ("pvesm remove storage1_cluster1")
  • Take the LUN "storage1_cluster1" offline on the NetApp storage
  • Remove the mapping of the LUN "storage1_cluster1" for cluster1
  • Create a new mapping of the LUN "storage1_cluster1" for cluster5
  • Add the storage "storage1_cluster1" to cluster5
  • Modify VMIDs where needed (to avoid duplicates)
  • Start all VMs on cluster5
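For the configuration move itself we were thinking of something along these lines (node names, the VMID 101 and the /root/migration path are just examples):

# on a cluster1 node: copy the config of each VM to be moved, e.g. VMID 101
scp /etc/pve/nodes/node1/qemu-server/101.conf root@cluster5-node1:/root/migration/
# on the target cluster5 node: once the storage is visible there, placing the
# config file under /etc/pve registers the VM on that node
mv /root/migration/101.conf /etc/pve/qemu-server/101.conf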
First tests show that after taking the LUN offline, PVE seems to be somehow irritated by the missing storage (while VMs on other storage in the cluster were fine). It looks like some additional step is needed to "release" the storage from the cluster (without deleting the data).

Is there any migration expert out there with know-how on central NetApp storage who can assist with this issue?

Thanks in advance for your assistance.

br, Gregor
 
Hi @Budgreg, welcome to the forum.

First tests show that after taking the LUN offline, PVE seems to be somehow irritated by the missing storage
Is this how you would describe a system in a ticket you open with NetApp? :-) What does it mean for software to be irritated? :-)

  • Remove the storage from Proxmox configuration ("pvesm remove storage1_cluster1")
This just removes a storage pool definition from PVE; the OS/kernel is still very much aware of the LVM structure and the LUN's presence. If you use multipath, it also does not simply forget about a device that disappeared.

Keep in mind that PVE is based on Debian with an Ubuntu-derived kernel. Treat it as a Linux system: on a Linux host with multipath/LVM/SAN you would not just yank a LUN out of a live system as part of scheduled maintenance. At a high level: remove the multipath maps and deactivate/remove the LVM structures first.
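Roughly, something like this on every node of cluster1 before the LUN is unmapped - the VG name and WWID below are placeholders, not your actual values:

vgchange -an <vg_of_storage1_cluster1>   # deactivate the VG; do NOT vgremove/pvremove if the data must survive
multipath -ll                            # note the WWID/alias of the LUN
multipath -f <wwid_or_alias>             # flush the multipath map for that LUN
echo 1 > /sys/block/sdX/device/delete    # remove each underlying SCSI path (sdX, sdY, ...)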

Is there any migration expert out there with know-how on central NetApp storage who can assist with this issue?
I don't think your questions/issues are NetApp-specific. That said, the scale of your environment suggests that you may be in a position to engage a Proxmox Partner: https://www.proxmox.com/en/partners/find-partner/explore


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Is this how you would describe a system in a ticket you open with NetApp? :-) What does it mean for software to be irritated? :-)
Hi,

sorry for being so imprecise (on my first post :rolleyes:). The PVE GUI showed question marks on every VM (even the ones with disks on storage other than the one that went offline), and lvdisplay/vgdisplay did not return any output either.

Anyway thank you for your support. I agree that we have to dig into the multipath/lvm configuration to fix this behavior.

best regards, Gregor
 
The PVE GUI showed question marks on every VM (even the ones with disks on storage other than the one that went offline)
This is due to the stats collector being in a hung state. Make sure there are no VMs still referencing the missing datastore; if the question marks are still there (example commands after the list):
1. Check pvesm status; there should be no unknown datastores
2. systemctl restart pvestatd
3. systemctl restart pveproxy (may not be needed)
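For example (assuming the removed storage was named storage1_cluster1):

grep -r "storage1_cluster1" /etc/pve/nodes/*/qemu-server/   # any VM configs still referencing it?
pvesm status                                                # no unknown/inactive leftovers expected
systemctl restart pvestatd
systemctl restart pveproxy                                  # usually only needed if the GUI stays stuck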

Otherwise, your procedure should work, EXCEPT that modifying VMIDs in the configs would likely be insufficient, since the logical volumes would then be misnamed.
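For example, if VM 101 from cluster1 becomes VMID 501 on cluster5 and the backing volume group is named storage1_cluster1 (both just illustrative), the volume needs renaming as well:

lvrename storage1_cluster1 vm-101-disk-0 vm-501-disk-0
# and the disk reference in /etc/pve/qemu-server/501.conf adjusted to match, e.g.
#   scsi0: storage1_cluster1:vm-501-disk-0,size=32G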
 
This is due to the stats collector being in a hung state. Make sure there are no VMs still referencing the missing datastore
Not only that: if there are dead DM devices, the lvs, pvs, and other scan commands used by PVE will hang. This in turn will cause the stats daemon to hang.
Having dead devices on the system will lead to unpredictable instability.
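Some quick ways to spot such leftovers with standard Linux tooling (nothing PVE-specific):

multipath -ll        # paths marked "failed faulty" point to a dead LUN
dmsetup ls --tree    # device-mapper maps whose underlying devices are gone
lsblk                # overview of the remaining block devices and what is stacked on them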


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
This is due to the stats collector being in a hung state. Make sure there are no VMs still referencing the missing datastore
Hi,

thanks for your reply. No, there was only a single VM (shut down) using this specific storage. And although this cluster handles the least critical systems of all, I'd like to keep the impact of the migration tests to a minimum, so I quickly put the LUN back online again, which resolved the issue for now.

I will work on stabilizing the situation after taking the LUN offline (and removing the mapping for the cluster). I appreciate any further information on how to cleanly remove the storage - thanks.

Otherwise, your procedure should work, EXCEPT that modifying VMIDs in the configs would likely be insufficient, since the logical volumes would then be misnamed.
Got it. Keeping the original names of the logical volumes could lead to duplicate volume names, so the logical volumes have to be renamed, too.