[SOLVED] Can't live migrate with local storage, can't restrict to one node

proxmoxuser12849242

New Member
Jul 18, 2022
Hi,

I have a VM that is on Ceph storage, but for some reason the local storage is preventing it from migrating to another node, even though the data is not on the local volume. I heard that you can restrict the local storage to just that one node (which it is always, as it is LOCAL STORAGE).
It is a local ZFS volume called pxmx_raidz1_pool; how can I restrict it to just that one node? This is the system boot volume, so I can't recreate it for obvious reasons.
 
Hi,
Please post the VM and storage config:
Bash:
qm config VMID
cat /etc/pve/storage.cfg
 
Hi,

cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,backup,iso

zfspool: local-zfs
        pool rpool/data
        content rootdir,images
        sparse 1

zfspool: data-zfs
        pool pxmx_raidz1_pool
        content images,rootdir
        mountpoint /pxmx_raidz1_pool
        sparse 0

nfs: gtBackup
        export /media/nfsShares/[sanitized servername]
        path /mnt/pve/gtBackup
        server [sanitized IP]
        content backup
        prune-backups keep-monthly=3,keep-weekly=6

rbd: ceph_rbd
        content images
        monhost [sanitized IP]
        pool fs_pool
        username admin

qm config 109
agent: 1
boot: order=scsi0;ide2;net0
cores: 4
ide2: none,media=cdrom
memory: 16000
meta: creation-qemu=6.1.0,ctime=1645744568
name: tailscaletest
net0: virtio=A6:B9:2E:CA:E4:FF,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: ceph_rbd:vm-109-disk-0,size=64G
scsihw: virtio-scsi-pci
smbios1: uuid=dcdaaa79-0fae-4ee4-aaa3-8f3bb069bfd3
sockets: 1
vmgenid: bbbba75e-8dfd-472c-ab13-3d9aaa35d632
 
2022-07-20 10:39:18 starting migration of VM 109 to node 'xyz'
zfs error: cannot open 'pxmx_raidz1_pool': no such pool

2022-07-20 10:39:18 ERROR: Problem found while scanning volumes - could not activate storage 'data-zfs', zfs error: cannot import 'pxmx_raidz1_pool': no such pool available
2022-07-20 10:39:18 aborting phase 1 - cleanup resources
2022-07-20 10:39:18 ERROR: migration aborted (duration 00:00:01): Problem found while scanning volumes - could not activate storage 'data-zfs', zfs error: cannot import 'pxmx_raidz1_pool': no such pool available
TASK ERROR: migration aborted
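
(In case it helps anyone hitting the same message: you can confirm on which nodes the pool actually exists by checking each node directly. The storage and pool names below are the ones from my config; if I understand pvesm correctly, its status command can also be limited to a single storage.)
Bash:
# run on each cluster node, e.g. via SSH; the pool is only present on the node where it was created
zpool list pxmx_raidz1_pool
# PVE's own view of that storage on the current node
pvesm status --storage data-zfs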
 
Hmm, ack, now it's clear.

The storage.cfg is clusterwide, so you actually need to limit those storage entries that are not available on all cluster nodes to the set of nodes on which they are available. Migration will scan all available storages, not just the ones from the VM config itself, to ensure there are no unreferenced disk volumes owned by that VM on any storage, as that can cause trouble in the long run.

In your case it seems that (at least) data-zfs is not available on the source node, so you should edit that storage entry (Datacenter -> Storage) and use the node selector there to tell PVE on what nodes this storage is actually available.
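
For illustration, a node-restricted entry in /etc/pve/storage.cfg would then look roughly like this, where pve1 is just a placeholder for the node that actually has the pool:

zfspool: data-zfs
        pool pxmx_raidz1_pool
        content images,rootdir
        mountpoint /pxmx_raidz1_pool
        sparse 0
        nodes pve1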
 
That sounds confusing, why would that ever be a problem? So the cluster does not know where storages are? On which server they are attached?
How do I restrict the storage to just one node or server? I can find references everywhere to do that, but there is no documentation HOW TO do that exactly. As it looks right now, this is not possible in the GUI.
 
Maybe you can clarify something for me: Why would a local storage ever be a problem for migration? Local storage is always attached to just one server, others in the cluster have no access to it. Migration only works when the VM is not on local storage, which also makes sense. Why would I ever have to restrict local storage to only the server it is attached to?
 
Hi,
That sounds confusing, why would that ever be a problem? So the cluster does not know where storages are? On which server they are attached?
no, that's what the storage configuration is for. Otherwise you couldn't distinguish a storage that's not reachable because of some issue from a storage that's not there at all.
How do I restrict the storage to just one node or server? I can find references everywhere to do that, but there is no documentation HOW TO do that exactly. As it looks right now, this is not possible in the GUI.
You can edit the storage in the GUI Datacenter > Storage > Edit and set the node restriction there. With pvesm you can use the --nodes option.

Maybe you can clarify something for me: Why would a local storage ever be a problem for migration?
@t.lamprecht already explained this. Migration currently checks all storages to make sure no (orphaned) drive of the VM is left behind. We know this behavior is rather unexpected and can lead to such issues and it might well change in a future major release.
Local storage is always attached to just one server, others in the cluster have no access to it.
But if you configured the storage for all nodes, each server thinks it has its own instance of that storage (not the same one of course, but with the same configuration).
Migration only works when the VM is not on local storage, which also makes sense.
No, you can migrate with local storage. It just needs to copy the disks first and potentially takes a while.
Why would I ever have to restrict local storage to only the server it is attached to?
Because otherwise your configuration might not reflect reality.
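
As a rough sketch of both points in command form (pve1 is a placeholder node name; 109 and 'xyz' are the VM and target node from this thread, adjust to your setup):
Bash:
# restrict the storage entry to the node(s) where the pool really exists
pvesm set data-zfs --nodes pve1
# an online migration with disks on local storage copies those disks first,
# which is why it can take a while
qm migrate 109 xyz --online --with-local-disks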
 
Thank you for the explanation, it is becoming a little clearer.
When I go to datacenter -> storage, I don't see pxmx_raidz1_pool there at all.
The only place I see this ZFS pool is on one of the nodes, which makes sense: This is just a local ZFS pool, attached to just that one node.
But here I can't change to which node it is available.

[screenshot of the node-level storage view]
 
This is on the node level. You need to select Datacenter at the very left and then in the middle side-panel Storage.
 
Edit: It was set to just one node already, yet it did not work. I set it to all (unrestricted) and then back to just that one node and then it worked, so toggling it OFF and ON again fixed this strange issue.

Thank you for your help in resolving this issue!
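
For completeness, this is roughly how I checked it afterwards (same VM and node names as above):
Bash:
# the data-zfs entry now shows the nodes restriction again
cat /etc/pve/storage.cfg
pvesm status
# and the migration that failed before goes through
qm migrate 109 xyz --online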
 
Hi t.lamprecht,

I was having the same issue; however, I have VMs running on that local storage. Would disabling the local storage cause the running VMs to break?
In my case, I am able to migrate from the existing PVE server to the new one; however, I am running into the same issue as mentioned at the beginning of this thread. Also, another difference is that I am using network storage (CIFS) to store the VM disks.

Yet it still prevents me from migrating back to my existing PVE server.
 
Would disabling the local storage cause the running VMs to break?
It would not break the running VM, but it would not be able to start again, and many other PVE-related actions for that VM might fail.

The question is, why disable it completely? Why can't you change the storage entry so that the node restrictions accurately reflect on which nodes the storage is actually available?
 
