unable to online migrate to host with local ZFS

dswartz

Renowned Member
Dec 13, 2010
I have two hosts; one has a ZFS RAID1 pool on two NVMe drives, and both are connected to a JBOD via NFS. To upgrade the first host, I did this (roughly the commands sketched below):

move all disks from nvme pool to jbod (shared) pool
migrate all guests from host1 to host2
upgrade and reboot host1.
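In CLI terms, the per-guest steps would be something like this (VMID 101 and disk virtio0 are just examples from this thread; exact option names may differ a bit between PVE versions):
Code:
qm move_disk 101 virtio0 JBOD --delete   # move the disk from the nvme pool to the shared JBOD storage
qm migrate 101 pve2 --online             # then migrate the guest from host1 to host2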

Unfortunately, if I try to migrate guests from host2 to host1, I get this:

root@pve2:~# qm migrate 101 pve --online
Apr 01 15:05:30 starting migration of VM 101 to node 'pve' (10.0.0.5)
Apr 01 15:05:30 copying disk images
Apr 01 15:05:30 ERROR: Failed to sync data - could not activate storage 'NVME', zfs error: cannot import 'nvme': no such pool available
Apr 01 15:05:30 aborting phase 1 - cleanup resources
Apr 01 15:05:30 ERROR: migration aborted (duration 00:00:00): Failed to sync data - could not activate storage 'NVME', zfs error: cannot import 'nvme': no such pool available
migration aborted

This is mystifying for two reasons: 1) the nvme pool IS active, and 2) the guest is currently running on pve2, which has no access to that pool. The guest config:
root@pve2:/etc/pve/nodes/pve2/qemu-server# cat 101.conf
agent: 1
bootdisk: virtio0
cores: 1
memory: 1024
name: ssh-server
net0: virtio=0A:3C:87:2E:46:93,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=15e0c475-c3c8-40f1-aeb0-764f078b0d27
sockets: 1
virtio0: JBOD:101/vm-101-disk-1.raw,cache=writeback,size=32G

So it has no references to the nvme storage at all; why is it doing this? I really don't want to have to shut down all the guests to migrate them :(
 
Well, that *highly* sucks. I shut down a guest that is not critical, and I still can't migrate it. So far, it looks like all of my guests are stuck on the 2nd node (which is lower-powered, so I do not want to leave them there...)
 
I did test one emergency method: on host2, take a backup, then stop and remove the guest; then on host1, restore that backup and power it on. This can't be the right way, though?
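In commands, that would be something like the following (the archive name is a placeholder, and I'm assuming the JBOD storage is allowed to hold backups at its usual /mnt/pve/JBOD mount point):
Code:
vzdump 101 --storage JBOD                                          # on pve2
qm stop 101 && qm destroy 101                                      # on pve2, once the backup is verified
qmrestore /mnt/pve/JBOD/dump/<backup-archive> 101 --storage JBOD   # on pve
qm start 101                                                       # on pve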
 
Okay, I worked around this with a horrible hack. Since the nvme pool was totally empty, I removed it from the storage view. That let me migrate all guests from pve2 => pve. I then re-added the nvme ZFS storage and began migrating all of the disks back onto it. This sounds like a bug?
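For the record, the CLI equivalent of that hack would be roughly this (storage and pool names as in my setup):
Code:
pvesm remove NVME                      # drop the (empty) storage definition
qm migrate 101 pve --online            # repeat for each guest on pve2
pvesm add zfspool NVME --pool nvme     # re-add the ZFS storage afterwards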
 
Okay, 100% reproducible test case here: node 1 has a local ZFS pool, node 2 doesn't. Create a VM with no disk storage at all on node 1. Offline migrate to node 2: works fine. Offline migrate back to node 1: fails with the error about the NVME pool.
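Step by step, it would look something like this (VMID 9999 and the VM name are arbitrary examples):
Code:
qm create 9999 --name repro-test --memory 512   # on pve: a VM with no disk at all
qm migrate 9999 pve2                            # offline migrate to node 2: works fine
qm migrate 9999 pve                             # run on pve2, migrate back: fails with the NVME error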
 
Hi,
if your VM is not on the "shared"(?) NVME storage, you can migrate the VM online with:
Code:
qm migrate 123 node-x --online --with-local-disks
provided the storage is available on both nodes (if it's not shared, the content will be synced).

Udo
 
I don't think I was clear, then. There are no disks on local storage. If you look at my repro instructions, you can see you don't even *need* a disk! I created a VM with no disk at all, and it fails this way. I did come up with an ugly workaround: create an 8 GB nvme pool on a /var/foo file, and suddenly migration works again. This has got to be a bug. To add to this: I did try --with-local-disks, and it fails the same way.
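The dummy-pool workaround, roughly (an 8 GB file-backed vdev, obviously nothing you'd want to keep long-term):
Code:
truncate -s 8G /var/foo
zpool create nvme /var/foo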
 
Hi,
ok, I hadn't read your first post carefully enough…

Looks like a bug, but to track it down it would be good to post the package versions running on both nodes.
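The usual way to gather those is, on each node:
Code:
pveversion -v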

Udo
 
Hi,

when PVE migrates a guest, it checks that all configured storages are available.

I guess you created the nvme storage and did not restrict it to the pve node.

So when you try to migrate, the storage configuration expects a storage 'NVME' on the other node as well.
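In /etc/pve/storage.cfg that restriction would look something like this (the content line is just an example):
Code:
zfspool: NVME
        pool nvme
        content images,rootdir
        nodes pve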
 
The NVME pool is a local device, surely you don't suggest I mark it as shared? If so, what's the point of having local storage, if you have to lie about it being shared? I assume this doesn't happen for the default local and local-lvm storage, as any newly installed node has those. I suppose I can stick a flash drive in a USB slot, and create a dummy pool on it called 'nvme', but that seems like an awful hack, no?
 

you mix up two totally unrelated things. in PVE, the storage configuration is cluster-wide. when you configure a storage, you have two orthogonal settings that relate to where it is available and how it is treated:
  • the "shared" flag: this tells PVE that this storage has identical content on all nodes where it is available (e.g., because it is an NFS export on a NAS, or some kind of distributed storage like Ceph, or ... - this does not mean that PVE will share it by itself!)
  • the "nodes" property: a list of nodes where the storage is available, empty/undefined = all nodes (this is shown as "All (No restrictions)" on the GUI)
this gives us the following four combinations:
  • shared and available on all nodes (with identical content!)
  • shared and available on some part of the cluster (with identical content!)
  • local and available on all nodes (but with different content on each node!)
  • local and available on some part of the cluster (but with different content on each node!)
your "nvme" storage falls in the last category, but you configured it like the third one (it's only available on one node, but you haven't told PVE ;)). PVE wants to migrate all the disks when migrating a guest, so it looks at all the storages that are available on the source node (according to the storage configuration), and checks for disk images. if a storage is supposed to be there, but it is not, the migration will error out (rather then potentially incompletely migrate a guest).

hope this clears things up (and maybe we should put more detailed information like this in the admin guide?)
 
Bless you, that's the part of the puzzle I was missing :) I just edited the settings for the nvme storage and selected 'pve' instead of the default, and now it all works. Yes, it would be nice to have this spelled out a little more clearly. Thanks again!
 
