Error to migrate VMs or CTs stored in Ceph when one of the nodes doesn't use ZFS

dienteperro · Nov 15, 2019

My config is:

3 nodes configured to hyperconverged (cluster+ceph+ha), two netwoks (one for services and one for ceph). All ISOs, templates and disks (for VMs and CTs) stored in Ceph. One of the three nodes was not installed with Zfs but with Ext4.

The issue:

After you migrate a VM or CT to the node with ext4, or create in this very same node with ext4 a VM or CT, is impossible to migrate it out again. The node with ext4 claims it doesn't have the rpool storage. Why is this an issue if VMs and CTs created are in Ceph storage?

The migration finalizes with the following error:

2019-11-15 12:54:09 starting migration of VM 101 to node 'node9' (10.0.0.9)
zfs error: cannot open 'rpool': no such pool

zfs error: cannot open 'rpool': no such pool

2019-11-15 12:54:09 ERROR: Failed to sync data - could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool
2019-11-15 12:54:09 aborting phase 1 - cleanup resources
2019-11-15 12:54:09 ERROR: migration aborted (duration 00:00:00): Failed to sync data - could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool
TASK ERROR: migration aborted

Expected result:
The node without zfs should not claim a resource it doesn't have. The nodes with zfs should not claim for a resource 'apparently' don't needed for a migration. The migration should take place without any issue.

wolfgang · Nov 18, 2019

Hi,

you have to configure this in the Storage configuration.
A migration with different storage can be done on the Commandline.
see

Code:

qm help migrate

Sarlis Dimitris · Nov 28, 2019

Can you please provide an extra tip for the migration in different storage?
I also keep getting the error due to filesystem into 3rd node:

Code:

()
zfs error: cannot open 'rpool': no such pool

zfs error: cannot open 'rpool': no such pool

TASK ERROR: could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool

so i go into node3 shell and type:

qm migrate 103 oberon2 -force

??

thanks

Alwin · Nov 29, 2019

Sarlis Dimitris said:
Can you please provide an extra tip for the migration in different storage?

You need to limit the zfs storage to the nodes that have the resource.
https://pve.proxmox.com/pve-docs/chapter-pvesm.html#_common_storage_properties

Sarlis Dimitris · Nov 29, 2019

Confirm. I missed the local-zfs so for this 3rd node, I re-installed with mirror raid and now local-zfs to avoid same in future.
It seems to work.

thanks

dienteperro · Dec 2, 2019

In my case I reinstall the node with ZFS. Neither the command line or web interface solved the migration issue. After a few days and retries reinstalled the node wth zfs instead of ext4.

ikogan · Dec 23, 2019

I'm having a similar issue. I have a single node with a ZFS RAIDZ that is used only by that node for a handfull of VMs and CTs; it's not marked as shared, and only set on that node.. I have 3 additional nodes with all 4 sharing a Ceph RBD pool. I created a VM on this NAS Node during an upgrade with a disk on clustered Ceph storage that is shared across all nodes. That VM is now stuck to that node because it claims the "local-zfs" storage is not available on the target node. It's correct, that target node does not have "local-zfs", but this VM doesn't use that storage:

Code:

Virtual Environment 6.1-5
Search
Virtual Machine 110 (IPA-1) on node 'nasnode'
Server View
Logs
task started by HA resource agent
2019-12-23 17:03:37 use dedicated network address for sending migration traffic (10.1.1.3)
2019-12-23 17:03:37 starting migration of VM 110 to node 'computenode2' (10.1.1.3)
2019-12-23 17:03:38 ERROR: Failed to sync data - storage 'local-zfs' is not available on node 'computenode2'
2019-12-23 17:03:38 aborting phase 1 - cleanup resources
2019-12-23 17:03:38 ERROR: migration aborted (duration 00:00:01): Failed to sync data - storage 'local-zfs' is not available on node 'computenode2'
TASK ERROR: migration aborted

It's possible that I actually created the disk originally on local-zfs and then moved the disk to Ceph, I can't recall. System Info (I've omitted some stuff for brevity)

Code:

Virtual Environment 6.1-5

==== general system info ====

# pveversion --verbose
...
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
ceph: 14.2.5-pve1
...
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
...
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
...
zfsutils-linux: 0.8.2-pve2

==== info about storage ====

# cat /etc/pve/storage.cfg
dir: local
    path /var/lib/vz
    content backup,iso,snippets,vztmpl
    maxfiles 5
    shared 0

lvmthin: local-lvm
    thinpool data
    vgname pve
    content rootdir,images

rbd: cluster
    content images,rootdir
    krbd 1
    monhost 10.1.1.1;10.1.1.2;10.1.1.3
    pool rbd
    username admin

zfspool: local-zfs-disk
    pool Data/Virtualization/Disk
    content rootdir,images
    nodes nasnode
    sparse 1

nfs: nas
    export /data/Virtualization/Data
    path /mnt/pve/nas
    server storage.mydomain.private
    content snippets,backup,iso,vztmpl,rootdir,images
    maxfiles 5
    options vers=3

dir: local-zfs
    path /Data/Virtualization/Data
    content iso,snippets,vztmpl
    maxfiles 5
    nodes nasnode
    shared 0


# pvesm status
Name                  Type     Status           Total            Used       Available        %
cluster                rbd     active       616636353       148423841       468212512   24.07%
local                  dir     active        30832636        10217624        19025772   33.14%
local-lvm          lvmthin     active       214319104        56515947       157803156   26.37%
local-zfs              dir     active      7954180608      1155279872      6798900736   14.52%
local-zfs-disk     zfspool     active      6947762445       148861653      6798900792    2.14%
nas                    nfs     active      7954181120      1155279872      6798901248   14.52%

==== info about virtual guests ====

# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID     
       100 Kubernetes-NASNode   running    4096              10.00 1426645 
       109 Zabbix               running    8192              50.00 3353257 
       110 IPA-1                stopped    6144              15.00 0       
       116 Docker-NASNode       running    4096              20.00 104715 
       122 Test-Kube            stopped    4096              30.00 0       

# cat /etc/pve/qemu-server/110.conf
agent: 1
balloon: 2048
bootdisk: scsi0
cores: 2
cpu: host,flags=+pcid;+spec-ctrl
hotplug: disk,network,usb,memory
ide2: none,media=cdrom
memory: 6144
name: IPA-1
net0: virtio=6E:1B:98:D3:A2:8E,bridge=vmbr0
net1: virtio=96:F1:D1:65:23:3B,bridge=vmbr1
numa: 1
onboot: 1
ostype: l26
parent: PrePatch
scsi0: cluster:vm-110-disk-0,discard=on,size=15G,ssd=1
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=07daab8b-bbcb-434c-952c-a7bf9b60ca8e
sockets: 1
vmgenid: 2e46e650-7b52-440e-8a1d-b60e4896db42

[PrePatch]
agent: 1
balloon: 2048
bootdisk: scsi0
cores: 2
cpu: host,flags=+pcid;+spec-ctrl
hotplug: disk,network,usb,memory
ide2: none,media=cdrom
machine: q35
memory: 6144
name: IPA-1
net0: virtio=6E:1B:98:D3:A2:8E,bridge=vmbr0
net1: virtio=96:F1:D1:65:23:3B,bridge=vmbr1
numa: 1
onboot: 1
ostype: l26
protection: 1
scsi0: cluster:vm-110-disk-0,discard=on,size=15G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=07daab8b-bbcb-434c-952c-a7bf9b60ca8e
snaptime: 1577136565
sockets: 1
vmgenid: 2e46e650-7b52-440e-8a1d-b60e4896db42

==== info about cluster ====

# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 nasnode (local)
         2          1 computenode1
         3          1 computenode2
         4          1 gpunode

# pvecm status
Cluster information
-------------------
Name:             Cluster
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Dec 23 17:05:02 2019
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1.b8
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.1.1.1 (local)
0x00000002          1 10.1.1.3
0x00000003          1 10.1.1.2
0x00000004          1 10.1.1.32

==== info about volumes ====

# lvs
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- 204.39g             26.37  23.13                         
  root          pve -wi-ao----  30.00g                                                 
  swap          pve -wi-ao----   3.62g                                                 
  vm-100-disk-0 pve Vwi-aotz--  10.00g data        26.09                               
  vm-100-disk-1 pve Vwi-aotz--   4.00m data        100.00                               
  vm-116-disk-1 pve Vwi-aotz--  20.00g data        96.74                               
  vm-116-disk-2 pve Vwi-aotz-- 100.00g data        31.94                               

# zfs list
NAME                                         USED  AVAIL     REFER  MOUNTPOINT
Data                                        9.29T  6.33T      305K  /Data
...
Data/Virtualization/Disk                     142G  6.33T      185K  /Data/Virtualization/Disk
Data/Virtualization/Disk/subvol-102-disk-1  3.49G  7.04G      979M  /Data/Virtualization/Disk/subvol-102-disk-1
Data/Virtualization/Disk/vm-109-disk-0       126G  6.33T     39.3G  -
Data/Virtualization/Disk/vm-122-disk-0      12.5G  6.33T     10.9G  -

# ceph status
  cluster:
    id:     2b70b442-d912-4d6b-b9ad-b359b9339e4f
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum 0,1,2,gpunode (age 2w)
    mgr: gpunode(active, since 2w), standbys: computenode1, computenode2, nasnode
    osd: 4 osds: 4 up (since 2w), 4 in

  data:
    pools:   1 pools, 128 pgs
    objects: 40.17k objects, 141 GiB
    usage:   408 GiB used, 1.4 TiB / 1.8 TiB avail
    pgs:     128 active+clean

  io:
    client:   682 B/s rd, 283 KiB/s wr, 0 op/s rd, 50 op/s wr

Edit: I manually copied `110.conf` from `/etc/pve/nodes/nasnode/qemu-server` to `/etc/pve/nodes/computenode2/qemu-server`, started the VM and it came up fine.

Search

Search

Error to migrate VMs or CTs stored in Ceph when one of the nodes doesn't use ZFS

dienteperro

Member

wolfgang

Proxmox Retired Staff

Sarlis Dimitris

Active Member

Alwin

Proxmox Retired Staff

Sarlis Dimitris

Active Member

dienteperro

Member

ikogan

Well-Known Member