2 Nodes & Proxmox Storage

JC Connell

I added an old laptop as a second node to my existing Proxmox homelab. The goal was to experiment and learn, and in particular to take advantage of live migration. Right now I can't migrate VMs at all, or even transfer them via backups, and I'm not sure why. Looking for some help.

The first node already had Proxmox installed with a handful of VMs. I installed Proxmox on the second node, created a cluster on the first, and added the second node to the cluster.
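
For reference, the cluster was created with the standard pvecm commands, roughly along these lines (the cluster name is a placeholder):
Code:
# on the first node: create the cluster
pvecm create homelab

# on the second node: join it, using the first node's IP
pvecm add 10.0.1.10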

On the first node, there is a single root drive (local). Then there is a ZFS VM pool with 2 SSDs in a RAID 0 configuration (r0ssd400gb).

On the second node, there is a ZFS pool with 2 SSDs in RAID 0 (local). I've also created a ZFS dataset on that pool and configured it in Proxmox as r0ssd500gb.

I can't transfer VMs created on either node to the other. Any ideas why?
 
Live migration needs (at the moment) shared storage, so you need to set one up. The simplest solution would be NFS.

Offline migration is possible with your setup.
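
If you do go the NFS route, adding a shared NFS storage from the CLI could look roughly like this (storage ID, server address and export path are placeholders for whatever NFS server you end up using):
Code:
# define the NFS storage once; the definition is visible to all cluster nodes
pvesm add nfs nfs-shared -server 10.0.1.20 -export /export/pve -content images,rootdir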
 
Offline migration isn't working either; migration isn't working at all. The ZFS storage on Node 1 is listed as inactive on Node 2, so VMs can't be migrated from one node to the other. I can't figure out whether these two issues are related, or why the storage is listed as inactive.
 
I've just noticed that only the storage from Node 1 is available in the Datacenter > Storage tab. The boot pools of both nodes are ZFS. When I try to add storage from this tab, none of the storage on Node 2 is listed. Possibly because both boot pools have the same name?
 
I can't get this sorted out. Not sure what I'm doing wrong but it seems that none of the local storage on Node 2 is accessible for LXC.

Node1:
ZFS List:
Code:
NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
bkp-vol                                                    367G   532G   104K  /bkp-vol
bkp-vol/.system                                            130M   532G   122M  legacy
bkp-vol/.system/configs-d8bf7623f2464a4f944b29c3b2f43a27  5.20M   532G  5.20M  legacy
bkp-vol/.system/cores                                     1.54M   532G  1.54M  legacy
bkp-vol/.system/rrd-d8bf7623f2464a4f944b29c3b2f43a27        96K   532G    96K  legacy
bkp-vol/.system/samba4                                     224K   532G   224K  legacy
bkp-vol/.system/syslog-d8bf7623f2464a4f944b29c3b2f43a27    612K   532G   612K  legacy
bkp-vol/afp-time-machine                                  98.2G   532G  98.2G  /bkp-vol/afp-time-machine
bkp-vol/jails                                             2.09G   532G   112K  /bkp-vol/jails
bkp-vol/jails/.warden-template-standard                   1.65G   532G  1.60G  /bkp-vol/jails/.warden-template-standard
bkp-vol/jails/dhcp_dns                                     448M   532G  1.82G  /bkp-vol/jails/dhcp_dns
bkp-vol/nfs-bkp                                             96K   532G    96K  /bkp-vol/nfs-bkp
bkp-vol/nfs-pve                                            249G   532G   249G  /bkp-vol/nfs-pve
bkp-vol/rsync-sm-vps-bkp                                  16.9G   532G  4.25G  /bkp-vol/rsync-sm-vps-bkp
r0ssd400gb                                                56.6G   304G   112K  /r0ssd400gb
r0ssd400gb/subvol-101-disk-1                               655M  7.36G   655M  /r0ssd400gb/subvol-101-disk-1
r0ssd400gb/subvol-120-disk-1                               505M  7.51G   505M  /r0ssd400gb/subvol-120-disk-1
r0ssd400gb/subvol-150-disk-1                              3.28G  26.7G  3.28G  /r0ssd400gb/subvol-150-disk-1
r0ssd400gb/subvol-161-disk-1                               798M  7.22G   798M  /r0ssd400gb/subvol-161-disk-1
r0ssd400gb/subvol-170-disk-1                               316M  7.69G   316M  /r0ssd400gb/subvol-170-disk-1
r0ssd400gb/subvol-190-disk-1                               978M  7.05G   978M  /r0ssd400gb/subvol-190-disk-1
r0ssd400gb/subvol-252-disk-1                               626M  7.39G   626M  /r0ssd400gb/subvol-252-disk-1
r0ssd400gb/vm-100-disk-1                                   627M   304G   627M  -
r0ssd400gb/vm-110-disk-1                                  7.69G   304G  7.69G  -
r0ssd400gb/vm-111-disk-1                                  23.2G   304G  23.2G  -
r0ssd400gb/vm-112-disk-1                                  11.8G   304G  11.8G  -
r0ssd400gb/vm-200-disk-1                                  6.16G   304G  6.16G  -
rpool                                                      289G  34.1M    96K  /rpool
rpool/ROOT                                                 280G  34.1M    96K  /rpool/ROOT
rpool/ROOT/pve-1                                           280G  34.1M   280G  /
rpool/data                                                  96K  34.1M    96K  /rpool/data
rpool/swap                                                8.50G  8.03G   524M  -
vol1                                                       480G  6.55T    96K  /vol1
vol1/bkp                                                   192K  6.55T    96K  /vol1/bkp
vol1/bkp/pve                                                96K  6.55T    96K  /vol1/bkp/pve
vol1/media                                                 480G  6.55T   480G  /vol1/media
vol1/owncloud                                             11.0M  6.55T  11.0M  /vol1/owncloud

Zpool list:
Code:
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
bkp-vol      928G   367G   561G         -    10%    39%  1.00x  ONLINE  -
r0ssd400gb   372G  56.6G   315G         -     9%    15%  1.00x  ONLINE  -
rpool        298G   281G  17.3G         -    55%    94%  1.00x  ONLINE  -
vol1        7.25T   480G  6.78T         -     3%     6%  1.00x  ONLINE  -

pvesm status:
Code:
zfs error: cannot open 'rpool/r0ssd500gb/': invalid dataset name
local             dir 1       293786496       293751680           34816 100.49%
r0ssd400gb-zfs  zfspool 1       377880576        59353612       318526964 16.21%
r0ssd500gb      zfspool 0               0               0               0 100.00%


Node2:
ZFS List:
Code:
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             8.54G   453G    96K  /rpool
rpool/ROOT        1.10G   453G    96K  /rpool/ROOT
rpool/ROOT/pve-1  1.10G   453G  1.10G  /
rpool/data          96K   453G    96K  /rpool/data
rpool/r0ssd500gb    96K   453G    96K  /rpool/r0ssd500gb
rpool/swap        7.44G   460G    64K  -

Zpool List:
Code:
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   476G  1.10G   475G         -     0%     0%  1.00x  ONLINE  -

pvesm status:
Code:
zfs error: cannot open 'r0ssd400gb': dataset does not exist
zfs error: cannot open 'rpool/r0ssd500gb/': invalid dataset name
local             dir 1       475720448         1148288       474572160 0.74%
r0ssd400gb-zfs  zfspool 0               0               0               0 100.00%
r0ssd500gb      zfspool 0               0               0               0 100.00%
 
Please post your storage.cfg ("/etc/pve/storage.cfg") as well.

When you tell PVE that a ZFS pool/dataset named "xyz" is available on two nodes, that dataset must exist locally on both nodes. Your pvesm error messages indicate that something is messed up in that regard.
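
As a quick sanity check, you can verify on each node whether the configured datasets actually exist locally, e.g.:
Code:
# run on both nodes; a missing dataset means that storage cannot be active there
zfs list -o name r0ssd400gb
zfs list -o name rpool/r0ssd500gb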
 
Thank you for the help, Fabian. I sincerely appreciate it.

Here are the contents of that file from Node 1:
Code:
dir: local
        path /var/lib/vz
        maxfiles 10
        content backup,rootdir,vztmpl,images,iso
        shared

zfspool: r0ssd400gb-zfs
        pool r0ssd400gb
        content rootdir,images
        nodes pve2,pve1
        sparse

zfspool: r0ssd500gb
        pool rpool/r0ssd500gb/
        content rootdir,images
        nodes pve1,pve2
        sparse

Node 2:
Code:
dir: local
        path /var/lib/vz
        maxfiles 10
        content backup,rootdir,vztmpl,images,iso
        shared

zfspool: r0ssd400gb-zfs
        pool r0ssd400gb
        content rootdir,images
        nodes pve2,pve1
        sparse

zfspool: r0ssd500gb
        pool rpool/r0ssd500gb/
        content rootdir,images
        nodes pve1,pve2
        sparse
 
Okay, so your storage configuration says there are two ZFS pool storages:

"r0ssd400gb-zfs", which uses the dataset "r0ssd400gb" and should be available on both nodes. Looking at the "zfs list" output you posted, this dataset actually only exists on the first node, so you need to correct your storage.cfg here (or create such a dataset on the other node).

"r0ssd500gb", which uses the dataset "rpool/r0ssd500gb/". There are two problems here: dataset names never end with "/" (so it should be "rpool/r0ssd500gb"), and again, this dataset only exists on one node according to "zfs list" (the second).
 
I tested a few different storage configurations without success. Here is how the hardware is organized on each node.

Node 1:
-rpool = 1 x 320 GB HDD (only used for boot, templates and backups)
-r0ssd400gb = 2 x 200 GB SSD in RAID 0 (VM and CT storage)

Node 2:
-rpool = 2 x 256 GB SSD in RAID 0 (boot, VM and CT storage)

I attempted to create an identical dataset on each node and then tested migration by using the following commands:
Node1:
Code:
zfs create r0ssd400gb/zfsdisks

Node2:
Code:
zfs create rpool/zfsdisks

Node2:
Code:
pvesm add zfspool zfsvols -pool rpool/zfsdisks -content images,rootdir -sparse

Node1:
Code:
root@pve1:~# pvesm add zfspool zfsvols -pool r0ssd400gb/zfsdisks -content images,rootdir -sparse
create storage failed: storage ID 'zfsvols' already defined


Then I created a new container on Node 2 and attempted to migrate it to Node 1, where I received this error:
Code:
Aug 20 10:33:39 starting migration of CT 250 to node 'pve1' (10.0.1.10)
Aug 20 10:33:39 found local volume 'zfsvols:subvol-250-disk-1' (in current VM config)
send from @ to rpool/zfsdisks/subvol-250-disk-1@__migration__ estimated size is 436M
total estimated size is 436M
TIME        SENT   SNAPSHOT
cannot open 'rpool/zfsdisks/subvol-250-disk-1': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
warning: cannot send 'rpool/zfsdisks/subvol-250-disk-1@__migration__': Broken pipe
Aug 20 10:33:39 ERROR: command 'set -o pipefail && zfs send -Rpv rpool/zfsdisks/subvol-250-disk-1@__migration__ | ssh root@10.0.1.10 zfs recv rpool/zfsdisks/subvol-250-disk-1' failed: exit code 1
Aug 20 10:33:39 aborting phase 1 - cleanup resources
Aug 20 10:33:39 ERROR: found stale volume copy 'zfsvols:subvol-250-disk-1' on node 'pve1'
Aug 20 10:33:39 start final cleanup
Aug 20 10:33:39 ERROR: migration aborted (duration 00:00:00): command 'set -o pipefail && zfs send -Rpv rpool/zfsdisks/subvol-250-disk-1@__migration__ | ssh root@10.0.1.10 zfs recv rpool/zfsdisks/subvol-250-disk-1' failed: exit code 1
TASK ERROR: migration aborted

My only goal at this point is live migration. If I need to wipe both hosts and reconfigure them in another way I'm happy to do that. I just don't understand what I need to do.
 
Those are not identically named datasets - one is called "r0ssd400gb/zfsdisks", the other "rpool/zfsdisks". What you want is, for example, "rpool/zfsdisks" on both nodes, and then to add a storage with that dataset once in PVE (the storage configuration is shared over the cluster).
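
A minimal sketch of that approach, reusing the names from your attempt (note that the storage ID "zfsvols" is already defined cluster-wide, so it has to be removed first or given a new ID):
Code:
# on node 1 (and on node 2, if the dataset doesn't already exist there)
zfs create rpool/zfsdisks

# once, on either node (storage.cfg is shared across the cluster)
pvesm remove zfsvols
pvesm add zfspool zfsvols -pool rpool/zfsdisks -content images,rootdir -sparse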
 