ZFS over iSCSI error "Invalid lun definition"

hexblot

Member
Nov 30, 2022
Athens, Greece
Hello,
I have a working cluster with 3 PVE nodes and 1 storage node (a PVE installation, not part of the cluster, with nothing used other than ZFS), which works with ZFS over iSCSI without issues.
I am trying to add a second storage node using the same setup process as for the first (based on my notes), but when trying to create a hard drive on the new storage, I get a popup with the error message "failed to update VM 103: Invalid lun definition in config! (500)".

Executing the SSH command that PVE uses, directly on the PVE host (I got the exact command from the error message initially displayed due to the server signature issue):
Code:
root@hv01 ~ # /usr/bin/ssh -o 'BatchMode=yes' -i /etc/pve/priv/zfs/1.2.3.4_id_rsa root@1.2.3.4 zfs list -o name,volsize,origin,type,refquota -t volume,filesystem -d1 -Hp rpool/data
rpool/data    -    -    filesystem    0
rpool/data/vm-103-disk-0    8589934592    -    volume    -
rpool/data/vm-103-disk-1    5368709120    -    volume    -
rpool/data/vm-103-disk-2    2147483648    -    volume    -
rpool/data/vm-103-disk-3    2147483648    -    volume    -
rpool/data/vm-103-disk-4    2147483648    -    volume    -
rpool/data/vm-103-disk-5    1073741824    -    volume    -
root@hv01 ~ #

This correctly returns the list of disks that were requested, with the proper sizes. However, when running targetcli on the storage node, no block backends or LUNs have been created for the above volumes:

Code:
root@store02:~# targetcli
targetcli shell version 2.1.53
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> ls
o- / .................................................................................. [...]
  o- backstores ....................................................................... [...]
  | o- block ........................................................... [Storage Objects: 0]
  | o- fileio .......................................................... [Storage Objects: 1]
  | | o- tmpdsk .............. [/rpool/data/disks/tmpdsk.img (100.0MiB) write-back activated]
  | |   o- alua ............................................................ [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ................................ [ALUA state: Active/optimized]
  | o- pscsi ........................................................... [Storage Objects: 0]
  | o- ramdisk ......................................................... [Storage Objects: 0]
  o- iscsi ..................................................................... [Targets: 1]
  | o- iqn.2023-11.cloud.myhost.store02:data ............................... [TPGs: 1]
  |   o- tpg1 ........................................................ [no-gen-acls, no-auth]
  |     o- acls ................................................................... [ACLs: 0]
  |     o- luns ................................................................... [LUNs: 1]
  |     | o- lun0 ......... [fileio/tmpdsk (/rpool/data/disks/tmpdsk.img) (default_tg_pt_gp)]
  |     o- portals ............................................................. [Portals: 1]
  |       o- 1.2.3.4:3260 ........................................................ [OK]
  o- loopback .................................................................. [Targets: 0]
  o- vhost ..................................................................... [Targets: 0]
  o- xen-pvscsi ................................................................ [Targets: 0]
/>

Do note that from the PVE GUI, when navigating to the storage of an individual node, both the "Summary" and "VM Disks" panes of the storage correctly list sizes / contents.
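For reference, the device nodes that the block backends would be created on can also be checked on the storage node (path derived from the zfs list above):
Code:
# on the storage node
ls -l /dev/zvol/rpool/data/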

Can you please suggest next steps in debugging this issue?

Thank you in advance!
 
Have you looked at the config of the VM in question? It's unclear from the limited information provided whether the error is related to a storage operation or an actual config update on the PVE side.
Tail the journal during the command execution to isolate all messages for that event and examine them. Try to use the CLI to reduce the number of API calls and other miscellaneous things that the GUI may be doing.
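For example (standard journalctl invocations; adjust the unit names if your setup differs):
Code:
# in one shell on the PVE node, follow everything:
journalctl -f
# or narrow it down to the daemons that handle GUI/API requests:
journalctl -f -u pvedaemon -u pveproxy
Then repeat the failing disk-add and note every message that shows up around that moment.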

good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Have you looked at the config of the VM in question? It's unclear from the limited information provided whether the error is related to a storage operation or an actual config update on the PVE side.
Tail the journal during the command execution to isolate all messages for that event and examine them. Try to use the CLI to reduce the number of API calls and other miscellaneous things that the GUI may be doing.

good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

Thank you for the reply; however, I am not aware of what the actual call from the PVE side is, in order to replicate it.

To my understanding:
1. Manually doing the steps on the storage side seems to work fine (creating a zvol / LUN and getting access to it over iSCSI); see the sketch after this list.
2. PVE, having SSHed into the storage node (checked via the command in the original post, which works), creates a zvol (which I can see gets created without issue), and then tries to create an iSCSI backend / LUN, which never gets created. The process fails silently here.
3. PVE tries to assign the LUN it thinks it created in (2) to the VM, and the above error message triggers.
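For reference, a minimal version of those manual steps looks roughly like this (sketch only; the test zvol name is hypothetical, the IQN and pool path are taken from the listings above):
Code:
# on the storage node
zfs create -V 1G rpool/data/testvol
targetcli /backstores/block create name=testvol dev=/dev/zvol/rpool/data/testvol
targetcli /iscsi/iqn.2023-11.cloud.myhost.store02:data/tpg1/luns create /backstores/block/testvol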

However I do not know where to find the exact commands to manually replicate the process.

Please advise if possible -- and thank you in advance!
 
Thank you for the reply; however, I am not aware of what the actual call from the PVE side is, in order to replicate it.
Well, either you are allocating a storage volume, attaching it, or both.
You can create a volume with: pvesm alloc
You can attach with: qm set $VMID --scsi1 $STORAGE:diskname
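For example, with placeholder names (the storage ID and volume name here are hypothetical; adjust to your setup):
Code:
# allocate a 1 GiB volume for VM 103 on the ZFS-over-iSCSI storage
pvesm alloc $STORAGE 103 vm-103-disk-6 1G
# attach the existing volume to the VM
qm set 103 --scsi1 $STORAGE:vm-103-disk-6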

2. PVE, having SSHed into the storage node (checked via the command in the original post, which works), creates a zvol (which I can see gets created without issue), and then tries to create an iSCSI backend / LUN, which never gets created. The process fails silently here.
3. PVE tries to assign the LUN it thinks it created in (2) to the VM, and the above error message triggers.
My suggestion to troubleshoot this was: repeat the steps that cause the error while tailing the log, i.e. journalctl -f
Examine the log for additional information.
To further isolate the steps for your troubleshooting, use the CLI to split create and attach.

You can also examine the plugin responsible to get more familiar with the underlying technology: /usr/share/perl5/PVE/Storage/ZFSPlugin.pm
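The ZFS plugin delegates the iSCSI target/LUN handling to a helper module per provider; a quick way to see which one is involved (paths from a standard PVE install; the grep pattern is just a convenient anchor):
Code:
ls /usr/share/perl5/PVE/Storage/LunCmd/
grep -n "LunCmd" /usr/share/perl5/PVE/Storage/ZFSPlugin.pm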

I also suggested that you examine the VM config: qm config [vmid]


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Some additional pointers:
Code:
grep -R "Invalid lun definition in config"
Storage/LunCmd/LIO.pm:              die "Invalid lun definition in config!\n"

code that produces this error:
Code:
foreach my $lun (@{$tpg->{luns}}) {
    my ($idx, $storage_object);
    if ($lun->{index} =~ /^(\d+)$/) {
        $idx = $1;
    }
    if ($lun->{storage_object} =~ m|^($BACKSTORE/.*)$|) {
        $storage_object = $1;
    }
    die "Invalid lun definition in config!\n"
        if !(defined($idx) && defined($storage_object));
    push @$res, { index => $idx, storage_object => $storage_object };
}
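The index and storage_object fields this loop expects match the layout of targetcli's saved JSON configuration, so it is worth looking at what that config actually contains on the storage node (the path below is the targetcli-fb default on Debian-based systems; adjust if yours differs, and note that targetcli only writes this file on saveconfig, so it can lag behind the live state shown by targetcli ls):
Code:
python3 -m json.tool /etc/rtslib-fb-target/saveconfig.json | less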



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Well, either you are allocating a storage volume, attaching it, or both.
You can create a volume with: pvesm alloc
You can attach with: qm set $VMID --scsi1 $STORAGE:diskname
The current UI process was: create the storage, then go to an existing VM and try to add a new hard drive on that storage.
pvesm alloc is the command that produces the error.

My suggestion to troubleshoot this was: repeat the steps that cause the error while tailing the log, i.e. journalctl -f
Examine the log for additional information.
To further isolate the steps for your troubleshooting, use the CLI to split create and attach.
No logs are generated in journalctl, unfortunately.
You can also examine the plugin responsible to get more familiar with the underlying technology: /usr/share/perl5/PVE/Storage/ZFSPlugin.pm
My Perl is quite basic, but from the debugging done, I don't think the ZFS part is the issue (ZFS volumes get created without issue).

Some additional pointers:
Code:
grep -R "Invalid lun definition in config"
Storage/LunCmd/LIO.pm:              die "Invalid lun definition in config!\n"

code that produces this error:
Code:
foreach my $lun (@{$tpg->{luns}}) {
    my ($idx, $storage_object);
    if ($lun->{index} =~ /^(\d+)$/) {
        $idx = $1;
    }
    if ($lun->{storage_object} =~ m|^($BACKSTORE/.*)$|) {
        $storage_object = $1;
    }
    die "Invalid lun definition in config!\n"
        if !(defined($idx) && defined($storage_object));
    push @$res, { index => $idx, storage_object => $storage_object };
}
From my understanding, after the zvol is created via ZFSPlugin.pm, a block-based backend should be created (this is what is missing), and then a LUN should be created on top of that backend (the code above throws the error because the backend is missing).

Trying to figure out where the code for the backend creation is.
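Since the error comes from Storage/LunCmd/LIO.pm, the backend/LUN creation presumably lives in that same module; a grep along these lines should narrow it down (the search terms are guesses, not exact function names):
Code:
grep -nE "BACKSTORE|targetcli|create" /usr/share/perl5/PVE/Storage/LunCmd/LIO.pm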

Is there a way to debug the code (ideally step through it)? If that is not possible, I am currently inserting print statements, but I don't know if that's the best approach.
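For reference, a rough way to actually see that output (sketch; the storage ID and volume name are placeholders, and the service names are the standard PVE daemons):
Code:
# running the failing step from the CLI loads the storage modules in-process,
# so warn()/print output shows up directly on the terminal:
pvesm alloc $STORAGE 103 vm-103-disk-6 1G

# the GUI goes through the daemons, which need a restart to pick up edits to
# the .pm files; then watch their journal while repeating the operation:
systemctl restart pvedaemon pveproxy
journalctl -f -u pvedaemon -u pveproxy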

Thank you in advance for any further hints!

I also suggested that you examine the VM config: qm config [vmid]
Since we don't get to the VM part, I don't think that's needed.
 
