snapshot rollback error (zfs over iscsi from nas4free)

Ilya Pollyak

Apr 26, 2016
Hello!

I've got an issue with snapshots. The behavior differs from what is described in other related reports.

It happens with all VMs, on two standalone nodes (not in a cluster).

nas4free 10.2.0.2 - Prester (revision 2545)
proxmox 4.1-1/2f96504d
Both are the latest versions, as far as I know.

ZFS over iSCSI storage is used (zdev or file extent, the result is the same).
It was configured in accordance with the official HOWTO.

Taking a snapshot succeeds, but when I try to roll it back, an error appears:

command '/usr/bin/ssh -o 'BatchMode=yes' -i /etc/pve/priv/zfs/10.15.252.101_id_rsa root@10.15.252.101 /var/etc/rc.d/istgt onerestart '>&' /dev/null' failed: exit code 1

I looked for /var/etc/rc.d/istgt on nas4free: there is no such directory or file,
but there is /etc/rc.d/iscsi_target,
which starts istgt.

command '/usr/bin/scp -o 'BatchMode=yes' -i /etc/pve/priv/zfs/10.15.252.101_id_rsa /tmp/config17864 /var/etc/iscsi/istgt.conf' failed: exit code 1

and the VM becomes locked.

I unlock it with qm unlock VMID and retry the rollback.

Another error appears:

Could not find lu_name for zvol vm-100-disk-4 at /usr/share/perl5/PVE/Storage/ZFSPlugin.pm line 102.

All subsequent attempts end with this error,

but I can still create and delete snapshots freely.

PS: I've just noticed that once the error "Could not find lu_name for zvol" has appeared, every operation that accesses the VM's disk produces this message again.

What can I do to fix it?
 
I've found a wrong path in /usr/share/perl5/PVE/Storage/LunCmd/Istgt.pm.

Everything works fine after changing:

'/var/etc/rc.d/istgt'
to
'/etc/rc.d/iscsi_target'

on line 18.

I suppose the nas4free developers changed the path in the latest version.
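
For anyone who wants to script the same edit instead of doing it by hand, here is a minimal sketch in Python. It assumes the old path appears literally in Istgt.pm; the function names patch_istgt_path and patch_file are made up for illustration, and the script writes a .bak copy next to the file before changing it.

```python
import shutil
from pathlib import Path

# Paths taken from the post above.
OLD_PATH = "/var/etc/rc.d/istgt"
NEW_PATH = "/etc/rc.d/iscsi_target"


def patch_istgt_path(text: str) -> str:
    """Return text with every literal OLD_PATH replaced by NEW_PATH."""
    return text.replace(OLD_PATH, NEW_PATH)


def patch_file(path: str) -> bool:
    """Patch the file in place; return True if anything changed.

    A backup named <file>.bak is written before the file is modified.
    """
    p = Path(path)
    original = p.read_text()
    patched = patch_istgt_path(original)
    if patched == original:
        return False  # already patched, or the old path is not present
    shutil.copy2(p, p.with_name(p.name + ".bak"))
    p.write_text(patched)
    return True
```

It would be called on the proxmox node as patch_file("/usr/share/perl5/PVE/Storage/LunCmd/Istgt.pm"). Running it a second time changes nothing, because the new path does not contain the old one.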
 
Unfortunately, the problem wasn't solved completely, and I can't find its root cause.

I did a fresh install of
nas4free 10.2.0.2 - Prester (revision 2545) - istgt version 0.5 (20150713)
proxmox 4.1-1/2f96504d

configured ZFS over iSCSI,

then ran apt-get update && apt-get dist-upgrade,

then edited the path in Istgt.pm as mentioned earlier and rebooted proxmox.

I created a new VM
and tested snapshots before the first start: create and rollback. Snapshotting (without RAM contents, because the VM is not running) on an empty disk works correctly.
Then I started the VM
and tried to create a new snapshot, which failed with an error:

VM 100 qmp command 'savevm-start' failed - failed to open 'iscsi://10.15.252.101/iqn.2016-04.news.dalet-chelabnsk.a-stor-1:disk0/2'

I tried to shut down the VM:
VM quit/powerdown failed - got timeout
so I shut it down manually by killing the kvm process.

I tried to start the VM:
Could not find lu_name for zvol vm-100-disk-1 at /usr/share/perl5/PVE/Storage/ZFSPlugin.pm line 105.

All I can do is delete the VM after some additional steps: remove the HDD from the VM configuration (and manually delete the zvol on the NAS).

This issue occurs on all nodes and with all VMs.

What can I do to fix it?
How can I get more detailed log messages to understand what's wrong?
As far as I understand, the problem is between proxmox and nas4free. Maybe I need to roll back the nas4free version? But if the problem isn't solved now, it will most probably appear in future versions too.
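
One small thing that may help with diagnosis: the 'savevm-start' error above names an iSCSI URL of the form iscsi://portal/target/lun. The sketch below only splits such a URL into its parts so the LUN number can be compared with the LUN lines in istgt.conf. The function name parse_iscsi_url is made up for illustration, and the default port 3260 is an assumption for URLs that do not specify one.

```python
from urllib.parse import urlsplit


def parse_iscsi_url(url: str) -> dict:
    """Split iscsi://host[:port]/target/lun into its components."""
    parts = urlsplit(url)
    if parts.scheme != "iscsi":
        raise ValueError(f"not an iscsi URL: {url!r}")
    # The LUN is the last path component; everything before it is the target.
    target, sep, lun = parts.path.lstrip("/").rpartition("/")
    if not sep or not target or not lun.isdigit():
        raise ValueError(f"expected iscsi://host/target/lun, got {url!r}")
    return {
        "portal": parts.hostname,
        "port": parts.port or 3260,  # assumed default when no port is given
        "target": target,
        "lun": int(lun),
    }
```

For the URL in the error message this gives portal 10.15.252.101, target iqn.2016-04.news.dalet-chelabnsk.a-stor-1:disk0 and LUN 2.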

Here are istgt.conf, storage.cfg and vm100.conf:

/var/etc/iscsi/istgt.conf
# Global section
[Global]
NodeBase "iqn.2016-04.news.dalet-chelabnsk.a-stor-1"
PidFile "/var/run/istgt.pid"
AuthFile "/var/etc/iscsi/auth.conf"
MediaDirectory "/mnt"
Timeout 30
NopInInterval 20
MaxR2T 32
DiscoveryAuthMethod Auto
DiscoveryAuthGroup None
MaxSessions 16
MaxConnections 4
FirstBurstLength 262144
MaxBurstLength 1048576
MaxRecvDataSegmentLength 262144
MaxOutstandingR2T 16
DefaultTime2Wait 2
DefaultTime2Retain 60
# UnitControl section
[UnitControl]
AuthMethod CHAP Mutual
AuthGroup AuthGroup10000
#Portal UC1 127.0.0.1:3261
#Netmask 127.0.0.1
# PortalGroup section
[PortalGroup1]
Portal DA1 10.15.252.101:3260

# InitiatorGroup section
[InitiatorGroup1]
InitiatorName "ALL"
Netmask 10.15.252.0/24

# LogicalUnit section
[LogicalUnit1]
TargetName disk0
Mapping PortalGroup1 InitiatorGroup1
AuthGroup None
UnitType Disk
QueueDepth 32
LUN0 Storage /dev/zvol/pool0/zvol0 AUTO
LUN0 Option WriteCache Enable
LUN1 Storage /dev/zvol/pool0/vm-100-disk-1 AUTO
LUN1 Option WriteCache Disable
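
To check whether a given zvol is listed in a config like the one above, a rough Python sketch follows. It is only a diagnostic aid and is not how ZFSPlugin.pm or Istgt.pm does its lookup. The names parse_logical_units and find_lun are made up for illustration, and the target argument is the short TargetName from istgt.conf (disk0 here), not the full iqn.

```python
import re

SECTION_RE = re.compile(r"^\[(\w+)\]")
LUN_RE = re.compile(r"^LUN(\d+)\s+Storage\s+(\S+)")


def parse_logical_units(conf_text: str) -> dict:
    """Collect TargetName and 'LUNn Storage <path>' lines per [LogicalUnitN]."""
    units = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        section = SECTION_RE.match(line)
        if section:
            name = section.group(1)
            # Only LogicalUnit sections are of interest here.
            current = None
            if name.startswith("LogicalUnit"):
                current = units.setdefault(name, {"target": None, "luns": {}})
            continue
        if current is None:
            continue
        fields = line.split(None, 1)
        if fields[0] == "TargetName" and len(fields) == 2:
            current["target"] = fields[1].strip('"')
            continue
        lun = LUN_RE.match(line)
        if lun:
            current["luns"][lun.group(2)] = int(lun.group(1))
    return units


def find_lun(conf_text: str, target: str, zvol_path: str):
    """Return the LUN number mapped to zvol_path under target, or None."""
    for unit in parse_logical_units(conf_text).values():
        if unit["target"] == target and zvol_path in unit["luns"]:
            return unit["luns"][zvol_path]
    return None
```

With the config text read into a string, find_lun(text, "disk0", "/dev/zvol/pool0/vm-100-disk-1") returns the LUN number if the zvol is mapped, and None if it is missing from the file.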


/etc/pve/storage.cfg
dir: local
path /var/lib/vz
content images,vztmpl,rootdir,iso
maxfiles 0

zfs: a-stor-1
blocksize 4k
iscsiprovider istgt
pool pool0
target iqn.2016-04.news.dalet-chelabnsk.a-stor-1:disk0
portal 10.15.252.101
sparse
content images
nowritecache


vm100.conf
bootdisk: ide0
cores: 1
ide0: a-stor-1:vm-100-disk-1,size=32G
ide2: local:iso/ru_windows_7_enterprise_x86_dvd_x15-70945.iso,media=cdrom
memory: 512
name: vm100
net0: bridge=vmbr0,e1000=62:65:66:66:64:33
numa: 0
ostype: win7
parent: gg
smbios1: uuid=c38c406c-3083-45cf-b81f-14e0f983ebd2
sockets: 1

[gg]
bootdisk: ide0
cores: 1
ide0: a-stor-1:vm-100-disk-1,size=32G
ide2: local:iso/ru_windows_7_enterprise_x86_dvd_x15-70945.iso,media=cdrom
memory: 512
name: vm100
net0: bridge=vmbr0,e1000=62:65:66:66:64:33
numa: 0
ostype: win7
smbios1: uuid=c38c406c-3083-45cf-b81f-14e0f983ebd2
snaptime: 1461749029
sockets: 1
 
I rolled back from

nas4free 10.2.0.2 - Prester (revision 2545) (apr2016)
to
nas4free 10.2.0.2 - Prester (revision 2268) (jan2016)

No result.