Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFiler.

vcp_ai

My config:
Storage>VirtualDisks: OpenFiler iSCSI + LVM
Storage>Backups: OpenFiler NFS (same machine as VirtualDisks)

Everything was working as expected.

Last Friday the OpenFiler machine went down (I do not know why), and Proxmox kept trying to reconnect all weekend.
On Monday I restarted OpenFiler, but Proxmox was unable to connect to the iSCSI/LVM storage where the virtual machines are located. The LVM was not accessible from the web interface. The error was:
Code:
[2440]ERR:  24:  Error in Perl code: command '/sbin/vgchange -aly Discos_Virtuales_iSCSI_170' failed with exit code 5
I had to create a new iSCSI target and a new LVM group, and restore all the machines from backups.

Today I have tried to reproduce the problem:
I stopped all virtual machines that were running on iSCSI/LVM.
I shut down OpenFiler cleanly, and after restarting it the same problem appeared again: I cannot access the LVM on the iSCSI target.
The error in the web interface is the same:
Code:
[2440]ERR:  24:  Error in Perl code: command '/sbin/vgchange -aly Discos_Virtuales_iSCSI_170_2' failed with exit code 5
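For reference, a minimal set of checks at this point (assuming the VG name from the error above, and that the iSCSI disk shows up as /dev/sdb as in the syslog below) would be something like:
Code:
# is the iSCSI session actually logged in?
iscsiadm -m session

# does LVM still see a physical volume on the iSCSI disk, and the volume group?
pvs
vgs
vgdisplay Discos_Virtuales_iSCSI_170_2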
Syslog trace from when OpenFiler was stopped and started:
Code:
Aug 31 11:59:32 servidor1 iscsid: connect failed (113)
Aug 31 11:59:32 servidor1 iscsid: connect failed (113)
Aug 31 12:00:09 servidor1 iscsid: connect failed (111)
Aug 31 12:00:09 servidor1 iscsid: connect failed (111)
Aug 31 12:00:13 servidor1 iscsid: connect failed (111)
Aug 31 12:00:13 servidor1 iscsid: connect failed (111)
Aug 31 12:00:17 servidor1 iscsid: connect failed (111)
Aug 31 12:00:17 servidor1 iscsid: connect failed (111)
Aug 31 12:00:20 servidor1 proxwww[30435]: Starting new child 30435
Aug 31 12:00:21 servidor1 iscsid: connection2:0 is operational after recovery (41 attempts)
Aug 31 12:00:21 servidor1 iscsid: connection1:0 is operational after recovery (41 attempts)
Aug 31 12:01:09 servidor1 proxwww[30042]: update ticket
Aug 31 12:01:17 servidor1 proxwww[30052]: update ticket
Aug 31 12:01:17 servidor1 proxwww[30468]: Starting new child 30468
Aug 31 12:01:27 servidor1 proxwww[30468]: update ticket
Aug 31 12:01:27 servidor1 proxwww[30469]: Starting new child 30469
Aug 31 12:01:30 servidor1 proxwww[30469]: update ticket
Aug 31 12:01:31 servidor1 proxwww[30470]: Starting new child 30470
Aug 31 12:02:17 servidor1 kernel: nfs: server 192.168.123.19 OK
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:02:17 servidor1 kernel: Info fld=0x0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Aug 31 12:02:17 servidor1 kernel: end_request: I/O error, dev sdb, sector 0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:02:17 servidor1 kernel: Info fld=0x0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Aug 31 12:02:17 servidor1 kernel: end_request: I/O error, dev sdb, sector 0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:02:17 servidor1 kernel: Info fld=0x0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Aug 31 12:02:17 servidor1 kernel: end_request: I/O error, dev sdb, sector 0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:02:17 servidor1 kernel: Info fld=0x0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Aug 31 12:02:17 servidor1 kernel: end_request: I/O error, dev sdb, sector 0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Unit Not Ready
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] READ CAPACITY failed
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Test WP failed, assume Write Enabled
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Asking for cache data failed
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Assuming drive cache: write through
Aug 31 12:02:17 servidor1 kernel: sdb: detected capacity change from 104857600000 to 0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:02:17 servidor1 kernel: Info fld=0x0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Aug 31 12:02:17 servidor1 kernel: end_request: I/O error, dev sdb, sector 0
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Unit Not Ready
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] READ CAPACITY failed
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Test WP failed, assume Write Enabled
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Asking for cache data failed
Aug 31 12:02:17 servidor1 kernel: sd 5:0:0:0: [sdb] Assuming drive cache: write through
If I then try to start a machine, I obtain:
Code:
Aug 31 12:03:06 servidor1 pvedaemon[30559]: starting VM 200 on node 0 (localhost)
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] Unit Not Ready
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] READ CAPACITY failed
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] Sense Key : Illegal Request [current] 
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] Add. Sense: Logical unit not supported
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] Test WP failed, assume Write Enabled
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] Asking for cache data failed
Aug 31 12:03:06 servidor1 kernel: sd 5:0:0:0: [sdb] Assuming drive cache: write through
Aug 31 12:03:07 servidor1 qm[30560]: VM 200 start
Aug 31 12:03:08 servidor1 pvedaemon[30559]: VM 200 start failed -
Please note that VM 200 is defined on the iSCSI/LVM storage:
Code:
name: Windows-Administracion
vlan0: rtl8139=3E:2C:8C:CD:7A:20
bootdisk: ide0
ostype: wxp
ide0: LVM_170_3:vm-200-disk-1
memory: 512
onboot: 1
sockets: 1
cores: 1
boot: ca
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
and it seems it is trying to start it from localstorage...
Edit: It is localhost, not localstorage (sorry).

OpenFiler reports that the Proxmox machine is connected to the iSCSI target, and the web page Storage > iSCSI shows no problem (there is simply no LUN mapped to it).

If I restart Proxmox, I get the same symptoms.

df gives me no information about the iSCSI/LVM storage.


1.- As the iSCSI target is available to the Proxmox machine, I assume this is a Proxmox/LVM problem, not an OpenFiler one. Is that correct?

2.- If the above is true, how can I 'recreate' the LVM info? I am sure all my data is there waiting for me, but I do not know how to get at it. (I do not have yesterday's backups...)

3.- When everything is working again, how can I access an LVM volume (i.e. chdir to it and/or make a copy of the data)?
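(To make question 3 concrete: what I would like to be able to do is something like the following sketch, using the VG and disk names from the VM 200 config above as an example:)
Code:
# list the logical volumes in the volume group
lvs Discos_Virtuales_iSCSI_170_2

# the volumes are raw block devices, so copying data means copying the image
dd if=/dev/Discos_Virtuales_iSCSI_170_2/vm-200-disk-1 of=/somewhere/vm-200-disk-1.raw bs=1M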

Thanks for being patient enough to read this long post.
Best Regards
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

Output from pveversion

Code:
servidor1:~# pveversion -v
pve-manager: 1.5-10 (pve-manager/1.5/4822)
running kernel: 2.6.32-2-pve
proxmox-ve-2.6.32: 1.5-7
pve-kernel-2.6.32-2-pve: 2.6.32-7
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-16
pve-firmware: 1.0-5
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.23-1pve11
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.4-1
ksm-control-daemon: 1.0-3
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

Use 'vgs', 'lvs' and 'pvs' to view the LVM status.

What is the output of

#pvs

Outputs:
Code:
servidor1:~# vgs
  VG   #PV #LV #SN Attr   VSize  VFree
  pve    1   3   0 wz--n- 36.77G 4.00G
servidor1:~# pvs
  PV         VG   Fmt  Attr PSize  PFree
  /dev/sda2  pve  lvm2 a-   36.77G 4.00G

servidor1:~# lvs
  LV   VG   Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  data pve  -wi-ao 19.52G
  root pve  -wi-ao  9.25G
  swap pve  -wi-ao  4.00G
servidor1:~#
A text copy of the Storage screen (the part related to iSCSI) is:

Code:
 iSCSI_01_170     ISCSI   Yes    Yes   Yes   n/a      n/a    n/a 
 LVM_170_2         LVM     Yes   Yes   Yes   32.77  36.77 89.13%

 iSCSI_01_170_2  ISCSI   Yes    Yes   Yes   n/a     n/a     n/a 
 LVM_170_3         LVM    Yes     No    Yes  0.00    0.00    0.00%
The first pair, iSCSI_01_170 / LVM_170_2, is what was working up to Friday.

LVM_170_2 shows NO ENTRIES.

The second pair, iSCSI_01_170_2 / LVM_170_3, is what was working this morning (where I restored the backups of the machines).

LVM_170_3 shows a failure:

Code:
[2440]ERR:  24:  Error in Perl code: command '/sbin/vgchange -aly Discos_Virtuales_iSCSI_170_2' failed with exit code 5
Best Regards
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

Please can you check your /etc/pve/storage.cfg - are all device names still correct?
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

Contents of storage.cfg:
Code:
iscsi: iSCSI_01_170
    portal 192.168.123.19
    target iqn.2006-01.com.openfiler:tsn.325c83378f32
    content none

lvm: LVM_170_2
    vgname pve
    shared
    content images

    
    
iscsi: iSCSI_01_170_2
    portal 192.168.123.19
    target iqn.2006-01.com.openfiler:tsn.98552a1a8002
    content none

lvm: LVM_170_3
    vgname Discos_Virtuales_iSCSI_170_2
    base iSCSI_01_170_2:0.0.0.scsi-14f504e46494c45004448384b69672d42564c392d46467948
    shared
    content images
In both cases the iSCSI definitions are correct; the iSCSI target IQN is the one that OpenFiler reports.

LVM_170_2 seems to have lost its 'base' reference to the iSCSI storage.

LVM_170_3 has a pointer to iSCSI_01_170_2 (correct).
I do not know whether the rest of the info, 0.0.0.scsi-14f504e46494c45004448384b69672d42564c392d46467948, is what it should be.

Can you give an example of a known working iSCSI/LVM storage.cfg?
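(Judging from the LVM_170_3 entry above, I would expect a working pair to look roughly like this — the storage names, portal, IQN and LUN id here are only placeholders:)
Code:
iscsi: my_iscsi
    portal 192.168.123.19
    target iqn.2006-01.com.openfiler:tsn.example
    content none

lvm: my_lvm
    vgname my_volume_group
    base my_iscsi:0.0.0.scsi-<lun-id-shown-by-the-web-interface>
    shared
    content images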

To l.mierzwa: I am looking at your post...

 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

I do not know whether the rest of the info, 0.0.0.scsi-14f504e46494c45004448384b69672d42564c392d46467948, is what it should be.


This is the identifier for the iSCSI LUN. You created a new LUN, so I guess that changed.

I suggest you simply delete the definition of 'LVM_170_3' and then recreate it (using the web interface).
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

Maybe it's because the iSCSI disk got renamed? I had such a problem and resolved it by using multipath on top of iSCSI.
http://forum.proxmox.com/threads/4176-Pause-virtual-machine-od-I-O-error?highlight=

Sorry, but I cannot follow your post...
1.- I do not know which command produces this output:
Code:
scsi 7:0:0:100: Direct-Access     IET      VIRTUAL-DISK     0    PQ: 0 ANSI: 4
sd 7:0:0:100: Attached scsi generic sg0 type 0
sd 7:0:0:100: [sda] Very big device. Trying to use READ CAPACITY(16).
sd 7:0:0:100: [sda] 11213398016 512-byte logical blocks: (5.74 TB/5.22 TiB)
2.- dmsetup deps only produces:
Code:
servidor1:~# dmsetup deps
pve-swap: 1 dependencies        : (8, 2)
pve-root: 1 dependencies        : (8, 2)
pve-data: 1 dependencies        : (8, 2)
So I cannot follow...
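(The only other thing I can think of checking here, in case the disk really did get renamed as you suggest, is which device the LUN currently maps to — assuming open-iscsi/udev populate the usual symlinks:)
Code:
ls -l /dev/disk/by-path/ | grep iscsi
ls -l /dev/disk/by-id/ | grep scsi-14f504e46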
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

This is the identifier for the iSCSI LUN. You created a new LUN, so I guess that changed.

I suggest you simply delete the definition of 'LVM_170_3' and then recreate it (using the web interface).

I deleted the definition of LVM_170_3 (directly from storage.cfg), restarted Proxmox, and when trying to create a new LVM using iSCSI_01_170_2 I obtain:

Code:
Error: command '/sbin/vgchange -aly Discos_Virtuales_iSCSI_170_2' failed with exit code 5
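(To see why vgchange exits with code 5, a verbose run by hand might help — assuming the iSCSI session is logged in at this point:)
Code:
pvscan
vgscan
vgchange -aly -v Discos_Virtuales_iSCSI_170_2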
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

1. This output is taken from the kernel messages; run dmesg | grep scsi.

2. Proxmox will connect the iSCSI storage on first use, so run

and try again



The output of dmesg | grep scsi gives no info similar to yours about iSCSI (I have a local ATA disk)...

Code:
servidor1:~# dmesg |grep scsi
scsi0 : ata_piix
scsi1 : ata_piix
scsi2 : ata_piix
scsi3 : ata_piix
scsi 0:0:0:0: Direct-Access     ATA      ST340810A        3.99 PQ: 0 ANSI: 5
sd 0:0:0:0: Attached scsi generic sg0 type 0
iscsi: registered transport (tcp)
iscsi: registered transport (iser)
scsi4 : iSCSI Initiator over TCP/IP
scsi5 : iSCSI Initiator over TCP/IP
scsi6 : iSCSI Initiator over TCP/IP
iscsi: registered transport (tcp)
iscsi: registered transport (iser)
scsi7 : iSCSI Initiator over TCP/IP
scsi8 : iSCSI Initiator over TCP/IP
Note that at the moment I am recreating the virtual machines on an NFS share...
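(Since the kernel log above shows the iSCSI initiators registering but no iSCSI disk being attached, I suppose the next thing to try would be forcing a rescan or a fresh login — the IQN below is the target ending in ...8002:)
Code:
# rescan all logged-in sessions for new or changed LUNs
iscsiadm -m session -R

# or log the target out and back in
iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.98552a1a8002 -p 192.168.123.19 --logout
iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.98552a1a8002 -p 192.168.123.19 --login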
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

Maybe it's because the iSCSI disk got renamed? I had such a problem and resolved it by using multipath on top of iSCSI.
http://forum.proxmox.com/threads/4176-Pause-virtual-machine-od-I-O-error?highlight=

I've performed
Code:
# stop iscsi
/etc/init.d/open-iscsi stop
# remove all device mapper devices
dmsetup remove_all
# start iscsi
/etc/init.d/open-iscsi start
# rescan lvm using pvesm
pvesm list --all
as suggested in your post and obtained:
Code:
servidor1:~# /etc/init.d/open-iscsi stop
Disconnecting iSCSI targets:Logging out of session [sid: 1, target: iqn.2006-01.com.openfiler:tsn.325c83378f32, portal: 192.168.123.19,3260]
Logging out of session [sid: 3, target: iqn.2006-01.com.openfiler:tsn.98552a1a8002, portal: 192.168.123.19,3260]
Logout of [sid: 1, target: iqn.2006-01.com.openfiler:tsn.325c83378f32, portal: 192.168.123.19,3260]: successful
Logout of [sid: 3, target: iqn.2006-01.com.openfiler:tsn.98552a1a8002, portal: 192.168.123.19,3260]: successful
.
Stopping iSCSI initiator service:.
servidor1:~# dmsetup remove_all
servidor1:~# /etc/init.d/open-iscsi start
Starting iSCSI initiator service: iscsid.
Setting up iSCSI targets:
iscsiadm: No records found!
.
Mounting network filesystems:.

servidor1:~# pvesm list --all
local:103/vm-103-disk-1.raw 103   raw   10485760
local:104/vm-104-disk-1.raw 104   raw    5242880
local:109/vm-109-disk-1.raw 109   raw    5242880
local:111/vm-111-disk-1.raw 111   raw    5242880
iscsiadm: No active sessions.
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.325c83378f32, portal: 192.168.123.19,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.325c83378f32, portal: 192.168.123.19,3260]: successful
VirtualDisksNFS:300/vm-300-disk-1.raw 300   raw   10493952
VirtualDisksNFS:318/vm-318-disk-1.raw 318   raw   10502144
VirtualDisksNFS:320/vm-320-disk-1.raw 320   raw   10498048
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.98552a1a8002, portal: 192.168.123.19,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.98552a1a8002, portal: 192.168.123.19,3260]: successful
And I still can NOT create a new LVM on iSCSI_01_170_2 (the iSCSI target ending in ...8002).
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

Well, I have moved my VirtualDisks to an NFS share on the same OpenFiler machine, and it seems the problem is not reproduced with NFS.
I shut down and restarted OpenFiler with a virtual machine running, and everything recovered correctly.
I will try again with iSCSI when I have more spare time.

Can you tell me where I can find the logs of losing/recovering the connection to a remote VirtualDisk? I found nothing in syslog, only a mail being sent, and my mail does not work...
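(For what it is worth, the iscsid and NFS messages earlier in this thread did end up in syslog, so I would expect something like the following to show the disconnect/reconnect events — assuming the default Debian log locations:)
Code:
grep -iE 'iscsid|nfs: server' /var/log/syslog
grep -iE 'iscsid|nfs: server' /var/log/messages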

Best Regards.

Vicente
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

OK, so I was wrestling with this exact same problem all night. It is not a Proxmox issue as far as I can tell. There is an article that talks about a bug in OpenFiler where you lose iSCSI after rebooting the filer.

Look at the "Make required tweaks to address the Openfiler 2.3 reboot issue" section of this article:

http://greg.porter.name/wiki/HowTo:Openfiler#Install_.22gotcha.22
 
Re: Can not re-connect to OpenFiler (iSCSI + LVM) storage after shutting down OpenFil

Hi Vicente,

I encountered a similar problem, not with OpenFiler but with a CentOS distro. My CentOS server acted as the iSCSI server. After a reboot of this server, none of my VMs could be started. The cluster could connect to the iSCSI server, but the storage did not get connected.
This was caused by the iscsiadm tool that I used to set up iSCSI for my Proxmox cluster: iscsiadm does NOT save the configuration in /etc/tgt/targets.conf to make it persistent across a reboot.
Check this file for an IQN entry and the backing store.
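For reference, a persistent entry in /etc/tgt/targets.conf looks roughly like this (the IQN and backing device here are just placeholders, not Vicente's actual values):
Code:
<target iqn.2010-09.com.example:storage.disk1>
    backing-store /dev/vg_storage/lv_vmdisks
</target>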

Kind regards,
Siebe
 
