[SOLVED] No access to LVM over iSCSI after a node reinstallation

linkstat
Hello community.
(This is my first post :)).

I have been working with Proxmox VE since version 3.0. I have always followed the guidelines, wikis and forums, and so far I have been able to solve problems without major complications. Tasks such as upgrading, reinstalling nodes and configuring different types of storage have never proved impossible or too complicated, but now I have a problem that has me fairly confused.

We currently have a cluster of 4 nodes and one external storage (an IBM DS3200 with an iSCSI module). The PVE version is 4.2-17/e1400248. Last week, due to hardware changes, I reinstalled one of the nodes (following the guidelines at https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster#Re-installing_a_cluster_node), in the same way I had already done on other occasions with the other nodes of the cluster.
But this time I had trouble connecting to the external storage via iSCSI...
Simply put, I can't get multipath (or open-iscsi) to work correctly, and I am lost.

I'm using the SAME network configuration with the SAME number of NIC adapters (but not the same adapters) that I had in the previous installation. The only difference is that this time I installed using ZFS RAID1 (all the other times, and on all the other nodes of the cluster, I used the standard LVM layout).

My /etc/network/interfaces:
Code:
auto lo

iface lo inet loopback

iface eth2 inet manual

iface eth3 inet manual

auto eth4
iface eth4 inet static
        address  192.168.103.3
        netmask  255.255.255.0
        mtu 9000

auto eth5
iface eth5 inet static
        address  192.168.104.3
        netmask  255.255.255.0
        mtu 9000

allow-vmbr0 bond0
iface bond0 inet manual
    ovs_bridge vmbr0
    ovs_type OVSBond
    ovs_bonds eth2 eth3
    ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0 vlan10

allow-vmbr0 vlan10
iface vlan10 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=10
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
    address 172.16.79.3
    netmask 255.255.255.0
    gateway 172.16.79.8

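In case it is relevant: jumbo frames are enabled on the storage NICs, and the MTU can be verified end-to-end with a non-fragmenting ping (8972 bytes of payload = 9000 minus 28 bytes of IP/ICMP headers; addresses as in my configuration above):
Code:
# "don't fragment" ping; it fails if any hop along the path has an MTU below 9000
ping -M do -s 8972 -c 3 192.168.103.254
ping -M do -s 8972 -c 3 192.168.104.254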
My /etc/multipath.conf:
Code:
defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        uid_attribute           ID_SERIAL
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
        user_friendly_names     yes
}

blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z][[0-9]*]"
        devnode "^sd[a-z][[0-9]*]"
        wwid "350014ee20ada90cd"
        wwid "350014ee2602f866a"
        wwid "350014ee2b7256412"
        wwid "350014ee20b097fd6"
        wwid "350014ee2b725698c"
}

blacklist_exceptions {
        wwid "360080e50003e26780000036754d47507"
}

multipaths {
  multipath {
        wwid "360080e50003e26780000036754d47507"
        alias mpathds1
  }
}

My /etc/iscsi/initiatorname.iscsi:
Code:
## DO NOT EDIT OR REMOVE THIS FILE!
## If you remove this file, the iSCSI daemon will not start.
## If you change the InitiatorName, existing access control lists
## may reject this initiator.  The InitiatorName must be unique
## for each iSCSI initiator.  Do NOT duplicate iSCSI InitiatorNames.
#InitiatorName=iqn.1993-08.org.debian:01:6f6dd133611f
InitiatorName=iqn.1993-08.org.debian:pve-carbonis

The /etc/iscsi/iscsid.conf is correctly configured for automatic startup
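For reference, this is the kind of setting I mean (a minimal excerpt; the timeout shown is the Debian default and may differ on other setups):
Code:
# /etc/iscsi/iscsid.conf (excerpt)
node.startup = automatic
node.session.timeo.replacement_timeout = 120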

Can I ping the iSCSI target? Yes, on both addresses (192.168.103.254 and ...104.254).
Can I telnet to the iSCSI target on its listening port (3260)? Yes.
Code:
Trying 192.168.103.254...
Connected to 192.168.103.254.
Escape character is '^]'.

Am I logged in to the iSCSI target? Yes; lsscsi shows:
Code:
[0:0:0:0]    disk    ATA      WDC WD10EFRX-68P 0A82  /dev/sda
[1:0:0:0]    disk    ATA      WDC WD10EFRX-68P 0A82  /dev/sdb
[2:0:0:0]    disk    ATA      WDC WD30PURX-64P 0A80  /dev/sdc
[3:0:0:0]    disk    ATA      WDC WD30PURX-64P 0A80  /dev/sdd
[4:0:0:0]    disk    ATA      WDC WD30PURX-64P 0A80  /dev/sde
[6:0:0:0]    disk    USB Mass  Storage Device        /dev/sdf
[8:0:0:0]    disk    IBM      1746      FAStT  1070  -
The last line corresponds to the IBM storage (note that no /dev/sdX block device is listed for it), but multipath still does nothing! :mad:
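To dig a bit deeper into what the session actually presents, something like the following can show which LUNs are attached to the session and whether a rescan picks up a block device (generic open-iscsi commands, nothing DS3200-specific):
Code:
# show open sessions with their attached SCSI devices
iscsiadm -m session -P 3 | grep -A3 "Attached SCSI devices"
# rescan the session, in case the LUN was mapped after login
iscsiadm -m session --rescan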

The related dmesg output:
Code:
[74965.593645] Loading iSCSI transport class v2.0-870.
[74965.595176] iscsi: registered transport (tcp)
[74965.600311] iscsi: registered transport (iser)
[74970.529641] scsi host8: iSCSI Initiator over TCP/IP
[74970.841587] scsi 8:0:0:0: Direct-Access     IBM      1746      FAStT  1070 PQ: 1 ANSI: 5
[74970.842527] scsi 8:0:0:0: Attached scsi generic sg6 type 0
Good! I have a connection to the external storage, and lsscsi shows it... But then:

Code:
[74985.270228] device-mapper: table: 251:0: multipath: error getting device
[74985.270722] device-mapper: ioctl: error adding target to table
[74985.354220] device-mapper: table: 251:0: multipath: error getting device
[74985.354728] device-mapper: ioctl: error adding target to table
[74985.418138] device-mapper: table: 251:0: multipath: error getting device
[74985.418628] device-mapper: ioctl: error adding target to table
[74985.490253] device-mapper: table: 251:0: multipath: error getting device
[74985.490750] device-mapper: ioctl: error adding target to table
[74985.574174] device-mapper: table: 251:0: multipath: error getting device
[74985.574654] device-mapper: ioctl: error adding target to table
The "error getting device" messages should be fine, because they refer to the blacklisted devices (at least, that is what googling these multipath errors suggests).
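To see which paths multipath is actually evaluating and why it skips them, a verbose dry run can help (generic multipath-tools invocation, not specific to this storage):
Code:
# dry run with verbose output: prints each path, its wwid, and whether it was blacklisted
multipath -v3 -d 2>&1 | grep -Ei 'blacklist|wwid|discover'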

Let's look at some other relevant commands:

multipath -ll
(outputs absolutely nothing)

vgs
vgchange -ay
(both output absolutely nothing)

ls -l /dev/mapper/
Code:
total 0
crw------- 1 root root 10, 236 Jul 21 02:36 control


On the other nodes everything works correctly (I'll show the output in my next post).


I tried changing the iSCSI initiator name, the cables, the NICs... but nothing. The iscsid.conf and multipath.conf files are exactly the same on all nodes...
So I am really very confused. I even thought it might be related to the fact that I use ZFS instead of LVM for the base system, or to the fact that, a long time ago, when the LVM storage over iSCSI was first created, it was precisely this node (the one I'm now having problems with) that I used to create it. Anyway, this is the iSCSI-related content of my /etc/pve/storage.cfg (carbonis is the hostname of the problematic node):
Code:
dir: local
    path /var/lib/vz
    maxfiles 0
    content rootdir,iso,vztmpl,images

iscsi: aiur-ds1p3
    portal 192.168.103.254
    target iqn.1992-01.com.lsi:2365.60080e50003e26780000000052d66902
    nodes aiur
    content none

iscsi: shakuras-ds1p3
    portal 192.168.103.254
    target iqn.1992-01.com.lsi:2365.60080e50003e26780000000052d66902
    nodes shakuras
    content none

iscsi: carbonis-ds1p3
    portal 192.168.103.254
    target iqn.1992-01.com.lsi:2365.60080e50003e26780000000052d66902
    nodes carbonis
    content none

iscsi: zerus-ds1p3
    portal 192.168.103.254
    target iqn.1992-01.com.lsi:2365.60080e50003e26780000000052d66902
    nodes zerus
    content none

lvm: khaydarin
    vgname khaydarin
    content rootdir,images
    base carbonis-ds1p3:0.0.10.scsi-360080e50003e26780000036754d47507
    shared

lvm: aiur-lvm
    vgname khaydarin
    content images,rootdir

lvm: shakuras-lvm
    vgname khaydarin
    content images,rootdir

lvm: carbonis-lvm
    vgname khaydarin
    content images,rootdir

lvm: zerus-lvm
    vgname khaydarin
    content rootdir,images

And of course, I can't start any VM that was supposed to run on the recently reinstalled node:
qm start 100
Code:
  Volume group "khaydarin" not found
can't activate LV '/dev/khaydarin/vm-100-disk-1':   Cannot process volume group khaydarin

Maybe the line with the keyword "base" (under the LVM storage definition) has something to do with it, but I don't know: that line was generated automatically when the iSCSI storage was set up from the web GUI a long time ago.
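A generic way to check whether PVE itself can see the LUN behind that base volume is pvesm (storage names as in my config above):
Code:
# storage status as PVE sees it on this node
pvesm status
# list the LUNs exposed by the iSCSI storage definition
pvesm list carbonis-ds1p3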

And finally... some screenshots (attached).

So... please, I would really appreciate some help.
 
For comparison, here is the same information from one of the working nodes.

vgs
Code:
  VG        #PV #LV #SN Attr   VSize   VFree 
  khaydarin   1  21   0 wz--n-   9.04t 88.00g
  pve         1   3   0 wz--n- 930.26g 16.00g

ls -l /dev/mapper/
Code:
total 0
crw------- 1 root root 10, 236 Jul 14 15:06 control
lrwxrwxrwx 1 root root       7 Jul 14 15:08 khaydarin-vm--101--disk--1 -> ../dm-3
lrwxrwxrwx 1 root root       7 Jul 14 15:09 khaydarin-vm--103--disk--1 -> ../dm-5
lrwxrwxrwx 1 root root       7 Jul 14 15:09 khaydarin-vm--103--disk--2 -> ../dm-6
lrwxrwxrwx 1 root root       7 Jul 14 15:09 khaydarin-vm--301--disk--1 -> ../dm-8
lrwxrwxrwx 1 root root       7 Jul 14 15:09 khaydarin-vm--401--disk--1 -> ../dm-7
lrwxrwxrwx 1 root root       7 Jul 14 15:09 khaydarin-vm--408--disk--1 -> ../dm-9
lrwxrwxrwx 1 root root       7 Jul 14 15:08 mpathds1 -> ../dm-4
lrwxrwxrwx 1 root root       7 Jul 14 15:06 pve-data -> ../dm-2
lrwxrwxrwx 1 root root       7 Jul 14 15:06 pve-root -> ../dm-0
lrwxrwxrwx 1 root root       7 Jul 14 15:06 pve-swap -> ../dm-1

lvs
Code:
  LV            VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  vm-100-disk-1 khaydarin -wi-------  60.00g                                                    
  vm-100-disk-2 khaydarin -wi-------   1.50t                                                    
  vm-101-disk-1 khaydarin -wi-ao----  60.00g                                                    
  vm-102-disk-1 khaydarin -wi-------  60.00g                                                    
  vm-103-disk-1 khaydarin -wi-ao----  60.00g                                                    
  vm-103-disk-2 khaydarin -wi-ao----   1.50t                                                    
  vm-104-disk-1 khaydarin -wi-------  60.00g                                                    
  vm-104-disk-2 khaydarin -wi-------   1.50t                                                    
  vm-300-disk-1 khaydarin -wi-------  40.00g                                                    
  vm-301-disk-1 khaydarin -wi-ao----   2.00t                                                    
  vm-302-disk-1 khaydarin -wi-------   1.00t                                                    
  vm-304-disk-1 khaydarin -wi-------  40.00g                                                    
  vm-400-disk-1 khaydarin -wi-------  60.00g                                                    
  vm-401-disk-1 khaydarin -wi-ao----  60.00g                                                    
  vm-403-disk-1 khaydarin -wi-------  60.00g                                                    
  vm-403-disk-2 khaydarin -wi------- 180.00g                                                    
  vm-404-disk-1 khaydarin -wi-------  60.00g                                                    
  vm-404-disk-2 khaydarin -wi------- 512.00g                                                    
  vm-405-disk-1 khaydarin -wi-------  60.00g                                                    
  vm-406-disk-1 khaydarin -wi-------  60.00g                                                    
  vm-408-disk-1 khaydarin -wi-ao----  60.00g                                                    
  data          pve       -wi-ao---- 756.27g                                                    
  root          pve       -wi-ao----  96.00g                                                    
  swap          pve       -wi-ao----  62.00g

The multipath-related dmesg output (still on the working node):

Code:
[   92.623822] device-mapper: multipath round-robin: version 1.0.0 loaded
[   92.633978] rdac: device handler registered
[   92.634983] sd 7:0:0:10: rdac: LUN 10 (RDAC) (owned)
[   92.648075] sd 8:0:0:10: rdac: LUN 10 (RDAC) (owned)
[   92.733963] sd 10:0:0:10: rdac: LUN 10 (RDAC) (owned)
[   92.734884] sd 9:0:0:10: rdac: LUN 10 (RDAC) (owned)

So here we are, not knowing where to focus: multipath? iSCSI? LVM? Or something else I'm not aware of?

Greetings!
 
Solved!

The problem was with the iSCSI initiator name.

Unlike what happened with the other nodes of the cluster, this time the NIC adapters were changed, which resulted in a different auto-generated iSCSI initiator name (as far as I can tell, it is derived from the MAC addresses of the NICs). I knew this, but I hadn't paid attention to it, because I believed I had already entered the new initiator name correctly in the storage ACLs... but no.
After fixing the initiator name ACLs on the external storage, everything is working correctly again.
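For anyone hitting the same thing, this is roughly how I double-checked it afterwards (standard open-iscsi / multipath / LVM commands; adjust the VG name to your setup):
Code:
# the initiator name this node presents, which must match the storage ACL
cat /etc/iscsi/initiatorname.iscsi
# log out and back in so the target re-evaluates the ACL
iscsiadm -m node --logoutall=all
iscsiadm -m node --loginall=all
# rebuild the multipath maps and reactivate the volume group
multipath -r
vgscan
vgchange -ay khaydarin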

Greetings!
 
