iSCSI Issues - Dell Compellent SAN - Corruption of superblocks and controllers

rwron

New Member
Mar 7, 2014
Hi all,

I have the following setup:

6 x physical nodes, all set up in a Proxmox PVE cluster (2 x E5-2667 v2 CPUs, 256 GB RAM per box)
2 x SAN controllers in my redundant Dell Compellent SAN, each with its own portal IP.

Everything is up and running, and since Proxmox won't show all the LUNs from one primary portal IP, I had to set up two iSCSI storage entries - one for each controller.

So I have:

SAN (35 LUNs)
SAN2 (35 LUNs)

I then set up 70 LVM storages on top of those, as per the recommendations.
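
For reference, the relevant storage definitions look roughly like this (storage names, portal IPs, target IQNs and volume names below are placeholders, not my real values):

Code:
# /etc/pve/storage.cfg - minimal sketch, one iSCSI entry per controller portal
iscsi: SAN
        portal 10.10.10.1
        target iqn.2002-03.com.compellent:5000d31000example1
        content none

iscsi: SAN2
        portal 10.10.10.2
        target iqn.2002-03.com.compellent:5000d31000example2
        content none

# one shared LVM storage per LUN, layered on the iSCSI base volume
lvm: lun1
        vgname lun1-vg
        base SAN:0.0.1.scsi-36000d31000example
        content images
        shared 1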

From the CLI the LUNs are all reported correctly, and multipathing etc. is fully set up.

Is there no way to get Proxmox to see all the LUNs and controllers from just one iSCSI storage entry?

I basically set up some VMs and all was fine and dandy - until they started randomly corrupting and ending up beyond repair.

They were set up by picking an LVM storage, using virtio for disk + network, and the actual Ubuntu installation was done with guided partitioning of the entire disk on LVM.
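
For illustration, a VM config along those lines would contain something like this (VM ID, storage name and volume name are placeholders):

Code:
# /etc/pve/qemu-server/100.conf - disk and NIC both on virtio
virtio0: lun1:vm-100-disk-1,size=50G
net0: virtio=DE:AD:BE:EF:00:01,bridge=vmbr0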

One VM had more than 2500 inode issues when we tried to repair it, and ultimately it was so badly damaged that the only option was to delete it and start over.
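
For anyone hitting the same thing, a repair attempt looks along these lines, run from a rescue environment (the device path is a placeholder for the guest's root LV):

Code:
# force a full check and auto-fix on the guest's root filesystem (hypothetical device)
fsck.ext4 -f -y /dev/mapper/ubuntu--vg-root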

So there seems to be some form of data misalignment happening with that setup.

Right now I have started over entirely: I set up the iSCSI targets to allow using the LUNs directly, and I am installing with guided partitioning of the entire disk without LVM - so I removed two LVM layers compared to before - and so far the system hasn't corrupted yet.

So there are multiple questions here:

1) Is it possible to get Proxmox to show just one SAN with all the actual LUNs, no matter which controller they are on?
2) Has anyone experienced the corruption issues we are seeing, or have any ideas on the cause and what to do?

Just pasting my multipath conf here for reference (local disks sda and sdb are blacklisted).

multipath.conf

Code:
defaults {
        # poll path state every 2 seconds
        polling_interval        2
        # spread I/O round-robin across all paths in a single group
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        # send 100 I/Os down a path before switching to the next
        rr_min_io               100
        # fail back immediately, and queue I/O if every path goes down
        failback                immediate
        no_path_retry           queue
}


devices {
        device {
                # match Compellent volumes by SCSI vendor/product strings
                vendor "COMPELNT"
                product "Compellent Vol"
                path_grouping_policy multibus
                getuid_callout "/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
                path_selector "round-robin 0"
                # check path health with SCSI TEST UNIT READY
                path_checker tur
                features "0"
                hardware_handler "0"
                prio const
                failback immediate
                rr_weight uniform
                # queue I/O instead of erroring out when all paths fail
                no_path_retry queue
                rr_min_io 1000
                rr_min_io_rq 1
        }
}

blacklist {
        # local disks sda and sdb, identified by WWID
        wwid 36c81f660e05508001a6cf31a13d4e399
        wwid 1Dell_Internal_Dual_SD_0123456789AB
}

From a CLI point of view - multipath -v3, multipath -f, multipath -l etc. - everything looks good and correct.
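
For anyone wanting to double-check the same things, these are the kinds of commands I mean (output omitted since it varies per setup):

Code:
# dump the multipath topology: every LUN should show one dm device
# with an active path per controller
multipath -ll

# list the iSCSI sessions: expect one session per portal/controller
iscsiadm -m session

# verbose path evaluation, useful when a path silently drops out
multipath -v3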

But still something is behind the corruption that's happening, so good ideas or experiences are welcome.

PS: I'll update later on the testing with direct LUNs and no LVM involved at all.
 
Hi rwron,

Did you get any further with this? We are experiencing a similar problem and it would be good to know if (and how) you resolved it.
Thanks in advance,
Chris.
 
proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
 
We too are running into issues with Proxmox 4 and a Nimble CS300 iSCSI enclosure. It sounds very similar, and the end result is filesystem corruption, but so far I haven't seen it to the degree you are describing.

I set up a lab for the Proxmox devs and they have been digging in for a week or two, but nothing has been pinned down yet. In my environment the issue is related to using cache=none and direct I/O. What disk cache setting are you using for the VM? It would be fantastic if someone else is seeing this issue, as it seems to be quite a tough one.
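
If you want to test the cache angle, the disk cache mode can be changed per disk; a sketch with a hypothetical VM ID and volume name:

Code:
# switch an existing virtio disk from the default (cache=none) to writethrough
qm set 100 --virtio0 lun1:vm-100-disk-1,cache=writethrough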
 
Hi Adamb,

I'm afraid you're a little further on than me.

I'm still stuck trying to mount the storage. We have an Equallogic array that works fine: we just create a volume, point the iSCSI target at it, create an LVM group on top, and away we go.

The problem I have here is that we don't see any iSCSI volumes listed when trying to discover targets on the Compellent SC4020. We just see the 4 physical controllers (and none of the fault domain targets). Of course I can connect to one of the 4 physical controllers and then create an LVM group on top of that, but this spits out lots of I/O errors, like this:

Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.482102] sd 28:0:0:1: [sdd] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.482103] sd 28:0:0:1: [sdd] Sense Key : Aborted Command [current]
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.482104] sd 28:0:0:1: [sdd] Add. Sense: Synchronous data transfer error
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.482106] sd 28:0:0:1: [sdd] CDB: Write(10) 2a 00 0c 45 50 00 00 40 00 00
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.483086] sd 28:0:0:1: [sdd] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.483088] sd 28:0:0:1: [sdd] Sense Key : Aborted Command [current]
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.483089] sd 28:0:0:1: [sdd] Add. Sense: Synchronous data transfer error
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.483091] sd 28:0:0:1: [sdd] CDB: Write(10) 2a 00 0c 44 d0 00 00 40 00 00
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.651663] sd 25:0:0:1: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.651669] sd 25:0:0:1: [sdb] Sense Key : Aborted Command [current]
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.651672] sd 25:0:0:1: [sdb] Add. Sense: Synchronous data transfer error
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.651674] sd 25:0:0:1: [sdb] CDB: Write(10) 2a 00 0c 47 d0 00 00 40 00 00
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.652254] device-mapper: multipath: Failing path 8:16.
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.652659] sd 25:0:0:1: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.652660] sd 25:0:0:1: [sdb] Sense Key : Aborted Command [current]
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.652662] sd 25:0:0:1: [sdb] Add. Sense: Synchronous data transfer error
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.652663] sd 25:0:0:1: [sdb] CDB: Write(10) 2a 00 0c 44 10 00 00 40 00 00
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.653545] sd 25:0:0:1: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.653547] sd 25:0:0:1: [sdb] Sense Key : Aborted Command [current]
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.653548] sd 25:0:0:1: [sdb] Add. Sense: Synchronous data transfer error
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.653549] sd 25:0:0:1: [sdb] CDB: Write(10) 2a 00 0c 47 10 00 00 40 00 00
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.669788] device-mapper: multipath: Failing path 8:48.
Dec 15 18:05:12 ess-prox-023 kernel: [ 1013.909463] device-mapper: multipath: Failing path 8:32.
Dec 15 18:05:12 ess-prox-023 kernel: [ 1014.061686] device-mapper: multipath: Failing path 8:48.
Dec 15 18:05:12 ess-prox-023 kernel: [ 1014.203513] device-mapper: multipath: Failing path 8:16.
Dec 15 18:05:12 ess-prox-023 kernel: [ 1014.388306] device-mapper: multipath: Failing path 8:48.
Dec 15 18:05:13 ess-prox-023 kernel: [ 1014.425995] device-mapper: multipath: Failing path 8:32.
Dec 15 18:05:13 ess-prox-023 kernel: [ 1014.501464] device-mapper: multipath: Failing path 8:16.
Dec 15 18:05:13 ess-prox-023 kernel: [ 1014.537361] device-mapper: multipath: Failing path 8:32.
Dec 15 18:05:13 ess-prox-023 kernel: [ 1014.774192] device-mapper: multipath: Failing path 8:32.
Dec 15 18:05:13 ess-prox-023 kernel: [ 1014.993548] device-mapper: multipath: Failing path 8:16.
Dec 15 18:05:13 ess-prox-023 kernel: [ 1015.258274] device-mapper: multipath: Failing path 8:48.
Dec 15 18:05:14 ess-prox-023 kernel: [ 1015.478397] device-mapper: multipath: Failing path 8:48.

I'm assuming this is because the iSCSI initiator is only using one path to the SAN.
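
For what it's worth, the manual way to try to force more than one path would be roughly this (the portal IP is a placeholder for the fault domain / control port address):

Code:
# discover all targets the Compellent advertises (hypothetical portal IP)
iscsiadm -m discovery -t sendtargets -p 10.10.10.1:3260

# log in on every discovered portal so multipath has multiple paths to work with
iscsiadm -m node --loginall=all

# then confirm the extra paths showed up
multipath -ll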

I'm trying various fault domain setups at present, so I'll report back any successes, but it would be good to hear from anyone who has experience with an SC4020.

Thanks
Chris.
 
I'm still stuck trying to mount the storage. [...] I'm assuming this is because the iSCSI initiator is only using one path to the SAN.

Let me make sure I am on the same page. So you can't see the target from the host? I guess I am confused, as you were talking about multipath, and that comes after you can see the target and log in. Or are you trying to present an iSCSI partition to a VM which resides on local storage, and it can't see the target?
 
I can see the targets from the host, but only the 4 physical ports show up, not the actual volume target, which is what we get with our Equallogic units.
 
I can see the targets from the host, but only the 4 physical ports show up, not the actual volume target, which is what we get with our Equallogic units.

I'm not familiar with Equallogic, but they all function the same for the most part. Are you allowing all initiators to log in, or do you limit login based on the IQN of the host? I know on the Nimble, if I don't allow the specific IQN, I run into a very similar issue.

Have you used any other iSCSI initiators with a LUN from the Equallogic? Maybe a Proxmox 3.4 host? Just trying to help narrow down whether it's the host or the storage.
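
If it helps with the access-control angle, the initiator IQN that the host presents (the value you would whitelist on the array) can be read directly:

Code:
# show the IQN this Proxmox host uses when logging in to the SAN
cat /etc/iscsi/initiatorname.iscsi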
 
Hey,

Sorry, we have our wires crossed. My Equallogic SAN connections are working fine with Proxmox, all the way back to version 1.6 :)

We have just bought a new Compellent SC4020 enclosure, and this is what is causing us the problems.
c:)
 
Sorry, we have our wires crossed. [...] We have just bought a new Compellent SC4020 enclosure, and this is what is causing us the problems.

But have you tried a different host, just to prove that it is indeed a Proxmox 4 issue?
 
Dell support have actually been very good. I currently have a ticket open with them, and they are willing to help me while I am using Prox/Debian instead of RHEL etc.
I shall report back.
 
Dell support have actually been very good. [...] I shall report back.

Yeah, HP and IBM are the same way until the issue gets passed level 3 and engineering starts digging in. Once they get their hands on it, they want a supported OS. Good luck, hope you get it pinned down!
 
Hi,

I have found that the Compellent seems to work fine with Proxmox 3.4, and this would appear to be down to a driver mismatch between 3.4 and 4.x for my QLogic HBA.

This is the driver from Prox 3.4:
bnx2x: QLogic 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.710.53r

And this is from Prox 4.x:
bnx2x: Broadcom NetXtreme II 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.710.51-0

These are identical cards, but they are reported slightly differently on 3.4 and 4.x.
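
For comparing the two, the loaded driver version can be read directly on each host (the interface name below is a placeholder):

Code:
# version of the bnx2x module shipped with the running kernel
modinfo bnx2x | grep -i ^version

# driver and firmware the kernel actually bound to the NIC (hypothetical interface)
ethtool -i eth2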

I'd really like to get this working on 4.x so can anyone suggest a method to transfer the 3.4 driver over to 4.x?

In the meantime things are working very nicely in 3.4 so I'll stick with that for now but these are production machines and I don't want them pinned to that version forever.

Thanks,
Chris.
 
Hello Chris,
Any news on making the Dell Compellent work with Proxmox 4?
I have the same setup and don't want to attempt the upgrade with the driver problem outstanding.

Cheers,
 
Any news on making the Dell Compellent work with Proxmox 4? [...]

Hi RCK,

I decided to swap out all the QLogic 10Gb cards for Intel X520 DPs. The system now works very well. I would steer clear of the QLogic cards on Linux, due to limited driver support for newer Linux kernels.

Cheers,
c:)
 
Thanks for the quick reply!
Hmm, I have some Intel 10G and some Broadcom 10G cards; I will try Proxmox 4 on one node to see if it still works.
Was your main problem that iSCSI multipath could no longer see the SAN?
I mean, if iSCSI is seeing the SAN volume correctly, I don't have to fear later corruption?

Thanks :)
 
