omnios iscsi issue after update

RobFantini · Nov 7, 2016

We updated a napp-it/omnios system, and after that kvm will not start:

Code:

qm start 8110
kvm: -drive file=iscsi://10.2.2.41/iqn.2010-09.org.napp-it:1459891666/2,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on: iSCSI: Failed to connect to LUN : iscsi_service failed with : iscsi_service_reconnect_if_loggedin. Can not reconnect right now.

Does anyone have a suggestion to fix the issue?

mir · Nov 7, 2016

What does server overview on the frontpage say about:
comstar service : online
comstar target : online

RobFantini · Nov 7, 2016

comstar service: online
comstar iscsi : online

I did not see an entry for comstar target.

RobFantini · Nov 7, 2016

at comstar page:
Comstar stmf online 12:17:14 svc:/system/stmf:default
iscsi/target online 12:17:14 svc:/network/iscsi/target:default

mir · Nov 7, 2016

If that is the case it is impossible to give any help without any further information.

RobFantini · Nov 7, 2016

mir said:
If that is the case it is impossible to give any help without any further information.

Thanks for the help so far. I'll dig further after some sleep.

RobFantini · Nov 8, 2016

I got this reply at omnios-discuss <omnios-discuss@lists.omniti.com>:

"we found that after the upgrade a few of our luns had disappeared
... since no one else reported this on the list we thought that we
may have made a mistake ... but maybe there is something more

just recreate the missing luns ... (we had to pick new lun numbers
since omni complained that the old luns were already in use)

then rescan iscsi on the proxmox host and all should be well."

Question: does 'Zfs over iSCSI' storage use luns?

mir · Nov 8, 2016

RobFantini said:
does 'Zfs over iSCSI' storage use luns?

Yes.
See my answer in same thread. You might just have lost the views to the lun(s)
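
A rough way to check for lost views, assuming `stmfadm list-lu` prints one `LU Name: <GUID>` line per logical unit. `lu_guids` is a hypothetical helper name, not part of COMSTAR:

```shell
# Extract the GUIDs from `stmfadm list-lu` (or `list-lu -v`) output on stdin.
# Assumes each logical unit is introduced by a line "LU Name: <GUID>".
lu_guids() {
    awk -F': *' '$1 == "LU Name" { print $2 }'
}

# Intended use on the omnios host (not run here):
#   stmfadm list-lu | lu_guids | while read -r guid; do
#       echo "== $guid"
#       stmfadm list-view -l "$guid"
#   done
```

A LU that reports no view entries is not exported to any initiator, which would match the "Failed to connect to LUN" error on the proxmox side.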

RobFantini · Nov 8, 2016

mir said:
Yes.
See my answer in same thread. You might just have lost the views to the lun(s)

Also, the kvm's that are not operational are backup systems. We can live without those for a few days.

Once the solution is found I'll put the info here and on the mailing list.

RobFantini · Nov 8, 2016

On omnios the iscsi target is offline:

Code:

# itadm list-target
TARGET NAME                                                  STATE    SESSIONS
iqn.2010-09.org.napp-it:1459891666                           offline  235

I've been reading manuals and searching but have not yet seen how to bring it online.

Any clues are welcome!

RobFantini · Nov 8, 2016

Code:

# stmfadm online-target iqn.2010-09.org.napp-it:1459891666
stmfadm: resource busy

mir · Nov 8, 2016

Either
svcadm enable stmf
or
svcadm restart stmf

See status:
svcs |grep stmf

RobFantini · Nov 8, 2016

Still offline:

Code:

# svcadm restart stmf

#  svcs |grep stmf 
online*        12:23:45 svc:/system/stmf:default

# itadm list-target
TARGET NAME                                                  STATE    SESSIONS 
iqn.2010-09.org.napp-it:1459891666                           offline  235
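
To make that check repeatable, here is a small sketch that picks the offline targets out of the listing. It assumes the three-column layout shown above; `offline_targets` is a hypothetical helper name:

```shell
# Print the names of targets whose STATE column reads "offline".
# Reads `itadm list-target` output on stdin. The header row is skipped
# implicitly because its second field is "NAME", not "offline".
offline_targets() {
    awk '$2 == "offline" { print $1 }'
}

# Intended use on the omnios host (not run here):
#   itadm list-target | offline_targets
```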

RobFantini · Nov 8, 2016

I just noticed a scrub in progress:

Code:

# zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub in progress since Mon Nov  7 23:00:01 2016
    1.66T scanned out of 3.36T at 36.0M/s, 13h45m to go
    0 repaired, 49.48% done

I think that is going slowly.
However, that should not take the target offline, should it?
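
For what it's worth, the "to go" figure is consistent with the reported rate. A minimal sketch of the arithmetic, assuming zpool's T and M suffixes are binary units (TiB and MiB); `scrub_eta` is a hypothetical helper name:

```shell
# Estimate the remaining scrub time from the `zpool status` figures.
#   $1 = scanned (TiB), $2 = total (TiB), $3 = rate (MiB/s)
scrub_eta() {
    awk -v s="$1" -v t="$2" -v r="$3" 'BEGIN {
        # remaining TiB -> MiB, divided by MiB/s gives seconds
        secs = int((t - s) * 1024 * 1024 / r)
        printf "%dh%dm\n", int(secs / 3600), int((secs % 3600) / 60)
    }'
}

scrub_eta 1.66 3.36 36.0   # prints 13h45m
```

So the estimate itself is fine; the open question is only whether 36.0M/s is low for this pool.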

mir · Nov 8, 2016

RobFantini said:
Still offline:

Code:

# svcs |grep stmf
online*        12:23:45 svc:/system/stmf:default

The * means that stmf is still in transition to another state, so the restart has not finished yet. If it ends up in maintenance, try svcadm clear stmf and watch for output from the command and also svcs.
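
For reference, the svcs man page describes an asterisk appended to the STATE column as an instance in transition. A small sketch for classifying that column, where `smf_state_kind` is a hypothetical helper name:

```shell
# Classify the first field (STATE) of each svcs output line on stdin.
# A trailing "*" is treated as "in transition", per svcs(1).
smf_state_kind() {
    awk '{
        if ($1 ~ /\*$/)               print "transitioning"
        else if ($1 == "online")      print "online"
        else if ($1 == "maintenance") print "maintenance"
        else                          print "other"
    }'
}

# Intended use on the omnios host (not run here):
#   svcs | grep stmf | smf_state_kind
#   svcs -xv stmf        # explains why a service is not running
```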

RobFantini · Nov 8, 2016

The system just crashed/panicked and rebooted.

After the reboot the target is online:

Code:

# itadm list-target
TARGET NAME                                                  STATE    SESSIONS
iqn.2010-09.org.napp-it:1459891666                           online   4

Luckily this is a backup system.

I will not be updating our production system any time soon. I need to find out what caused the issue first.

mir · Nov 8, 2016

RobFantini said:
the system just crashed/panicked and rebooted.

How is your pool and disk health?

RobFantini · Nov 9, 2016

I think we have a network switch or NIC issue.
On our production system this showed on reboot 4 times in a row.

The description below is pasted from https://docs.oracle.com/cd/E19417-01/html/E20814/z40004961296336.html

Code:

SYS1 had this on reboot 11/8/2016

On-Board Ethernet Devices Fail to Connect After a Faulty CPU Reconfigures Back to the Host (CR 6984323)

When rebooting the server after a failed or disabled CPU reconfigures back to the host, the onboard Gigabit Ethernet connections will not connect to network. The following example messages will display on the system console:

igb0: DL_ATTACH_REQ failed: DL_SYSERR (errno 22)
igb0: DL_BIND_REQ failed: DL_OUTSTATE
igb0: DL_PHYS_ADDR_REQ failed: DL_OUTSTATE
igb0: DL_UNBIND_REQ failed: DL_OUTSTATE
Failed to plumb IPv4 interface(s): igb0

Workaround:

Reboot the server two additional times. If the problem persists, contact your service representative for assistance.

So the issue is hardware. We replicate between our two omnios systems. Those replications were working fine, some running every 5 minutes.
Any kvm's that write a lot to storage had issues.
Maybe there is a way to prioritize disk traffic?
The phone system had no issue running on iscsi storage.

Anyways I'm moving systems to pve zfs and will check the hardware issue.

RobFantini · Nov 10, 2016

It turns out the hardware error was for a /dev device that is not used, like when disks are moved on linux and a udev file needs to be deleted. There was a complaint about /dev/ixgbe1, which is not listed in ifconfig. We use ixgbe2 and ixgbe3.

So we have an iscsi/pve issue after the update. The cause could still be hardware.

I've got to debug the issue to find the cause.
Any debugging suggestions are welcome.

RobFantini · Nov 10, 2016

RobFantini said:
It turns out the hardware error was for a /dev device that is not used, like when disks are moved on linux and a udev file needs to be deleted. There was a complaint about /dev/ixgbe1, which is not listed in ifconfig. We use ixgbe2 and ixgbe3.

So we have an iscsi/pve issue after the update. The cause could still be hardware.

I've got to debug the issue to find the cause.
Any debugging suggestions are welcome.

I will run the benchmark tests suggested by the napp-it author. I have some results from before the update on one of the systems.
