OmniOS iSCSI issue after update

RobFantini

We updated a napp-it/OmniOS system; after that, a KVM guest will not start:
Code:
qm start 8110
kvm: -drive file=iscsi://10.2.2.41/iqn.2010-09.org.napp-it:1459891666/2,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on: iSCSI: Failed to connect to LUN : iscsi_service failed with : iscsi_service_reconnect_if_loggedin. Can not reconnect right now.

Does anyone have a suggestion to fix the issue?
 
On the COMSTAR page:
Comstar stmf online 12:17:14 svc:/system/stmf:default
iscsi/target online 12:17:14 svc:/network/iscsi/target:default
 
I got this reply on omnios-discuss <omnios-discuss@lists.omniti.com>:

"we found that after the upgrade a few of our luns had disappeared
... since no one else reported this on the list we thought that we
may have made a mistake ... but maybe there is something more to it

just recreate the missing luns ... (we had to pick new lun numbers
since omni complained that the old luns were already in use)

then rescan iscsi on the proxmox host and all should be well."


Question: does 'ZFS over iSCSI' storage use LUNs?
 
Yes.
See my answer in the same thread. You might just have lost the views to the LUN(s).
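
A rough sketch of how the "lost views" idea could be checked on the OmniOS side. It assumes `stmfadm list-lu` prints lines of the form "LU Name: <GUID>" and that `stmfadm list-view -l <GUID>` prints a "View Entry: N" line for each view; the function name is made up for this example.

```shell
#!/bin/sh
# Print the GUID of every COMSTAR logical unit that has no view entry.
# Assumption: `stmfadm list-lu` emits "LU Name: <GUID>" lines and
# `stmfadm list-view -l <GUID>` emits "View Entry: N" for each view.
lus_without_views() {
    stmfadm list-lu | awk '/^LU Name:/ { print $3 }' | while read -r guid; do
        if ! stmfadm list-view -l "$guid" </dev/null 2>/dev/null | grep -q 'View Entry'; then
            echo "$guid"
        fi
    done
}

# Only run on a host that actually has the COMSTAR tools installed.
if command -v stmfadm >/dev/null 2>&1; then
    lus_without_views
fi
```

Any GUID it prints is a LU that initiators cannot see. `stmfadm add-view <GUID>` would add a view for it again; whether that is the right fix here is not confirmed.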

Also, the KVM guests that are not operational are backup systems. We can live without those for a few days.

Once the solution is found I'll put the info here and on the mailing list.
 
On OmniOS the iSCSI target is offline:

Code:
# itadm list-target
TARGET NAME                                                  STATE    SESSIONS
iqn.2010-09.org.napp-it:1459891666                           offline  235

I've been reading manuals and searching but have not yet seen how to bring it online.

Any clues are welcome!
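
For anyone landing here with the same question, a minimal sketch: it picks the offline targets out of `itadm list-target` output and prints the `stmfadm online-target` command that could be tried for each one. The helper name is made up, and it is untested whether onlining the target this way would have worked in this case.

```shell
#!/bin/sh
# Read `itadm list-target` output on stdin and print the name of every
# target whose STATE column is not "online". The header line is skipped.
offline_targets() {
    awk 'NR > 1 && NF == 3 && $2 != "online" { print $1 }'
}

# Dry run: only print the command, do not change anything.
if command -v itadm >/dev/null 2>&1; then
    itadm list-target | offline_targets | while read -r tgt; do
        echo "would run: stmfadm online-target $tgt"
    done
fi
```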
 
Still offline:
Code:
# svcadm restart stmf

#  svcs |grep stmf 
online*        12:23:45 svc:/system/stmf:default

# itadm list-target
TARGET NAME                                                  STATE    SESSIONS 
iqn.2010-09.org.napp-it:1459891666                           offline  235
 
I just noticed a scrub in progress:
Code:
# zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub in progress since Mon Nov  7 23:00:01 2016
    1.66T scanned out of 3.36T at 36.0M/s, 13h45m to go
    0 repaired, 49.48% done

I think that is going slowly.
However, that should not take the target offline, should it?
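
To rule the scrub out while debugging, its progress can be pulled out of `zpool status`, and the scrub can be stopped with `zpool scrub -s tank`. A small sketch (the helper name is made up; it only parses the "% done" line shown above):

```shell
#!/bin/sh
# Read `zpool status` output on stdin and print the scrub completion
# percentage taken from the line that ends in "% done".
scrub_progress() {
    awk '/% done/ { for (i = 1; i <= NF; i++) if ($i ~ /%$/) print $i }'
}

# Only query the pool if the zpool command is available on this host.
if command -v zpool >/dev/null 2>&1; then
    zpool status tank | scrub_progress
fi
```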
 
The system just crashed/panicked and rebooted.


After the reboot the target is online:
Code:
# itadm list-target
TARGET NAME                                                  STATE    SESSIONS
iqn.2010-09.org.napp-it:1459891666                           online   4

Luckily this is a backup system.

I will not be updating our production system any time soon. We need to find out what caused the issue.
 
I think we have a network switch or NIC issue.
On our production system this showed on reboot 4 times in a row:

This is pasted from https://docs.oracle.com/cd/E19417-01/html/E20814/z40004961296336.html

Code:
SYS1 had this on reboot 11/8/2016

On-Board Ethernet Devices Fail to Connect After a Faulty CPU Reconfigures Back to the Host (CR 6984323)

When rebooting the server after a failed or disabled CPU reconfigures back to the host, the onboard Gigabit Ethernet connections will not connect to network. The following example messages will display on the system console:

igb0: DL_ATTACH_REQ failed: DL_SYSERR (errno 22)
igb0: DL_BIND_REQ failed: DL_OUTSTATE
igb0: DL_PHYS_ADDR_REQ failed: DL_OUTSTATE
igb0: DL_UNBIND_REQ failed: DL_OUTSTATE
Failed to plumb IPv4 interface(s): igb0

Workaround:

Reboot the server two additional times. If the problem persists, contact your service representative for assistance.

So the issue is hardware. We replicate between our two OmniOS systems. Those replications were working fine, some running every 5 minutes.
Any KVM guests that write a lot to storage had issues.
Maybe there is a way to prioritize disk traffic?
The phone system had no issue running on iSCSI storage.

Anyway, I'm moving systems to PVE ZFS and will check the hardware issue.
 
It turns out the hardware error was for a /dev device that is not in use, like when disks are moved on Linux and a udev file needs to be deleted. There was a complaint about /dev/ixgbe1, which is not listed in ifconfig. We use ixgbe2 and ixgbe3.



So we have an iSCSI/PVE issue after the update. The cause could still be hardware.

I've got to try to debug the issue to find the cause.
Any debugging suggestions are welcome.
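
One quick check for the NIC side, as a sketch: list the physical links that are not in state "up", using the parseable output of `dladm show-phys`. The helper name is made up, and a link such as ixgbe1 showing up here would only confirm it is unused, not explain the iSCSI problem.

```shell
#!/bin/sh
# Read "link:state" lines (as printed by `dladm show-phys -p -o link,state`)
# on stdin and print every link whose state is not "up".
links_not_up() {
    awk -F: '$2 != "up" { print $1 }'
}

# Only query the datalinks if dladm exists on this host (illumos/OmniOS).
if command -v dladm >/dev/null 2>&1; then
    dladm show-phys -p -o link,state | links_not_up
fi
```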
 

I will run the benchmark tests suggested by the napp-it author. I have some results from before the update on one of the systems.
 
