Storage I/O problem during upgrade

gob

Renowned Member
Aug 4, 2011
69
2
73
Chesterfield, United Kingdom
Hi

I am just in the middle of upgrading the first node in my cluster from version 4.4 to 5.
Towards the end of the upgrade I started getting the following message which keep scrolling round and around:
Code:
File descriptor 3 (pipe:[1941289]) leaked on vgs invocation. Parent PID 31851: grub-install
  WARNING: PV z3i0XJ-QUB7-yTOn-cuLa-7qEh-qTCz-L1pQ3D on /dev/sdc was already found on /dev/sdb.
  /dev/sdd: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 3748092903424: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 3748092960768: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sde: read failed after 0 of 4096 at 0: Input/output error
  /dev/sde: read failed after 0 of 4096 at 3748092903424: Input/output error
  /dev/sde: read failed after 0 of 4096 at 3748092960768: Input/output error
  /dev/sde: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 2398353293312: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 2398353350656: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 2398353293312: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 2398353350656: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 4096: Input/output error
  WARNING: PV MuWMn4-o1IS-wQ8d-7Elj-pNwA-3hHh-p28KeE on /dev/sdi was already found on /dev/sdh.
  WARNING: PV z3i0XJ-QUB7-yTOn-cuLa-7qEh-qTCz-L1pQ3D prefers device /dev/sdc because device is used by LV.
  WARNING: PV MuWMn4-o1IS-wQ8d-7Elj-pNwA-3hHh-p28KeE prefers device /dev/sdh because device was seen first.
File descriptor 3 (pipe:[1941289]) leaked on vgs invocation. Parent PID 31851: grub-install
  WARNING: PV z3i0XJ-QUB7-yTOn-cuLa-7qEh-qTCz-L1pQ3D on /dev/sdc was already found on /dev/sdb.
  /dev/sdd: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 3748092903424: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 3748092960768: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sde: read failed after 0 of 4096 at 0: Input/output error
  /dev/sde: read failed after 0 of 4096 at 3748092903424: Input/output error
  /dev/sde: read failed after 0 of 4096 at 3748092960768: Input/output error
  /dev/sde: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 2398353293312: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 2398353350656: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 2398353293312: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 2398353350656: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 4096: Input/output error
  WARNING: PV MuWMn4-o1IS-wQ8d-7Elj-pNwA-3hHh-p28KeE on /dev/sdi was already found on /dev/sdh.
  WARNING: PV z3i0XJ-QUB7-yTOn-cuLa-7qEh-qTCz-L1pQ3D prefers device /dev/sdc because device is used by LV.
  WARNING: PV MuWMn4-o1IS-wQ8d-7Elj-pNwA-3hHh-p28KeE prefers device /dev/sdh because device was seen first.
File descriptor 3 (pipe:[1941289]) leaked on vgs invocation. Parent PID 31851: grub-install
  WARNING: PV z3i0XJ-QUB7-yTOn-cuLa-7qEh-qTCz-L1pQ3D on /dev/sdc was already found on /dev/sdb.
  /dev/sdd: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 3748092903424: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 3748092960768: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sde: read failed after 0 of 4096 at 0: Input/output error
  /dev/sde: read failed after 0 of 4096 at 3748092903424: Input/output error
  /dev/sde: read failed after 0 of 4096 at 3748092960768: Input/output error
  /dev/sde: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 2398353293312: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 2398353350656: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 2398353293312: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 2398353350656: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 4096: Input/output error
  WARNING: PV MuWMn4-o1IS-wQ8d-7Elj-pNwA-3hHh-p28KeE on /dev/sdi was already found on /dev/sdh.
  WARNING: PV z3i0XJ-QUB7-yTOn-cuLa-7qEh-qTCz-L1pQ3D prefers device /dev/sdc because device is used by LV.
  WARNING: PV MuWMn4-o1IS-wQ8d-7Elj-pNwA-3hHh-p28KeE prefers device /dev/sdh because device was seen first.

The upgrade has not finished yet after around 30 minutes and I wonder what options I have?
Should I terminate and reboot?
Any idea why this node would have the I/O issue when the storage is still accessible from other nodes?

Thanks for any suggestions.
 
> Any idea why this node would have the I/O issue when the storage is still accessible from other nodes?

I suppose these device files ( /dev/sd*) refer to ISCSI luns.
Propabably a network hiccup between your PVE host and the ISCSI target.

Check that you're still logged in your ISCSI session with:
iscsiadm -m session -P 1

if yes check that you can read the first 512 bytes of the ISCSI disks with
dd if=/dev/sdX of=/dev/null count=1
 
Thanks for your suggestions Manu. They are indeed all iSCSI LUNS

It appears that all iSCSI sessions are logged in:
Code:
iscsiadm -m session -P 1
Target: iqn.2005-10.org.freenas.ctl:proxiscsi (non-flash)
        Current Portal: 10.4.132.60:3260,257
        Persistent Portal: 10.4.132.60:3260,257
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1993-08.org.debian:01:3e1f812bb5d5
                Iface IPaddress: 10.4.132.110
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
Target: iqn.1984-05.com.dell:powervault.6001ec9000df04ae00000000486fb13d (non-flash)
        Current Portal: 10.4.131.61:3260,2
        Persistent Portal: 10.4.131.61:3260,2
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1993-08.org.debian:01:3e1f812bb5d5
                Iface IPaddress: 10.4.131.110
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 2
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
        Current Portal: 10.4.130.61:3260,2
        Persistent Portal: 10.4.130.61:3260,2
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1993-08.org.debian:01:3e1f812bb5d5
                Iface IPaddress: 10.4.130.110
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 3
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
        Current Portal: 10.4.130.60:3260,1
        Persistent Portal: 10.4.130.60:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1993-08.org.debian:01:3e1f812bb5d5
                Iface IPaddress: 10.4.130.110
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 4
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
        Current Portal: 10.4.131.60:3260,1
        Persistent Portal: 10.4.131.60:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1993-08.org.debian:01:3e1f812bb5d5
                Iface IPaddress: 10.4.131.110
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 5
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE

If I run lsblk -f I get the following:

Code:
NAME                    FSTYPE      LABEL UUID                                   MOUNTPOINT
loop0                   ext4              68bc6f99-8c59-4727-a2c5-d4f1c3c4cd12  
loop1                   ext4              d7147675-c8cf-4b04-8da7-53d595ef4006  
loop2                   ext4              71e28686-f649-45a7-aba9-dacf29f28f00  
loop3                   ext4              03e63b00-3f8f-4a67-b462-6862e0a25493  
sda                                                                            
├─sda1                                                                          
├─sda2                  vfat              B470-9C03                            
└─sda3                  LVM2_member       8aI1Pu-JZDF-CmBW-67GX-redR-eyUq-XIrcXn
  ├─pve-swap            swap              e6e920bc-37b9-4bcd-b675-b2be347c73a1   [SWAP]
  ├─pve-root            ext4              90bd4685-7ab2-42b3-91a5-46d041937641   /
  ├─pve-data_tmeta                                                              
  │ └─pve-data                                                                  
  └─pve-data_tdata                                                              
    └─pve-data                                                                  
sdb                     LVM2_member       BUFpcO-Oehg-hGHK-z1L1-1Guk-UseM-jOz7Ea
├─SSD1-vm--350--disk--1                                                        
├─SSD1-vm--360--disk--1 ext4              3e30a101-a9e3-4f5e-941d-867a2ed4f761  
├─SSD1-vm--367--disk--1 ext4              a577d35a-8f06-44b1-8d36-43737281713e  
└─SSD1-vm--352--disk--1                                                        
sdc                                                                            
sdd                     LVM2_member       z3i0XJ-QUB7-yTOn-cuLa-7qEh-qTCz-L1pQ3D
sde                     LVM2_member       z3i0XJ-QUB7-yTOn-cuLa-7qEh-qTCz-L1pQ3D
sdf                                                                            
sdg                     LVM2_member       MuWMn4-o1IS-wQ8d-7Elj-pNwA-3hHh-p28KeE
sdh                                                                            
sdi                     LVM2_member       MuWMn4-o1IS-wQ8d-7Elj-pNwA-3hHh-p28KeE
sdj

running the dd test give a success for most drives apart from those with blank lines:

Code:
root@ms-200-prox05:~# dd if=/dev/sdd of=/dev/null count=1
1+0 records in
1+0 records out
512 bytes copied, 0.00650636 s, 78.7 kB/s

Code:
root@ms-200-prox05:~# dd if=/dev/sdf of=/dev/null count=1
dd: error reading '/dev/sdf': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 1.03039 s, 0.0 kB/s

I can browse the iSCSI datastores from my other nodes without problem. The contents list appears in < 2s.
Although I can still browse the contents of the iSCSI luns from this host it can take > 30s to list the contents.
As you suggest I think it is possibly a network problem between this host and the iSCSI storage devices so I plan to give it all a kick later tonight.
 
File descriptor 3 (pipe:[1941289]) leaked on vgs invocation. Parent PID 31851: grub-install
Please, check before hand that grub-install was successful on /dev/sda, otherwise you will have a dead host. ;)
 
I'm afraid a full system shutdown didn't help with this one node (host, iSCSI SAN and switches all rebooted).
My logs are filling up with this:
Code:
Nov 14 18:54:55 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:54:55 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:54:55 ms-200-prox05 kernel: print_req_error: I/O error, dev sdj, sector 0
Nov 14 18:54:56 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:54:56 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:54:56 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:54:56 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 CDB: Read(16) 88 00 00 00 00 01 b4 55 df 80 00 00 00 08 00 00
Nov 14 18:54:56 ms-200-prox05 kernel: print_req_error: I/O error, dev sdj, sector 7320493952
Nov 14 18:54:56 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:54:56 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:54:56 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:54:56 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 CDB: Read(16) 88 00 00 00 00 01 b4 55 df f0 00 00 00 08 00 00
Nov 14 18:54:56 ms-200-prox05 kernel: print_req_error: I/O error, dev sdj, sector 7320494064
Nov 14 18:54:57 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:54:57 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:54:57 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:54:57 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:54:57 ms-200-prox05 kernel: print_req_error: I/O error, dev sdj, sector 0
Nov 14 18:54:58 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:54:58 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:54:58 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:54:58 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00
Nov 14 18:54:58 ms-200-prox05 kernel: print_req_error: I/O error, dev sdj, sector 8
Nov 14 18:54:58 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:54:58 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:54:58 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:54:58 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:54:58 ms-200-prox05 kernel: print_req_error: I/O error, dev sdj, sector 0
Nov 14 18:54:58 ms-200-prox05 pvestatd[1692]: status update time (12.943 seconds)
Nov 14 18:54:59 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:54:59 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:54:59 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:54:59 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:54:59 ms-200-prox05 kernel: print_req_error: I/O error, dev sdd, sector 0
Nov 14 18:55:00 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:00 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:00 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:00 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 01 17 34 7f 80 00 00 00 08 00 00
Nov 14 18:55:00 ms-200-prox05 kernel: print_req_error: I/O error, dev sdd, sector 4684283776
Nov 14 18:55:00 ms-200-prox05 systemd[1]: Starting Proxmox VE replication runner...
Nov 14 18:55:00 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:00 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:00 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:00 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 01 17 34 7f f0 00 00 00 08 00 00
Nov 14 18:55:00 ms-200-prox05 kernel: print_req_error: I/O error, dev sdd, sector 4684283888
Nov 14 18:55:01 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:01 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:01 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:01 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:55:01 ms-200-prox05 kernel: print_req_error: I/O error, dev sdd, sector 0
Nov 14 18:55:01 ms-200-prox05 systemd[1]: Started Proxmox VE replication runner.
Nov 14 18:55:01 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:01 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:01 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:01 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00
Nov 14 18:55:01 ms-200-prox05 kernel: print_req_error: I/O error, dev sdd, sector 8
Nov 14 18:55:02 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:02 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:02 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:02 ms-200-prox05 kernel: sd 3:0:0:1: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:55:02 ms-200-prox05 kernel: print_req_error: I/O error, dev sdd, sector 0
Nov 14 18:55:02 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:02 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:02 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:02 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:55:02 ms-200-prox05 kernel: print_req_error: I/O error, dev sdg, sector 0
Nov 14 18:55:03 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:03 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:03 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:03 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 CDB: Read(16) 88 00 00 00 00 01 17 34 7f 80 00 00 00 08 00 00
Nov 14 18:55:03 ms-200-prox05 kernel: print_req_error: I/O error, dev sdg, sector 4684283776
Nov 14 18:55:03 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:03 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:03 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:03 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 CDB: Read(16) 88 00 00 00 00 01 17 34 7f f0 00 00 00 08 00 00
Nov 14 18:55:03 ms-200-prox05 kernel: print_req_error: I/O error, dev sdg, sector 4684283888
Nov 14 18:55:04 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:04 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:04 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:04 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:55:04 ms-200-prox05 kernel: print_req_error: I/O error, dev sdg, sector 0
Nov 14 18:55:04 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:04 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:04 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:04 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00
Nov 14 18:55:04 ms-200-prox05 kernel: print_req_error: I/O error, dev sdg, sector 8
Nov 14 18:55:05 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:05 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:05 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:05 ms-200-prox05 kernel: sd 4:0:0:1: [sdg] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:55:05 ms-200-prox05 kernel: print_req_error: I/O error, dev sdg, sector 0
Nov 14 18:55:05 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:05 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:05 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:05 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:55:05 ms-200-prox05 kernel: print_req_error: I/O error, dev sdi, sector 0
Nov 14 18:55:06 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:06 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:06 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:06 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 01 b4 55 df 80 00 00 00 08 00 00
Nov 14 18:55:06 ms-200-prox05 kernel: print_req_error: I/O error, dev sdi, sector 7320493952
Nov 14 18:55:06 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:06 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:06 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:06 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 01 b4 55 df f0 00 00 00 08 00 00
Nov 14 18:55:06 ms-200-prox05 kernel: print_req_error: I/O error, dev sdi, sector 7320494064
Nov 14 18:55:07 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:07 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:07 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:07 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:55:07 ms-200-prox05 kernel: print_req_error: I/O error, dev sdi, sector 0
Nov 14 18:55:07 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:07 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:07 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:07 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00
Nov 14 18:55:07 ms-200-prox05 kernel: print_req_error: I/O error, dev sdi, sector 8
Nov 14 18:55:08 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:08 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:08 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1
Nov 14 18:55:08 ms-200-prox05 kernel: sd 6:0:0:2: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 14 18:55:08 ms-200-prox05 kernel: print_req_error: I/O error, dev sdi, sector 0
Nov 14 18:55:09 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 14 18:55:09 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 Sense Key : Illegal Request [current]
Nov 14 18:55:09 ms-200-prox05 kernel: sd 5:0:0:2: [sdj] tag#0 <<vendor>>ASC=0x94 ASCQ=0x1

Thoughts anyone?

thanks
 
What does the other side (SAN box) say?
 
What does the other side (SAN box) say?

I have 3 SANs on my storage networks.
  • 1 x DELL MD3000i (1 x SAS LUN and 1 x SATAu LUN) 1gb network
  • 1 x FreeNAS (24 x SSD array) 10Gbe network
  • 1 x FreeNAS (SATA array) 1gb network
Whenever I attach any of those to my 3rd host via iSCSI I get the constant stream of I/O errors as in my above post, and although they do connect, browsing the storage is slow.
They are on different NICs in the host and routed through different switches. If I ping the storage from the host I get a constant low latency connection around 0.1ms
If I connect to an NFS volume on the FreeNAS units they connect without issue.

Migrating the VM's to a different host but keeping on the same storage is fine. They all migrate OK and fire up without issue.
So it is definitely down to an iSCSI issue but I believe the networking is fine.

The only other thing is that my two good nodes are DELL servers and the problematic node is HP (I know that shouldn't matter).
 
Here is one of the FreeNAS boxes:
Code:
Nov 15 14:50:20 ms-200-fn01 ctld[75708]: 10.4.132.110: read: connection lost
Nov 15 14:50:20 ms-200-fn01 ctld[2313]: child process 75708 terminated with exit status 1
Nov 15 14:50:20 ms-200-fn01 ctld[75709]: 10.4.132.110: read: connection lost
Nov 15 14:50:20 ms-200-fn01 ctld[2313]: child process 75709 terminated with exit status 1
Nov 15 14:50:21 ms-200-fn01 ctld[75712]: 10.4.132.100: read: connection lost
Nov 15 14:50:21 ms-200-fn01 ctld[2313]: child process 75712 terminated with exit status 1
Nov 15 14:50:21 ms-200-fn01 ctld[75717]: 10.4.132.90: read: connection lost
Nov 15 14:50:21 ms-200-fn01 ctld[2313]: child process 75717 terminated with exit status 1
Nov 15 14:50:23 ms-200-fn01 ctld[75722]: 10.4.132.110: read: connection lost
Nov 15 14:50:23 ms-200-fn01 ctld[2313]: child process 75722 terminated with exit status 1

Hmmm... this is showing that all 3 nodes are dropping connections, but only the one node is struggling.
 
This sounds to me, either the links are saturated or something with multipath. Also some iSCSI timeouts might have changed between debian 8 -> 9 and this might make the situation worse. But so far, all is guessing.
 
OK, rather embarrassingly I (you) have discovered the problem.
On this host it appears I did not configure iSCSI timeouts or install the Multipath tools!

Since installing and copying my multipath.conf settings the node is behaving itself and I can browse the storage instantly.
I have however noticed there is still a recurring error in my logs:
Code:
Nov 17 10:53:27 ms-200-prox05 kernel: sd 6:0:0:0: alua: supports implicit TPGS
Nov 17 10:53:27 ms-200-prox05 kernel: sd 6:0:0:0: alua: device t10.FreeNAS iSCSI Disk      a0369f9adbf800                  port group 1 rel port 3
Nov 17 10:53:27 ms-200-prox05 kernel: sd 6:0:0:0: alua: port group 01 state A non-preferred supports TolUSNA

This error only refers to my 10Gb FreeNAS SAN which only has a single NIC so Multipath would not work on this SAN anyway, yet multipath shows this:
Code:
:~# multipath -ll
mpathd (36589cfc0000005aa6ac5bd977373e544) dm-7 FreeNAS,iSCSI Disk
size=2.5T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 5:0:0:0 sdk 8:160 active ready running
  `- 6:0:0:0 sdj 8:144 active ready running
mpathc (36001ec9000def617000009e555b1ae89) dm-6 DELL,MD3000i
size=3.4T features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| |- 1:0:0:2 sdf 8:80  active ready running
| `- 2:0:0:2 sdg 8:96  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 4:0:0:2 sdh 8:112 active ghost running
  `- 3:0:0:2 sdi 8:128 active ghost running
mpathb (36001ec9000df04ae00000dba55b19f20) dm-5 DELL,MD3000i
size=2.2T features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| |- 4:0:0:1 sdc 8:32  active ready running
| `- 3:0:0:1 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 1:0:0:1 sdb 8:16  active ghost running
  `- 2:0:0:1 sdd 8:48  active ghost running

Should I exclude it from Multipath somehow?

Cheers!
 
To me it doesn't look like those messages are errors, rather informational. But to stop them, I guess the exclusion might be the way to go.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!