New Kernel PVE-5.15.30-2 breaks iSCSI connections

Feb 3, 2022
Hey,

Since no one was answering in the other thread and this issue was basically being ignored, I have to open a new one.
So: with the new kernel pve-5.15.30 my iSCSI connections break and can't be revived; only with the "old" 5.13.19-6 kernel do my iSCSI connections work.
I currently have 3 Intel NUCs running in an HA cluster with 1 Gbit NICs.
I'm not using multipath (is that the problem?).

Kind regards,
 
What kind of information do you need?

I've got 1x Intel NUC 7th gen (NUC7i5DNHE) and
2x Intel NUC 10th gen (NUC10i7FNHN).
All three have the same Intel I219-V Ethernet port.
 
What kind of information do you need?
iSCSI is a client/server protocol. A single session is established between one client and one server. If the server's iSCSI implementation is at all decent, it shouldn't matter how many clients you have. If more than one client breaks the server, then it's generally not the client's (PVE's) fault.

When reporting an issue it's very helpful to provide at the very least:
a) Client OS and version
b) Server OS and version
c) An exact, detailed report of the problem
d) A repro scenario, if possible, is invaluable
e) Since PVE is involved: content of storage.cfg, VM config output, status of the storage (a few example commands follow below)
f) All logs from before, during, and after the failure, from both client and server.
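
For the PVE side, roughly something like this collects most of it (the VMID 100 is just a placeholder for an affected VM):

Code:
cat /etc/pve/storage.cfg        # storage definitions
qm config 100                   # config of an affected VM (replace 100 with your VMID)
pvesm status                    # status of all configured storages
iscsiadm -m session -P 1        # state of the iSCSI session(s) on the client
journalctl -b -u open-iscsi     # client-side iSCSI service log since boot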

I am sure I left something off...


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
a) Client OS (iSCSI initiator): Debian 11.3
b) Server OS (iSCSI target): QNAP 5.0.0.1986
c) When I boot with kernel 5.15.30-2, iSCSI is broken and PVE cannot access any of the LUNs/storages
d) Switch to the 5.15.30-2 kernel and reboot
e) storage.cfg (cat /etc/pve/storage.cfg):
Code:
iscsi: HomeNAS
        portal 172.16.24.199
        target iqn.2004-04.com.qnap:ts-251plus:iscsi.lun.147903
        content none

lvm: LUN0
        vgname LUN0
        base HomeNAS:0.0.0.scsi-SQNAP_iSCSI_Storage_9f126906-457d-4b36-b7bb-31e38f758f66
        content rootdir,images
        shared 1

nfs: BackupNAS
        export /BackupNAS
        path /mnt/pve/BackupNAS
        server 172.16.24.199
        content iso,backup,vztmpl
        options vers=4.2
        prune-backups keep-last=3

pbs: pbs
        datastore BackupStore
        server pbs.fritz.box
        content backup
        fingerprint d7:c3:0a:9f:7b:e4:ed:de:5c:bf:f9:62:52:92:35:b0:3c:86:4b:9e:d0:5d:41:18:ec:f4:4d:8d:e1:89:d4:21
        prune-backups keep-all=1
        username root@pam

dir: local
        disable
        path /var/lib/vz
        content snippets
        prune-backups keep-all=1
        shared 0

f) Client: I guess the best way is to attach the boot log
f) Server: the iSCSI target says "logged in", no errors here

---- EDIT ----
I can check the status on the CLI with `lsscsi` or `iscsiadm -m session -P 1`; both tell me that the connection is established and logged in, but in PVE I cannot use any of the storages (see screenshot):

Code:
iscsiadm -m session -P 1
Target: iqn.2004-04.com.qnap:ts-251plus:iscsi.lun.147903 (non-flash)
        Current Portal: 172.16.24.199:3260,1
        Persistent Portal: 172.16.24.199:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1993-08.org.debian:01:cebf5ae42a5d
                Iface IPaddress: 172.16.24.100
                Iface HWaddress: default
                Iface Netdev: default
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                
                
lsscsi
[1:0:0:0]    disk    QNAP     iSCSI Storage    4.0   /dev/sda
[N:0:4:1]    disk    Samsung SSD 970 EVO Plus 250GB__1          /dev/nvme0n1
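
Since the session itself is logged in, the question is whether the LUN's block device and the LVM volume group on top of it are actually usable. A few generic checks that might narrow it down (the VG name LUN0 is taken from the storage.cfg above):

Code:
ls -l /dev/disk/by-id/ | grep -i qnap   # is the iSCSI LUN visible as a block device?
pvs && vgs                              # do the PV and the volume group "LUN0" show up?
pvesm status                            # what does PVE itself report for each storage?
dmesg | grep -iE 'iscsi|sd[a-z]'        # kernel messages for the iSCSI disk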
 

As you are also using an NFS share on this QNAP, it looks more like this problem [1].

So update your host to get the pve-5.15.35 kernel [2] (if you are on the no-subscription repo; not sure if it is already in the enterprise repo), boot with the new kernel, and see if your problem is gone (a rough sketch of the steps is below).

[1] https://forum.proxmox.com/threads/issue-after-upgrade-to-7-2-3.109003/#post-468458
[2] https://forum.proxmox.com/threads/proxmox-ve-7-2-released.108970/page-2#post-468872
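
Roughly, the upgrade boils down to (assuming the repository is already set up correctly):

Code:
apt update
apt full-upgrade     # should pull in the pve-kernel-5.15.35 package
reboot
uname -r             # verify the running kernel after the reboot
pveversion -v        # shows the installed pve-kernel packages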
Thank you very much, the update to the new kernel 5.15.35 fixed the problem completely.
 
Hi,

I also have an issue with an HPE Smart Array P410 if I use the latest kernel 5.15.35-1-pve; with 5.13.19-6-pve it works without any issue.

May 07 03:00:51 kernel: DMAR: [DMA Read NO_PASID] Request device [01:00.2] fault addr 0xf363e000 [fault reason 0x06] PTE Read access is not set
Failed to start Ceph object storage daemon: all the SSDs are seen, but they are inaccessible, Ceph can't start, and the logs are full of the kernel fault.
Hello,

I have exactly the same problem with kernel 5.15.30-2 (enterprise repo) and an HPE Smart Array P222 in an HP Gen8 MicroServer.
Rebooting into the previous kernel (5.13.19-6) solves the problem; a way to pin it is sketched below.

Ticket opened ;)

EDIT: the ticket was closed because we only have a community subscription, so I opened a new thread:
https://forum.proxmox.com/threads/kernel-5-15-30-2-break-hpe-smart-array-p222.109298/
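
In the meantime, pinning the working kernel saves selecting it by hand at every boot (assuming a pve-kernel-helper version that already has the pin/unpin subcommands):

Code:
proxmox-boot-tool kernel list                # list the installed kernels
proxmox-boot-tool kernel pin 5.13.19-6-pve   # keep booting this kernel by default
reboot
proxmox-boot-tool kernel unpin               # later, once a fixed kernel is available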
 
I also have an issue with an HPE Smart Array P410 if I use the latest kernel 5.15.35-1-pve; with 5.13.19-6-pve it works without any issue.

May 07 03:00:51 kernel: DMAR: [DMA Read NO_PASID] Request device [01:00.2] fault addr 0xf363e000 [fault reason 0x06] PTE Read access is not set
Failed to start Ceph object storage daemon: all the SSDs are seen, but they are inaccessible, Ceph can't start, and the logs are full of the kernel fault.
Could you please share:
* the output of `cat /proc/cmdline`
* whether you have any modifications on the system in place (judging from the error message: e.g. do you use PCI passthrough, or have you enabled intel_iommu or something like that)?

In any case:
* if at all possible, install the latest firmware for the system and all controllers
* try adding `intel_iommu=off` to the kernel command line and rebooting with it (sketched below)
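
For the second point, a sketch of how to set the parameter (assuming the host boots via GRUB; with systemd-boot, e.g. on ZFS, edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):

Code:
# append intel_iommu=off to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"
update-grub
reboot
cat /proc/cmdline    # confirm the parameter is active after the reboot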
 
