New Kernel PVE-5.15.30-2 breaks iSCSI connections

Feb 3, 2022
Hey,

Since no one was answering in the other thread and this issue was basically being ignored, I have to open a new one.
So: with the new kernel pve-5.15.30 my iSCSI connections break and can't be revived; only with the "old" 5.13.19-6 kernel do my iSCSI connections work.
I currently have 3 Intel NUCs running in an HA cluster with 1 Gbit NICs.
I'm not using multipath (is that the problem?).

Kind regards,
 
What kind of information do you need?

I've got 1x Intel NUC 7th gen (NUC7i5DNHE) and
2x Intel NUC 10th gen (NUC10i7FNHN).
All three have the same Intel I219-V Ethernet port.
 
What kind of information do you need?
iSCSI is a client/server protocol. A single session is established between one client and one server. If the server's iSCSI implementation is at all decent, it shouldn't matter how many clients you have. If more than one client breaks the server, then it's generally not the client's (PVE's) fault.

When reporting an issue it's very helpful to provide at the very least:
a) Client OS and version
b) Server OS and version
c) An exact, detailed report of the problem
d) A repro scenario, if possible, is invaluable
e) Since PVE is involved: content of storage.cfg, VM config output, status of the storage (a few example commands follow below)
f) All logs from before, during, and after the failure, from both client and server.
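
For the PVE side, roughly something like this collects most of it (the VMID 100 is just a placeholder for an affected VM):

Code:
cat /etc/pve/storage.cfg        # storage definitions
qm config 100                   # config of an affected VM (replace 100 with your VMID)
pvesm status                    # status of all configured storages
iscsiadm -m session -P 1        # state of the iSCSI session(s) on the client
journalctl -b -u open-iscsi     # client-side iSCSI service log since boot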

I am sure I left something off...


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
a) Client OS (iSCSI initiator): Debian 11.3
b) Server OS (iSCSI target): QNAP 5.0.0.1986
c) When I boot with kernel 5.15.30-2, iSCSI is broken and PVE cannot access any of the LUNs/storages
d) Switch to the 5.15.30-2 kernel and reboot
e) storage.cfg (cat /etc/pve/storage.cfg):
Code:
iscsi: HomeNAS
        portal 172.16.24.199
        target iqn.2004-04.com.qnap:ts-251plus:iscsi.lun.147903
        content none

lvm: LUN0
        vgname LUN0
        base HomeNAS:0.0.0.scsi-SQNAP_iSCSI_Storage_9f126906-457d-4b36-b7bb-31e38f758f66
        content rootdir,images
        shared 1

nfs: BackupNAS
        export /BackupNAS
        path /mnt/pve/BackupNAS
        server 172.16.24.199
        content iso,backup,vztmpl
        options vers=4.2
        prune-backups keep-last=3

pbs: pbs
        datastore BackupStore
        server pbs.fritz.box
        content backup
        fingerprint d7:c3:0a:9f:7b:e4:ed:de:5c:bf:f9:62:52:92:35:b0:3c:86:4b:9e:d0:5d:41:18:ec:f4:4d:8d:e1:89:d4:21
        prune-backups keep-all=1
        username root@pam

dir: local
        disable
        path /var/lib/vz
        content snippets
        prune-backups keep-all=1
        shared 0

f) Client: I guess the best way is to attach the boot log
f) Server: the iSCSI target says "logged in", no errors here

---- EDIT ----
I can check the status on the CLI with `lsscsi` or `iscsiadm -m session -P 1`; both tell me that the connection is established and logged in, but in PVE I cannot use any of the storages (see screenshot):

Code:
iscsiadm -m session -P 1
Target: iqn.2004-04.com.qnap:ts-251plus:iscsi.lun.147903 (non-flash)
        Current Portal: 172.16.24.199:3260,1
        Persistent Portal: 172.16.24.199:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1993-08.org.debian:01:cebf5ae42a5d
                Iface IPaddress: 172.16.24.100
                Iface HWaddress: default
                Iface Netdev: default
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                
                
lsscsi
[1:0:0:0]    disk    QNAP     iSCSI Storage    4.0   /dev/sda
[N:0:4:1]    disk    Samsung SSD 970 EVO Plus 250GB__1          /dev/nvme0n1
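
Since the session itself is logged in, the question is whether the LUN's block device and the LVM volume group on top of it are actually usable. A few generic checks that might narrow it down (the VG name LUN0 is taken from the storage.cfg above):

Code:
ls -l /dev/disk/by-id/ | grep -i qnap   # is the iSCSI LUN visible as a block device?
pvs && vgs                              # do the PV and the volume group "LUN0" show up?
pvesm status                            # what does PVE itself report for each storage?
dmesg | grep -iE 'iscsi|sd[a-z]'        # kernel messages for the iSCSI disk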
 

As you are also using an NFS share on this QNAP, it looks more like this problem [1].

So update your host to get the pve-5.15.35 kernel [2] (if you are on the no-subscription repo; not sure if it is already in the enterprise repo), boot with the new kernel, and see if your problem is gone (a rough sketch of the steps is below).

[1] https://forum.proxmox.com/threads/issue-after-upgrade-to-7-2-3.109003/#post-468458
[2] https://forum.proxmox.com/threads/proxmox-ve-7-2-released.108970/page-2#post-468872
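
Roughly, the upgrade boils down to (assuming the repository is already set up correctly):

Code:
apt update
apt full-upgrade     # should pull in the pve-kernel-5.15.35 package
reboot
uname -r             # verify the running kernel after the reboot
pveversion -v        # shows the installed pve-kernel packages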
Thank you very much, the update to the new kernel 5.15.35 fixed the problem completely.
 
Hi,

I also have an issue with an HPE Smart Array P410 if I use the latest kernel 5.15.35-1-pve; with 5.13.19-6-pve it works without any issue.

May 07 03:00:51 kernel: DMAR: [DMA Read NO_PASID] Request device [01:00.2] fault addr 0xf363e000 [fault reason 0x06] PTE Read access is not set
Failed to start Ceph object storage daemon: all the SSDs are seen, but they are inaccessible, Ceph can't start, and the logs are full of the kernel fault.
Hello,

I have exactly the same problem with kernel 5.15.30-2 (enterprise repo) and an HPE Smart Array P222 in an HP Gen8 MicroServer.
Rebooting into the previous kernel (5.13.19-6) solves the problem; a way to pin it is sketched below.

Ticket opened ;)

EDIT: the ticket was closed because we only have a community subscription, so I opened a new thread:
https://forum.proxmox.com/threads/kernel-5-15-30-2-break-hpe-smart-array-p222.109298/
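
In the meantime, pinning the working kernel saves selecting it by hand at every boot (assuming a pve-kernel-helper version that already has the pin/unpin subcommands):

Code:
proxmox-boot-tool kernel list                # list the installed kernels
proxmox-boot-tool kernel pin 5.13.19-6-pve   # keep booting this kernel by default
reboot
proxmox-boot-tool kernel unpin               # later, once a fixed kernel is available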
 
I also have an issue with an HPE Smart Array P410 if I use the latest kernel 5.15.35-1-pve; with 5.13.19-6-pve it works without any issue.

May 07 03:00:51 kernel: DMAR: [DMA Read NO_PASID] Request device [01:00.2] fault addr 0xf363e000 [fault reason 0x06] PTE Read access is not set
Failed to start Ceph object storage daemon: all the SSDs are seen, but they are inaccessible, Ceph can't start, and the logs are full of the kernel fault.
Could you please share:
* the output of `cat /proc/cmdline`
* whether you have any modifications on the system in place (judging from the error message: e.g. do you use PCI passthrough, or have you enabled intel_iommu or something like that)?

In any case:
* if at all possible, install the latest firmware for the system and all controllers
* try adding `intel_iommu=off` to the kernel command line and rebooting with it (sketched below)
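
For the second point, a sketch of how to set the parameter (assuming the host boots via GRUB; with systemd-boot, e.g. on ZFS, edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):

Code:
# append intel_iommu=off to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"
update-grub
reboot
cat /proc/cmdline    # confirm the parameter is active after the reboot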
 
