iSCSI LUN Unrecognized During High Usage by Proxmox Guest VMs

cyberquarks

New Member
May 27, 2024
I'm experiencing an issue with my Proxmox setup where the iSCSI LUN becomes unrecognized during high usage by guest VMs. The VMs have their disks backed by this iSCSI LUN. Initially, everything works fine, but after a while, the drives stop working.

**Setup:**
- **Proxmox VE Version:** 8.2.2
- **TrueNAS Version:** TrueNAS-SCALE-24.04.0
- **iSCSI Target:** TrueNAS
- **LUN Size:** 1TB

**Commands and Outputs:**

1. **iSCSI Session Verification:**
```bash
iscsiadm -m session -P 1
```
```
Target: iqn.2005-10.org.freenas.ctl:proxmox-truenas-iscsi-1tb (non-flash)
Current Portal: 192.168.10.14:3260,1
Persistent Portal: 192.168.10.14:3260,1
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
```

2. **Volume Group Status Check:**
```bash
vgdisplay proxmox-truenas-lun
```
```
Volume group "proxmox-truenas-lun" not found
```

3. **Physical Volume Check:**
```bash
pvscan
```
```
PV /dev/sda3 VG pve lvm2 [<930.48 GiB / 16.00 GiB free]
```

4. **iSCSI Device Listing:**
```bash
ls /dev/disk/by-path/
```
```
ip-192.168.10.14:3260-iscsi-iqn.2005-10.org.freenas.ctl:proxmox-truenas-iscsi-1tb-lun-0
```

5. **Manual iSCSI Login Attempt:**
```bash
iscsiadm -m node --targetname iqn.2005-10.org.freenas.ctl:proxmox-truenas-iscsi-1tb --portal 192.168.10.14 --login
```
```
iscsiadm: default: 1 session requested, but 1 already present.
iscsiadm: Could not log into all portals
```

6. **Block Device Listing:**
```bash
lsblk
```
```
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sdj 8:144 0 1000G 0 disk
├─proxmox--truenas--lun-vm--101--disk--0
└─proxmox--truenas--lun-vm--101--disk--1
```

7. **LVM Physical Volume Display:**
```bash
pvdisplay /dev/sdj
```
```
Cannot use /dev/sdj: device is too small (pv_min_size)
```

8. **System Logs:**
```bash
journalctl -xe
```
```
I/O error, dev sdj, sector [various sectors] op 0x1:(WRITE) flags 0x208800
```

Any insights on why the iSCSI LUN becomes unrecognized and how to prevent this from happening would be greatly appreciated.
 
Backticks are not valid markup in the forum. You can use CODE tags (available from the formatting menu at the top of the edit box) to make your message readable.

The iSCSI interaction between the PVE host and your storage is confined to the Linux kernel on the PVE host. Proxmox userland packages don't take part in it.
Without any debugging, here are a few possible causes:
- TrueNAS is overwhelmed and stops responding in a weird way.
- Kernel bug on the client (PVE host). Note that if this were the case, it would not be specific to PVE.
- Network issues.
- Hardware failures anywhere in the path.

Most concerning are, of course, the I/O errors. Does the system work itself out of the bind, or do you have to reboot? Have you tried to log out the iSCSI session and log back in? Is there anything else in the journal?
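
For the logout/login test, a minimal sketch could look like the one below. The target name and portal are copied from the `iscsiadm` output in the first post; the `run` helper and the `DRY_RUN` switch are illustrative additions and not part of any tool. By default the script only prints the commands. It assumes the guests using the LUN have been shut down first, since logging out underneath active logical volumes is likely to make things worse.

```bash
#!/usr/bin/env bash
# Sketch: log the iSCSI session out and back in, then check whether LVM
# sees the volume group again. TARGET and PORTAL come from the first post.
TARGET="iqn.2005-10.org.freenas.ctl:proxmox-truenas-iscsi-1tb"
PORTAL="192.168.10.14:3260"

# Illustrative helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

relogin_iscsi() {
    # Stop at the first step that fails when actually executing.
    run iscsiadm -m node --targetname "$TARGET" --portal "$PORTAL" --logout &&
    run iscsiadm -m node --targetname "$TARGET" --portal "$PORTAL" --login &&
    run pvscan &&
    run vgs
}
```

After sourcing the file, `relogin_iscsi` prints the four commands, and `DRY_RUN=0 relogin_iscsi` executes them (as root). If `vgs` lists `proxmox-truenas-lun` again afterwards, the system can recover without a reboot, which already narrows things down.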
There is no option or knob you can turn to just "fix things". You will need to spend time troubleshooting and reproducing the issue:

- Try different (older) kernels.
- Capture and analyze network traces.
- Ramp up the usage slowly to find the breaking point.
- Increase the logging for iscsid and/or other components.
- Reach out to TrueNAS support; perhaps they've seen something similar already.
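
A few of the steps above can be sketched in shell. This is only a starting point under stated assumptions: the interface name `eth0`, the capture path, and the scratch file path are placeholders that are not from the original post, and the `run` helper with its `DRY_RUN` switch is an illustrative addition. The `fio` job is meant to run inside a guest (or against a scratch file on storage backed by the LUN), never against the raw device. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Sketch: capture iSCSI traffic, ramp up write load, count kernel I/O errors.
PORTAL_IP="192.168.10.14"                          # from the first post
IFACE="${IFACE:-eth0}"                             # assumption: storage NIC
PCAP="${PCAP:-/tmp/iscsi.pcap}"                    # assumption: capture file
TESTFILE="${TESTFILE:-/mnt/scratch/fio-ramp.bin}"  # hypothetical scratch file

# Illustrative helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Capture traffic to the iSCSI portal; runs until interrupted with Ctrl-C.
capture_iscsi() {
    run tcpdump -i "$IFACE" -s 256 -w "$PCAP" host "$PORTAL_IP" and port 3260
}

# Raise the queue depth step by step to find where errors start.
ramp_load() {
    local depth
    for depth in 1 4 16 64; do
        run fio --name="ramp-qd$depth" --filename="$TESTFILE" --size=1G \
            --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
            --iodepth="$depth" --runtime=60 --time_based || return 1
    done
}

# Read kernel log text on stdin and print "<device> <count>" for each
# device that reported "I/O error, dev <device>, ...".
count_io_errors() {
    awk '/I\/O error, dev / {
        for (i = 1; i < NF; i++)
            if ($i == "dev") { d = $(i + 1); sub(/,$/, "", d); n[d]++ }
    } END { for (d in n) print d, n[d] }'
}
```

A typical use is to start `capture_iscsi` in one terminal, run `ramp_load` in another, and check `journalctl -k -b --no-pager | count_io_errors` between steps to see at which queue depth the errors first appear.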

Good luck


 
