MPIO with Proxmox iSCSI and TrueNAS

poxin

For the life of me I cannot figure out how to get LVM working on top of an iSCSI share coming from a TrueNAS box. I believe multipath is working and I'm able to get the iSCSI storage into Proxmox, but I'm not able to get LVM on top of it, per the error below:

[Screenshots of the Proxmox error when adding LVM on top of the iSCSI storage]

Code:
mpatha (36589cfc0000003fa19833b52347a602e) dm-0 TrueNAS,iSCSI Disk
size=1.6T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 9:0:0:0 sde 8:64 active ready running
  `- 8:0:0:0 sdd 8:48 active ready running

Code:
root@c4-pve-04:~# iscsiadm -m session
tcp: [5] 10.201.201.5:3260,1 iqn.2005-10.org.freenas.ctl:c4-tn-ssd-01-mirrortest (non-flash)
tcp: [6] 10.202.202.5:3260,1 iqn.2005-10.org.freenas.ctl:c4-tn-ssd-01-mirrortest (non-flash)

root@c4-pve-04:~# pvs
-empty-

root@c4-pve-04:~# lvs
-empty-

root@c4-pve-04:~# vgs
-empty-

root@c4-pve-04:~# lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda        8:0    0 111.8G  0 disk 
├─sda1     8:1    0  1007K  0 part 
├─sda2     8:2    0   512M  0 part 
└─sda3     8:3    0 111.3G  0 part 
sdb        8:16   0 111.8G  0 disk 
├─sdb1     8:17   0  1007K  0 part 
├─sdb2     8:18   0   512M  0 part 
└─sdb3     8:19   0 111.3G  0 part 
sdd        8:48   0   1.6T  0 disk 
└─mpatha 253:0    0   1.6T  0 mpath
sde        8:64   0   1.6T  0 disk 
└─mpatha 253:0    0   1.6T  0 mpath
 
Alright, so I add the iSCSI target into Storage first, then use vgcreate, then add LVM. When adding LVM you don't select the iSCSI storage as the base storage, but instead choose the previously created volume group under Existing Volume Groups. Is this correct?

[Screenshot of the Add: LVM dialog]
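For reference, here is roughly the CLI equivalent of those steps as a sketch; the volume group name vg-tn-ssd-01 and storage ID lvm-tn-ssd-01 are made up for the example, while /dev/mapper/mpatha comes from the multipath output above:

Code:
# create an LVM physical volume and volume group on the multipath device
pvcreate /dev/mapper/mpatha
vgcreate vg-tn-ssd-01 /dev/mapper/mpatha

# register the volume group in Proxmox as shared LVM storage
# (same as Datacenter > Storage > Add > LVM in the GUI)
pvesm add lvm lvm-tn-ssd-01 --vgname vg-tn-ssd-01 --shared 1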
 
Attempting to understand what's going on when the underlying storage "fails" while using iSCSI multipath (testing a disaster scenario with a full storage outage). I can see both links go down using "multipath -ll". Oddly, the VMs appear online and I'm able to ping from them and to them. I waited about 20 minutes with the storage offline, but the VM console was still active, to my surprise. I then brought the storage back and everything came back without even a hiccup. Uptime in the VMs never stopped.

I would expect the VMs to go offline when the storage is dropped. Is it somehow being served from RAM or otherwise? I was not expecting this behavior coming from the Hyper-V world.
 
It's hard to say since you didn't present any outputs, but let's assume your observation of the path status was correct. Was this VM Linux? If so, then it likely just didn't need to write anything to storage, whereas Windows is more aggressive with disk use. A basic Linux install can run on autopilot without writing to disk for a while. Most likely, if you attempt to write to the VM disk it will cause a lock-up.
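If you want to confirm that, forcing synchronous, direct writes from inside a Linux guest should reproduce the hang while all paths are down. A quick sketch (the file path is arbitrary):

Code:
# inside the guest: write with O_DIRECT so the page cache cannot absorb it;
# expect this to block while all iSCSI paths on the host are down
dd if=/dev/zero of=/root/mpio-test.bin bs=1M count=64 oflag=direct
sync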


 
It was a Windows VM for this test. Not sure what outputs to provide.

 
I did just test again and tried to perform some disk actions on the Windows VM, but this hung the VM. After bringing the storage back online I had to force-restart the VM. Ping was working the entire time; there was no real indication it was hung until trying to access it. Still odd behavior to me. I would expect the VM to go offline, and HA to complain a bit, when the storage is dropped, especially after it has been missing for 20 minutes.
 
There are a few things going on here.
First, "queue_if_no_path" means:
Code:
If features "1 queue_if_no_path" is specified in the /etc/multipath.conf file, then any process that issues I/O will hang
until one or more paths are restored. To avoid this, set the no_path_retry N parameter in the /etc/multipath.conf file
 (where N is the number of times the system should retry a path).
Or in the latest multipath:
Code:
queue_if_no_path
    (Deprecated, superseded by no_path_retry) Queue I/O if no path is active. Identical to
    no_path_retry with the queue value. If both this feature and no_path_retry are set, the
    latter value takes precedence. See KNOWN ISSUES.

Code:
no_path_retry    Specify what to do when all paths are down. Possible values are:
                        value > 0   Number of retries until disable I/O queueing.
                        fail        For immediate failure (no I/O queueing).
                        queue       For never stop I/O queueing, similar to queue_if_no_path. See KNOWN ISSUES.
                        The default is: fail

So the hypervisor kernel will never send an I/O error back up the stack; it will wait, accumulating I/O. The next timeout that comes into play is the VM OS disk timeout, and it may or may not get upset that no replies or errors are returned. That depends on the OS.
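If you would rather have the host fail I/O after a bounded time instead of queueing it forever, no_path_retry can be set to a finite value. A sketch for /etc/multipath.conf (the value 12 is just an example, roughly 12 retries times the polling interval in seconds):

Code:
defaults {
    # stop queueing and return I/O errors after ~12 path-checker intervals
    no_path_retry 12
}

After editing, reload the configuration with "multipathd reconfigure" (or restart multipathd) for the change to take effect.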

Second, the integration between storage and hypervisor differs a lot between ESXi, Hyper-V and Proxmox (KVM/Linux). Some are tighter than others.

The fact that ICMP works is an example of a system that is hung enough to be unresponsive to users and applications, but still active at the kernel layer and TCP/IP stack. This is a prime example of why ICMP is not a good health check.

As for HA, again, vendors provide different implementations. AFAIK PVE does not have a feature that would fail over a single VM based on a storage path failure. But PVE HA is not intended to protect against a dual path failure of the network/storage. If you implemented HA properly (two NICs, two switches), then a dual failure likely means an outage of the entire infrastructure and there is nowhere to fail over to. PVE HA is there to protect against PVE node failure. If you need application HA, it needs to be implemented at the application layer with proper health checks.


 
Great reply, thank you for spending the time to explain all that, it helps. Going to review and chew on this for a bit, and read some more docs on the multipath configuration options.
 
