iSCSI disk following a VM.

proxnoci

I have a hyperconverged PVE cluster of 3 nodes using Ceph shared storage.
I also have a large disk for one of the VMs/CTs that should live on iSCSI storage (where I have a few TB spare).
(It already fills 50% of the Ceph storage and will keep growing...)

How do I ensure the iSCSI disk follows the container, i.e. the host logs in to the iSCSI device before activating the container/VM
and logs out of the disk after shutdown? This should also happen during migration.
 
It depends on many factors, most importantly how you configured the storage itself.

In general, assuming you created a storage object of iSCSI type in Proxmox, it will be propagated to all members of the cluster.
The system behavior then depends on whether you overlaid the iSCSI LUN with LVM or used a direct LUN. In either case Proxmox will arbitrate volume activation as needed, i.e. things should just work.
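In practice that usually means two storage definitions, roughly like the following (the storage IDs, portal address, target IQN and LUN volume name are placeholders for your QNAP setup, not values from your system):
Code:
# 1. the raw iSCSI target (content "none" so guests don't use the LUN directly)
pvesm add iscsi qnap-iscsi --portal 192.168.1.50 --target iqn.2004-04.com.qnap:nas:iscsi.example --content none
# 2. an LVM layer on top of that LUN, marked shared so every node treats it as the same storage
pvesm add lvm qnap-lvm --vgname qnapvg --base qnap-iscsi:<lun-volume> --content rootdir,images --shared 1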


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
iSCSI = QNAP NAS storage.
I didn't "just try" it, as I know how disk data can be horribly ruined when there is no locking protocol or other means of exclusive access to the storage blocks in a structured fashion.
E.g. OpenVMS has an integrated locking mechanism between systems (the Lock Manager).
Linux GFS-based systems can be used concurrently without harm: each system has its own journal, and an OpenVMS-style lock manager guards the filesystem structures.
Others like Ceph and Samba have some broker service to prevent mishaps; iSCSI has no guardrails.
That's the background for asking. Concurrent access between several VMs/CTs is NOT needed, just no headaches during migrations.
 
Proxmox does not include an integrated clustered filesystem.

The Proxmox cluster itself acts as a lock manager/access arbitrator. The two integrated, supported methods for iSCSI are:
- LVM thick. The LVM slice gets activated/deactivated based on where its consumer (VM/CT) is located. It can't be activated on both nodes, as the VM/CT can only run on one. However, the underlying iSCSI LUN is always active on all servers in the cluster. Multipath is interleaved in the middle as needed as well. (See the sketch after this list for how to observe the activation state.)
- Direct LUN pass-through. Again, access is established based on the consumer VM/CT's presence. The LUN is present on all nodes, but the Proxmox cluster ensures that only one VM/node at a time actively accesses it.
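If you want to watch that arbitration yourself, the per-node LVM activation state can be checked like this (a minimal sketch; substitute the volume group backing your storage):
Code:
# run on each node: the 5th character of the Attr string is 'a' on the node where the LV is active
lvs -o lv_name,lv_attr <vgname>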

You can just carve out a new smaller LUN and try it with a disposable container.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Well, it does have a clustered filesystem in the form of Ceph...

Apparently one cannot select an iSCSI disk to mount on a container as a mount point...
Creating an LVM volume group on it, and then assigning a volume to a container... causes an LV copy to the other system.
LVMs are apparently considered local storage... and an X hundred GB copy is not something that helps a speedy migration...

2023-02-22 01:48:20 shutdown CT 115
2023-02-22 01:48:22 starting migration of CT 115 to node 'pve2' (192.168.)
2023-02-22 01:48:23 volume 'machines:vm-115-disk-0' is on shared storage 'machines'
2023-02-22 01:48:23 volume 'machines:vm-115-disk-2' is on shared storage 'machines'
2023-02-22 01:48:23 found local volume 'nextdataPV:vm-115-disk-0' (in current VM config)
2023-02-22 01:48:24 volume nextdataVG/vm-115-disk-0 already exists - importing with a different name
2023-02-22 01:48:24 Logical volume "vm-115-disk-1" created.
 
You are conflating a few different things, and some of it may be a language barrier between us.

Ceph is Ceph, iSCSI is iSCSI. The two storage types are addressed and handled differently. One could conceivably place Ceph on top of iSCSI, however that's not what the developers had in mind. One could also export a Ceph disk as iSCSI; that would also defy good logic.

On CT creation, PVE will create the rootfs volume and copy the CT context to it. This is required for CTs; VMs can work differently.
A secondary disk can be added to a CT as a mount point. If you choose to overlay iSCSI with thick LVM, then another slice will be created for it.
The storage must be marked as shared via CLI/API/GUI to be considered shared. Only certain storage types are supported as shared:
https://pve.proxmox.com/wiki/Storage
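For example, an existing LVM storage can be flagged as shared from the CLI (a sketch, using the nextdataPV storage ID from your migration log):
Code:
# mark the LVM storage as shared so PVE stops treating it as node-local
pvesm set nextdataPV --shared 1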

I recommend you do a bit more reading, and perhaps watch some YouTube videos, to get a better understanding of all the involved pieces.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I guess a language barrier... and some blank spots that need filling.
My background: system manager for OpenVMS & Unix, running clusters since the 1990s, so the concept isn't new.
As for me, I am getting acquainted with Proxmox (understanding the principles & tools), not looking for a simple click-here-click-there.

Ceph is a clustered filesystem... medium level (LVM level).
iSCSI = SCSI (the disk access protocol) over an IP network... low level (disk block access).
Mixing them doesn't help unless they each use separate network ports... I would expect horrible latency issues anyway.

My Environment
There is a three-node PVE cluster with three equivalent nodes, each a small form factor Core i7 PC.
iSCSI is on a QNAP NAS (4 TB of storage there, set up for iSCSI).
Ceph is on 1 TB NVMe.
I have set up an LXD farm in the past myself; as Proxmox looks like it solves a lot of things with less hassle, I am migrating from that private farm onto Proxmox.
One of the features I like in Proxmox is migrating containers between systems.

Replacing old hardware guzzling 500 W with 40 W hardware is also a goal.

One of the containers is a Nextcloud container (Debian based) with a Nextcloud data volume that is about to grow beyond 500 GB, hence the desire to have the bind-mounted /nextdata directory on non-Ceph storage.
All running with as little overhead as possible.

My original question was about HOW to let an iSCSI volume follow a CT.
Answer: that is automatic...
Observation: it isn't... LVM is considered local storage. iSCSI may not be, but it is mounted on all systems in the cluster,
and there are no logouts/logins observable.
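(For reference, the per-node login state can be checked with open-iscsi's own tooling:)
Code:
# list the iSCSI sessions this node is currently logged in to
iscsiadm -m session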

Hopefully that does fill some blanks.

Another observation:
The Proxmox project looks a LOT like the Nextcloud project... LOTS, really LOTS of info, just inaccessible to newcomers because of hurdles that are mountains to starters and mole-hills once climbed. It's just that some parts of the project look very much like vertical walls kept in place instead of helpful staircases or slopes that can be mastered.
I tried building some plugins for Nextcloud in the past... there was just too much in flux. Examples and templates were provided for versions 3-10 releases back and didn't work anymore, so a lot of (too many?) dead ends in research.
All the info is available as snippets all over the place... I am looking for the glue.
 
Here is an example of what, I think, you are trying to achieve. I will use Blockbridge storage and some shortcuts that will not be available for your NAS; however, I am sure you can find alternative methods to achieve the same result:

1) Create a Disk on NAS
Code:
# bb vss provision -c 32GiB --with-disk --label proxmox --disk-label disk
2) Attach the disk to both nodes in the cluster, i.e. the command is run on each node
Code:
# bb host attach -d disk --multi-writer --persist
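(If you are not on Blockbridge, the rough equivalent for a QNAP or other generic target would be to create the LUN in the NAS UI and log in from every node with open-iscsi; the portal address and IQN below are placeholders:)
Code:
# discover the targets exposed by the NAS
iscsiadm -m discovery -t sendtargets -p 192.168.1.50
# log in now and make the login persist across reboots
iscsiadm -m node -T iqn.2004-04.com.qnap:nas.example -p 192.168.1.50 --login
iscsiadm -m node -T iqn.2004-04.com.qnap:nas.example -p 192.168.1.50 --op update -n node.startup -v automatic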
3) Confirm the disk is visible to the OS
Code:
root@pve7demo1:~# lsblk
sdc                            8:32   0   32G  0 disk
4) This is an iSCSI disk
Code:
root@pve7demo1:~# iscsiadm -m node
172.16.200.42:3260,1 iqn.2009-12.com.blockbridge:t-pjwafzugcjf-ccngipjn
5) Create an LVM structure
Code:
# sgdisk -N 1 /dev/sdc
# pvcreate /dev/sdc1
# vgcreate vmdata /dev/sdc1
6) Create a PVE shared storage object
Code:
# pvesm add lvm vmdata --vgname vmdata --content rootdir,images --shared 1
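This should end up in /etc/pve/storage.cfg as roughly the following entry:
Code:
lvm: vmdata
        vgname vmdata
        content rootdir,images
        shared 1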
7) Create a container with rootfs placed on Blockbridge iSCSI storage, utilizing our plugin
Code:
pct create 103 local:vztmpl/ubuntu-22.10-standard_22.10-1_amd64.tar.zst --rootfs bb-iscsi:1
8) Add a DATA disk to the container. I would normally use the Blockbridge driver in our installation; however, for this example I will utilize the LVM thick storage I created on the raw iSCSI-attached disk in steps 1-5
Code:
# pct set 103 -mp0 vmdata:10,mp=/vmdata
9) Let's examine the disk presentation inside the container:
Code:
pct enter 103
root@CT103:~# df
Filesystem                          1K-blocks   Used Available Use% Mounted on
/dev/sde                               996780 631272    296696  69% /
/dev/mapper/vmdata-vm--103--disk--0  10218772     28   9678072   1% /vmdata

Moving to the second node in the cluster
10) Examine lsblk and notice that the individual iSCSI disk is present and active. The root disk in this case is not present, as the Blockbridge plugin takes care of attaching the disk in a timely manner as needed.
Code:
root@pve7demo2:~# lsblk
NAME                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sdc                    8:33   0   32G  0 disk
11) Migrate the container (the command is run from node1)
Code:
pct migrate 103 pve7demo2 --restart
12) Proxmox took care of ensuring that no Proxmox-controlled application is accessing the data disk vmdata-vm--103--disk--0 on node1; access has been allowed for the LXC on node2. The rootfs disk located on Blockbridge storage has been automatically re-mounted on node2 as well.
Code:
root@pve7demo2:~# bb host info
== Localhost: pve7demo2
Hostname              pve7demo2
Initiator Name        iqn.1993-08.org.debian:01:ce28b56bb9dc


== Disks attached to pve7demo2
vss [1]                    disk     capacity  paths  protocol  transport  mode        device
-------------------------  -------  --------  -----  --------  ---------  ----------  --------
bb-iscsi:vm-102-disk-0     base     4.0GiB    1      iscsi     TCP/IP     read-write  /dev/sdi
proxmox                    disk     32.0GiB   1      iscsi     TCP/IP     read-write  /dev/sdb
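Without the Blockbridge tooling, a comparable check on node2 can be done with stock commands (a sketch):
Code:
# the container should now be listed (and running) on this node
pct list
# the LV backing the data disk is visible here; PVE activates it when the CT starts
lvs -o lv_name,lv_attr vmdata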


To summarize: when using non-Blockbridge iSCSI storage with a PVE cluster, the most straightforward approach is to use LVM and allow PVE full management. The resources are presented to the CT as device mount points (man pct). The handling/transfer of ownership of the mount points depends on the underlying storage. You can also pass through an entire LUN; there are examples available via the forum search.
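As a rough illustration of the pass-through variant (the device path is a placeholder; pick a stable /dev/disk/by-id path for your LUN, and use qm set for a VM):
Code:
# container: present a whole block device at a path inside the CT (device mount point, see man pct)
pct set 115 -mp1 /dev/disk/by-id/scsi-<your-lun-id>,mp=/nextdata
# VM: attach the whole LUN as an additional disk
qm set 115 -scsi1 /dev/disk/by-id/scsi-<your-lun-id>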


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
There is a three-node PVE cluster with three equivalent nodes, each a small form factor Core i7 PC.
Replacing old hardware guzzling 500 W with 40 W hardware is also a goal.
Ceph isn't a good solution for this configuration. You really need a larger number of OSDs to make a workable solution, plus faster interconnects that will consume that amount of power on their own. Seems like @bbgeek17's proposal is a bit more in scope.
 
@bbgeek17 Hm, it works... why didn't it work when created through the GUI?

Ceph was an experiment after iSCSI failed and kept failing, and all the "found" solutions ran into some kind of problem. Toggling "shared" didn't show a difference back then.
Ceph does work... mind you, this isn't a production environment but a homebrew system, with 1 Gbps connections available.
Thanks, it does work now.
I need to look for a CLI-based solution apparently...

@alexskysilk, that setup was the intended setup; it somehow didn't work out before.
 
The recipe did work on one of the nodes without any issue (the one I was connected to).
The other nodes had issues with accessing the PV; there the shared disk did work after a reboot.
Anyway, now it does work as advertised. Maybe a reboot would have solved the earlier issues as well.
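(For the record, a full reboot is usually not required; rescanning the iSCSI sessions and refreshing the LVM metadata on the affected node is often enough, e.g.:)
Code:
# rescan the already logged-in iSCSI sessions for new or resized LUNs
iscsiadm -m session --rescan
# refresh LVM's view of physical volumes and volume groups
pvscan --cache
vgscan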
Thanks.
 
