iSCSI Configuration

oeginc

Mar 21, 2009
Just a quick question: what is the best/proper way to configure your iSCSI targets? Should I create a single iSCSI target and put all of the VMs on it, or should I create a separate iSCSI target/map for each VM?

And why... ;)

I'm thinking I should create a new volume/map/target for each VM, but if I do that the snapshots (on openfiler) take up A LOT of room, because I have to allocate snapshot space for every VM instead of having one global snapshot space...
 
What is the best way to share an iSCSI target between two Proxmox servers? I'm looking for a way to quickly start a VM on, or migrate it to, a secondary server when the primary needs to be replaced.

jinjer

Ultimately, I think jinjer is looking for the same answer I am. I have several Proxmox machines already in a cluster, and currently each is running with independent drives. I'm trying to find the best solution to the long migration times (it took me almost 14 hours to migrate a machine the other day). I figured iSCSI would be that solution.
 
oeginc - which VMs are you referring to - KVM or OpenVZ? There are differences to consider for each type.

In general I have set up separate targets for OpenVZ and for KVM machines, because they potentially have different read/write characteristics that can be tuned via the target setup on the iSCSI server (in our case openfiler).

So for one 1TB openfiler storage array I have 3 LUNs - 2 are used for OpenVZ (1 LUN for the master Proxmox server and the other for the cluster Proxmox server), and the 3rd LUN is used for KVM machines and is automatically shared between the 2 Proxmox servers.

Why 2 LUNs for OpenVZ and not a shared LUN? Because OpenVZ cannot share a LUN between hosts: it uses the LUN directly, creating and writing files as on any locally mounted filesystem, and such a filesystem cannot safely be mounted by two servers at once.

Here is a wiki for setting up openVZ for use in an iSCSI environment to allow for easy migration
http://pve.proxmox.com/wiki/OpenVZ_on_ISCSI_howto

For KVM on openfiler there is an issue with version 2.3 of openfiler that is not very well documented - so you might have problems getting that working, unless you do this on the openfiler:

i. Comment out lines 333 to 337 in /etc/rc.sysinit:
Code:
# if [ -x /sbin/lvm.static ]; then
#     if /sbin/lvm.static vgscan --mknodes --ignorelockingfailure > /dev/null 2>&1 ; then
#         action $"Setting up Logical Volume Management:" /sbin/lvm.static vgchange -a y --ignorelockingfailure
#     fi
# fi

ii. Turn off aoe (ATA over Ethernet) service from autostart by running:
chkconfig aoe off

This resolves a problem with openfiler activating the logical volumes that Proxmox creates when setting up a volume group for KVM virtual machines. For some reason openfiler 2.3 has trouble with activated logical volumes inside other activated logical volumes, and with lock files. You can test this for yourself before applying the fix: once you have a new KVM virtual machine set up, shut the VM down, shut down Proxmox, and reboot your openfiler server. You will see that the iSCSI target is no longer available, and your KVM VM will fail to start once you start the Proxmox server back up.
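As a rough sketch, the two steps above could be applied from a root shell on the openfiler box like this. The helper name `comment_out_lines` is made up for illustration, and the range 333 to 337 is taken from this post, so check it against your own /etc/rc.sysinit first. The helper uses `sed -i.bak`, which leaves the original file behind as a `.bak` copy.

```shell
#!/bin/sh
# Hypothetical helper: comment out an inclusive range of lines in a file.
# Usage: comment_out_lines FILE FIRST LAST
comment_out_lines() {
    # -i.bak edits the file in place and keeps the original as FILE.bak
    sed -i.bak "${2},${3}s/^/# /" "$1"
}

# On the openfiler server, after confirming the line numbers by eye:
#   comment_out_lines /etc/rc.sysinit 333 337
#   chkconfig aoe off
```

If the reboot test still fails afterwards, the `.bak` file makes it easy to compare what was changed.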

Things to consider:
- KVM allows live migration with almost ZERO downtime; however, performance for Linux guests is not so good.
- OpenVZ set up as per the above wiki allows for iSCSI and migration; however, migration is offline, so there is a small outage. Look at the end of the wiki for more info.

Cheers
 
Sorry for the late answer.. I'm not very active in proxmox forum as I'm testing a whole range of other virtualization systems (vsphere, oracle vm, cloud.com to name a few).... add to this some hardware tests as I'm looking for an el-cheapo but safe-to-use san solution and also trying to find a proper blade solution without being locked in to a vendor and without being ripped off by the storage guys.... and you guessed there's little time for anything else.

Please correct me if I'm wrong, but the OpenVZ over iSCSI howto for Proxmox is just a walk-through for using an iSCSI target instead of a local disk, while still treating it as a local disk. There's nothing <shared> there, and migrations would take ages anyway. This is true for all but the simplest servers. With anything holding more than a few million files, rsync will take its time just to check which files have changed (i.e. downtime).

OCFS2 could be used for OpenVZ if you can live without quotas (as in a private setup), but it's not OK for commercial customers. Or you could just forget migration and use bind-mounts of ocfs2-backed storage inside separate (but similar or identical) OpenVZ containers. I do something similar for a 4x-active imap/pop3 toaster. The servers live on different Proxmox nodes and there's a load balancer on top of them. Backend storage is ocfs2, bind-mounted inside the OpenVZ servers from the host server (with the help of some custom-made scripts).

IMHO, Proxmox is missing a simple feature whose absence is a huge show stopper for real-world use: if a node dies (say it's not the master), the master loses all information about the VMs on that node. With shared storage, migration and HA could be as simple as starting the VM on another node. However, this is not currently possible unless we resort to special gimmicks (like copying config files by hand and giving them different CTIDs).
 
Correct about the walk-through - it is simply to have OpenVZ mounted on iSCSI as opposed to local disks. It doesn't work with shared storage (unfortunately, and I'm not 100% sure why - it just doesn't work).

I am interested in what you are saying about the OCFS2 filesystem and bind mounts, or even OCFS2 in general. If that works then, if I understand correctly, you could have say CTID 101 on server 1 and CTID 1101 on server 2. Let's say you want to migrate them: you just stop 101 on server 1 and start 1101 on server 2, and obviously do the same if either server dies?

The way I have been "migrating" servers in an iSCSI environment to date is to do rsyncs after hours for any likely candidates. If they require migration during the day, I do another 2 rsyncs, shut down the machine, run one more rsync, and then bring it up under a different CTID on the other server. This reduces the downtime from about 3 to 5 minutes to about 20 seconds.
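The routine described above might look roughly like the following sketch. The CTIDs 101 and 1101 come from this thread; the host name `server2`, the private-area path, and the function names are placeholders. It assumes a container config for the destination CTID already exists on the other server. `DRY_RUN` defaults to 1 so the script only prints what it would do.

```shell
#!/bin/bash
# Sketch: warm rsync passes, stop, final rsync, start under another CTID.
# All values below are hypothetical placeholders.
SRC_CTID="${SRC_CTID:-101}"
DST_CTID="${DST_CTID:-1101}"
DST_HOST="${DST_HOST:-server2}"
VZ_PRIVATE="${VZ_PRIVATE:-/var/lib/vz/private}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

sync_ct() {
    # -a archive mode, -H keep hard links, --numeric-ids avoid uid remapping,
    # --delete drop files on the target that no longer exist on the source
    run rsync -aH --numeric-ids --delete \
        "$VZ_PRIVATE/$SRC_CTID/" "$DST_HOST:$VZ_PRIVATE/$DST_CTID/"
}

migrate_ct() {
    # Two warm passes while the container is still running
    sync_ct || return 1
    sync_ct || return 1
    # Downtime starts here
    run vzctl stop "$SRC_CTID" || return 1
    # Final pass only has to copy what changed since the last warm pass
    sync_ct || return 1
    # Downtime ends once the copy is running on the other server
    run ssh "$DST_HOST" vzctl start "$DST_CTID"
}
```

Calling `migrate_ct` with the defaults prints the five commands in order; setting `DRY_RUN=0` would execute them.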

This also allows for recovery if, as you said, a node dies. You wouldn't have an up-to-date copy of all the VMs on the other server, so instead of starting up older copies (not what we want to do), simply mount the iSCSI LUN from the server that died at another location on the surviving server, update the root and private folders in the config files to point to that location, and start all the dead VMs that way.
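A minimal sketch of the config change described above, assuming the dead node's LUN has been mounted under a directory such as /mnt/deadnode and contains the usual `root` and `private` subdirectories. The function name `repoint_ct` and that mount point are made up for illustration. `VE_ROOT` and `VE_PRIVATE` are the standard variables in an OpenVZ container config file.

```shell
#!/bin/sh
# Hypothetical helper: repoint_ct CONF NEW_BASE
# Rewrites VE_ROOT and VE_PRIVATE in a container config so they point
# below NEW_BASE. The original file is kept as CONF.bak.
# NEW_BASE must not contain the characters | or &.
repoint_ct() {
    conf=$1
    base=$2
    sed -i.bak \
        -e "s|^VE_ROOT=.*|VE_ROOT=\"$base/root/\$VEID\"|" \
        -e "s|^VE_PRIVATE=.*|VE_PRIVATE=\"$base/private/\$VEID\"|" \
        "$conf"
}

# Example, on the surviving node (paths are assumptions):
#   repoint_ct /etc/vz/conf/101.conf /mnt/deadnode
#   vzctl start 101
```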

If OCFS2 without quota or OCFS2 with bind mount (and quota?) works then it would take a lot of headache out of this setup because we could then be using a shared file system and simply start it under a different CTID that way.

What are the performance, recovery, or other issues with OCFS2 compared to ext3, for example?
 
Well, ocfs2 is ideally suited for holding large files (i.e. raw disks for kvm machines). Then you don't need to sync anything and HA is as easy as starting the KVM on another node (or also migrating it while running).

I use ocfs2 because I need an active-active setup for zero downtime and also for performance, so I have two distinct servers running on two nodes. The two servers share the "data" served to clients by means of an ocfs2 filesystem mounted on the node and bind-mounted inside the running containers on both nodes. So if one server fails I don't need to do anything, as there's a load balancer on top of this setup that will simply kick the failed server out of the "cluster".

If I understand correctly, you're using iSCSI to recover files that could be unavailable if a node fails. I use ocfs2 to share these same files across the running containers so that replication is not necessary.

This brings me to what I really dislike about Proxmox: the need for gimmicks like the 100.ctid and 1100.ctid replicated by hand. This can be solved easily with a single directory synced across the cluster. The directory would contain a single file for each VM (say: ctid.lasthost) with the name of the last known good host the VM was started on, and the time of the event. A starting node would resync the directory from the master and then start a VM only if appropriate (i.e. if the master has not already migrated the VM to another node).
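To make the idea concrete, here is a hypothetical sketch of such a "lasthost" directory. Nothing like this exists in Proxmox as far as this thread is concerned; the directory path and the function names `record_start` and `may_start` are invented for illustration, and the resync from the master is left out.

```shell
#!/bin/sh
# Hypothetical "ctid.lasthost" bookkeeping. STATE_DIR is an assumed path
# that would have to be synced across the cluster by some other means.
STATE_DIR="${STATE_DIR:-/var/lib/vz/lasthost}"

# record_start CTID [HOST]: note which host started the container, and when.
record_start() {
    mkdir -p "$STATE_DIR"
    printf '%s %s\n' "${2:-$(hostname)}" "$(date +%s)" > "$STATE_DIR/$1.lasthost"
}

# may_start CTID [HOST]: succeed if nobody has started the container yet,
# or if HOST is the last known good host recorded for it.
may_start() {
    f="$STATE_DIR/$1.lasthost"
    [ -f "$f" ] || return 0
    read -r last_host rest < "$f"
    [ "$last_host" = "${2:-$(hostname)}" ]
}
```

A booting node would first refresh the directory from the master, then call `may_start` for each of its containers and skip those that have since been started elsewhere.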

To answer your last question: ocfs2 is a cluster FS where each node in the cluster shares the same view of the FS as the other nodes. The nodes stay in sync through a heartbeat, both over the network and by writing to disk continuously (once every 2 seconds). Since all nodes see the same contents of the FS, there's no need to resync anything (this is done at the FS level). You need shared storage for this, or drbd (but drbd is tricky and you really want to ensure by all means that you never get a split-brain: I'd go for a network + serial connection and a stonith device).

jinjer
 
OK - however, OpenVZ is not one large file, it is lots of little files. I have also read that OCFS2 requires an Oracle support contract for any support, and that it has some performance issues compared to ext3.

What are you using as a load balancer on top, and how are you able to run openVZ in an active active scenario?

Let me get this straight

1. You have 2 proxmox servers in a cluster and one storage server
2. via iSCSI you have your storage mounted locally on both proxmox servers, and the storage has an OCFS2 filesystem on it?
3. You are bind mounting what to what?
4. Why is there a need to bind mount when OCFS2 is a clustered filesystem? Or is it that OpenVZ cannot run on an OCFS2 filesystem, hence the need to bind mount?
 
I've mixed two possible scenarios for usage of ocfs2/gfs2... take what suits you :)

Commercial support is just that: you need a contract with someone if you need commercial support for anything (including ext3). Ocfs2 is in the kernel, so it's safe to use.

Ocfs2 or any other clustered filesystem will be slower than a local filesystem. This is a fact and there's nothing that can be done about it. You need to coordinate multiple servers across a network, and this will always be slower than local access.

OpenVZ could (maybe?) mount ocfs inside the container, but it's easier to do it on the node. Performance is also improved if you don't share the whole "/var/lib/vz" mount.

I think you need to try this for yourself, as most of the details of my implementation make use of custom programming/scripts and these probably won't fit another environment.
 
Can you perhaps map out exactly what you have so I can attempt to recreate it?
Not exactly... as I have already said all there is to it. I can try to explain with an example, but please don't ask me for a copy-paste walk-through.

Say you need an imap/pop3 server on shared storage in an active-active setup. Bill of materials:

1. 2 x servers for providing service (i.e. 2 VM on 2 separate proxmox nodes).
2. Shared storage for mailbox storage. This is your GFS/OCFS/whatever cluster filesystem
3. Load balancer: This takes care of connecting a single IP to a single service-server from the pool.
4. Database for accounts (out of scope here).

So... how do you do this with normal (physical) servers? You would have 2 different servers (perhaps identical and booting via PXE from an identical image of a "pop toaster", with the only difference being their IP address). These servers would then mount the mailbox storage from a shared FC/iSCSI/you_name_it SAN. They would use some sort of shared storage manager in the form of NFS/GFS/OCFS etc.

Doing the same using proxmox is very similar, except for the following points:

1. I prefer to use OpenVZ so that there are fewer losses due to virtualization.
2. It's not very appropriate to load the modules and managers for shared storage (gfs/ocfs) inside openvz containers (security and complexity reasons).

It's much more efficient to mount the shared storage from the proxmox node in some directory. You can then bind-mount this directory directly inside the openvz container (/var/lib/vz/root/<ctid>/...).

So the proxmox recipe is to mount the shared FS somewhere on the proxmox node. All modules required will be loaded in the proxmox kernel and not inside the container. Once the cluster/shared storage is working, you can just start openvz containers on the nodes and modify their mount/umount scripts to bind-mount the directory from the proxmox node inside their private workspace.
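As an illustration of the last step, a per-container mount script might look roughly like the sketch below. This is not jinjer's actual script: the CTID 101 and both paths are placeholders. It relies on vzctl running `<ctid>.mount` scripts on the node with `VE_CONFFILE` set in the environment, and on the container config defining `VE_ROOT`.

```shell
#!/bin/bash
# Sketch of /etc/vz/conf/101.mount (CTID and paths are hypothetical).
SHARED_DIR=/mnt/ocfs2/mail   # cluster FS, already mounted on the Proxmox node
TARGET=/var/mail             # where it should appear inside the container

bind_shared() {
    # Load the global and per-container config to obtain VE_ROOT
    [ -f /etc/vz/vz.conf ] && . /etc/vz/vz.conf
    . "$VE_CONFFILE"
    # -n: do not record the bind mount in /etc/mtab
    mount -n --bind "$SHARED_DIR" "${VE_ROOT}${TARGET}"
}

# Only act when invoked by vzctl, which sets VE_CONFFILE
if [ -n "${VE_CONFFILE:-}" ]; then
    bind_shared
fi
```

The script has to be executable, and the target directory has to exist inside the container's private area before the container starts.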

hope this helps.

jinjer
 

It is clear how this works and everything seems to work, except for the part: For 'Base Volume' select a LUN
How do I create a LUN on the iSCSI target? I have added a LUN to it on my QNAP, but when I want to create an LVM group it doesn't see that LUN.


Hmm, it seems that it didn't see the iSCSI target correctly. I have now made a partition on it and am formatting it; then I'll see if it shows up in the web GUI for LVM groups. Or do I have to mount it as well?

I have now created an LVM group, but I can't assign any other storage groups to it. I used the following:
"
First create the physical volume (pv):

proxmox-ve:~# pvcreate /dev/sdb1
  Physical volume "/dev/sdb1" successfully created
proxmox-ve:~#

Second, create a volume group (vg):

proxmox-ve:~# vgcreate usb-stick /dev/sdb1
  Volume group "usb-stick" successfully created
proxmox-ve:~#

And finally: Add the LVM Group to the storage list via the web interface:

"Storage name: usb", "Base storage: Existing volume groups", "Volume Group Name: usb-stick"
"

This last step is not working :-( or I just don't know how. With the local storage I can change the storage content, but not for the iSCSI device or the LVM group on it!
 