In need of some help on HA and iSCSI setup

kameleon

Guest
I seem to have hit a brick wall. I have our three test machines set up and accessing our iSCSI SAN. However, my problem is how to get them all to access the needed disks. Let me start off by saying I have never dealt with a SAN like this before, so be easy on me if I ask a n00bish question. I know the basics, but it just feels like there is one little thing missing. I want to set up the guests with LVM disks for performance reasons. However, I cannot have all the hosts accessing the LVs at the same time. Theoretically, if I have guest1 fail over from host1 to host2, host1 would not be accessing that LV anymore, so host2 could take it, make it active, and use it. This would require me to have each separate machine's disk in its own volume group and then have the logical volume on that, correct?

Secondly, what would be the best way to do our mail storage on the SAN? The same way? Or is there a way to have multiple machines access the data simultaneously? I figure we would set up our Samba share drives similarly.
 
I think I might have worded this incorrectly. I should have asked: what is the best way to set up LVM with network backing? Do I need one LUN for each machine, or how does this come into play with HA?
 
Let's see. There is no problem with multiple machines accessing the same VG (different LVs) at the same time.
However:
- Nothing stops you from mounting the same filesystem (starting the same VM) on several nodes, and this will break things for sure.
- You can't alter LVM metadata from any node other than the primary node.

To solve these issues you will need cLVM (the c is for cluster), but that needs a whole cluster stack (Pacemaker, node fencing, etc.).

If you are careful, you can use "manual failover" without configuring an HA cluster, or in case you don't have hardware fencing devices. To do this, I strongly suggest disabling any VM startup on node boot.
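For example, turning off start-at-boot for a guest looks roughly like this (a minimal sketch; VMID 101 is just a placeholder):
Code:
# keep the guest from autostarting when a node boots,
# so a recovered node never races another node for the same LV
qm set 101 --onboot 0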

I have tested multiple nodes accessing the same LV, even live migration with KVM, accessing a SAN over iSCSI.

I hope I've cleared some things up.
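To make the "be careful" part concrete, one way to hand an LV over during a manual failover is to deactivate it on the old node and activate it on the new one. This is my own sketch, not a required step, and the VG/LV names are placeholders:
Code:
# on host1, after the guest has been stopped: deactivate its LV locally
lvchange -an san_vg/vm-101-disk-1

# on host2: activate the LV, then start the guest there
lvchange -ay san_vg/vm-101-disk-1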
 
Thanks for that. I think most of my issues were related to me setting up the SAN strictly in the underlying OS. I am now testing by doing it all in the web gui just to see how that plays. Now for a few more questions since it sounds like you have this working properly:

Should I create one large VG and just do the lvs for each machine?
Is there a good writeup on cLVM and needed items with proxmox?
In an ideal world we won't do anything but manual failover/migration but with proxmox HA does it have to kill the guest and restart it on the other machine in the event of a hardware/network failure?
With the web gui I can only input one seed IP for the iSCSI SAN, how can I enable multipath for more redundancy?
What is the best way to get maximum throughput to both network and SAN, bond all interfaces and pass the various vlans across that or split the physical interfaces into network and san traffic?

Thanks for the guidance.
 
Thanks for that. I think most of my issues were related to me setting up the SAN strictly in the underlying OS. I am now testing by doing it all in the web gui just to see how that plays. Now for a few more questions since it sounds like you have this working properly:

Should I create one large VG and just do the lvs for each machine?
Yes.

Is there a good writeup on cLVM and needed items with proxmox?

I don't know of any, but I think the GUI should be able to handle everything regarding cLVM; I have not tested this, though.
In an ideal world we won't do anything but manual failover/migration but with proxmox HA does it have to kill the guest and restart it on the other machine in the event of a hardware/network failure?

Yes, it's the only way to be 100% sure that the storage volume is clear to use.
With the web gui I can only input one seed IP for the iSCSI SAN, how can I enable multipath for more redundancy?

You need to install the multipath-tools package, (sometimes, depending on your SAN) create /etc/multipath.conf, and, when you create your PV/VG, use the multipathed device node for it.
You can see what it looks like with multipath -l; you should see something like this:
Code:
36782bcb00077095300002d0b4f911276 dm-3 DELL,MD32xxi
size=8.0T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
`-+- policy='round-robin 0' prio=-4 status=active
  |- 9:0:0:2 sdd 8:48 active undef running
  |- 8:0:0:2 sdc 8:32 active undef running
  |- 7:0:0:2 sdf 8:80 active undef running
  `- 6:0:0:2 sde 8:64 active undef running

That means you should use /dev/disk/by-id/dm-uuid-mpath-36782bcb00077095300002d0b4f911276. It's important to refer to the disk by its ID, not by the device node (sdd) or the path, because the UUID is consistent across nodes.
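In other words, for the device in the output above the PV/VG would be created against the by-id path, which is identical on every node even when the sdX names are not (a sketch; the VG name is just an example):
Code:
# the sdX names can differ per node, but the by-id path is the same everywhere
pvcreate /dev/disk/by-id/dm-uuid-mpath-36782bcb00077095300002d0b4f911276
vgcreate san_vg /dev/disk/by-id/dm-uuid-mpath-36782bcb00077095300002d0b4f911276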

What is the best way to get maximum throughput to both network and SAN, bond all interfaces and pass the various vlans across that or split the physical interfaces into network and san traffic?

The best approach is to keep your storage and data networks separate. Also, use jumbo frames, and play with flow control and storm control on your switch to see what works best for your scenario.
In my case, I have 2 NICs for data, using bonding + bridge, and 2 NICs for storage.
The storage NICs are on different subnets, each of them corresponding to two of the 4 Ethernet ports on my SAN.

Thanks for the guidance.

You're welcome, please share your results. What SAN are you using?
 
Thanks for the fast reply. That cleared up a lot. So as for cLVM, I just install the package and then use it as normal LVM? I can't seem to find many specifics on that, but it appears it just adds a compatibility layer to LVM2 to make it cluster aware. I have my multipath.conf set up for friendly names. That way I can map the UUID to a /dev/mapper/whateverinameit instead of that long UUID, but it achieves the same end result. When I try to install clvm, it appears as though Proxmox installs it by default.
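For reference, the friendly-name mapping mentioned here is roughly this kind of stanza in /etc/multipath.conf (the WWID is the one from the earlier output and the alias is just an example name):
Code:
multipaths {
    multipath {
        # WWID from "multipath -l"; the alias becomes /dev/mapper/sanvol01
        wwid   36782bcb00077095300002d0b4f911276
        alias  sanvol01
    }
}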

Our SAN is a Dell MD3200i with 12 2TB disks in a RAID6 and dual 4-port controllers. Our primary server is a Dell T710 in rackmount config: dual quad-core E5504s, 32GB RAM, 8x 2TB disks in a RAID6, 4 gigabit network ports. Currently it is still running Xen, but we are moving to Proxmox. The secondary server is a Dell R610: dual quad-core E5506s, 24GB RAM, 3x 160GB disks in a RAID5, 4 gigabit network ports. Then, for a third node for HA functions and a "testbed", we will use our Dell 2900: single E5405, 8GB RAM, 8x 300GB disks in a RAID6, 2 gigabit network ports. All three servers have iDRAC enabled for fencing.

I was thinking the same thing about the network interfaces. When you say you have two different subnets on the two storage NICs to correspond to the 4 SAN ports, how did you do this? Something like eth2 and eth2:1? Here is my /etc/network/interfaces from the 2900, which has 2 interfaces:
Code:
auto lo
iface lo inet loopback

# interface GigabitEthernet1/0/3 on the IS rack switch
auto eth0
iface eth0 inet manual

# interface GigabitEthernet1/0/4 on the IS rack switch
auto eth1
iface eth1 inet manual
mtu 9000

auto bond0
iface bond0 inet manual
slaves eth0
bond_miimon 100
bond_mode 6

auto bond1
iface bond1 inet manual
slaves eth1
bond_miimon 100
bond_mode 6
mtu 9000

#auto bond0.1
#iface bond0.1 inet manual
#vlan-raw-device bond0

auto bond0.2
iface bond0.2 inet manual
vlan-raw-device bond0

auto bond0.3
iface bond0.3 inet manual
vlan-raw-device bond0

auto bond0.4
iface bond0.4 inet manual
vlan-raw-device bond0

auto bond0.5
iface bond0.5 inet manual
vlan-raw-device bond0

auto bond0.6
iface bond0.6 inet manual
vlan-raw-device bond0

auto bond0.7
iface bond0.7 inet manual
vlan-raw-device bond0

auto bond0.8
iface bond0.8 inet manual
vlan-raw-device bond0

auto bond0.9
iface bond0.9 inet manual
vlan-raw-device bond0

auto bond0.10
iface bond0.10 inet manual
vlan-raw-device bond0

auto bond0.11
iface bond0.11 inet manual
vlan-raw-device bond0

auto bond0.12
iface bond0.12 inet manual
vlan-raw-device bond0

auto bond0.13
iface bond0.13 inet manual
vlan-raw-device bond0

auto bond0.14
iface bond0.14 inet manual
vlan-raw-device bond0

auto bond0.15
iface bond0.15 inet manual
vlan-raw-device bond0

auto bond1.20
iface bond1.20 inet manual
vlan-raw-device bond1
mtu 9000

auto vmbr0
iface vmbr0 inet static
        address 10.8.1.151
        netmask 255.255.255.0
        gateway 10.8.1.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

auto vmbr2
iface vmbr2 inet manual
bridge_ports bond0.2
bridge_stp off
bridge_fd 0

auto vmbr3
iface vmbr3 inet manual
bridge_ports bond0.3
bridge_stp off
bridge_fd 0

auto vmbr4
iface vmbr4 inet manual
bridge_ports bond0.4
bridge_stp off
bridge_fd 0

auto vmbr5
iface vmbr5 inet manual
bridge_ports bond0.5
bridge_stp off
bridge_fd 0

auto vmbr6
iface vmbr6 inet manual
bridge_ports bond0.6
bridge_stp off
bridge_fd 0

auto vmbr7
iface vmbr7 inet manual
bridge_ports bond0.7
bridge_stp off
bridge_fd 0

auto vmbr8
iface vmbr8 inet manual
bridge_ports bond0.8
bridge_stp off
bridge_fd 0

auto vmbr9
iface vmbr9 inet manual
bridge_ports bond0.9
bridge_stp off
bridge_fd 0

auto vmbr10
iface vmbr10 inet manual
bridge_ports bond0.10
bridge_stp off
bridge_fd 0

auto vmbr11
iface vmbr11 inet manual
bridge_ports bond0.11
bridge_stp off
bridge_fd 0

auto vmbr12
iface vmbr12 inet manual
bridge_ports bond0.12
bridge_stp off
bridge_fd 0

auto vmbr13
iface vmbr13 inet manual
bridge_ports bond0.13
bridge_stp off
bridge_fd 0

auto vmbr14
iface vmbr14 inet manual
bridge_ports bond0.14
bridge_stp off
bridge_fd 0

auto vmbr15
iface vmbr15 inet manual
bridge_ports bond0.15
bridge_stp off
bridge_fd 0

auto vmbr20
iface vmbr20 inet static
address 192.168.130.51
netmask 255.255.255.0
bridge_ports bond1.20
bridge_stp off
bridge_fd 0
mtu 9000

auto vmbr20:1
iface vmbr20:1 inet static
address 192.168.131.51
netmask 255.255.255.0
bridge_ports bond1.20:1
bridge_stp off
bridge_fd 0
mtu 9000

auto vmbr20:2
iface vmbr20:2 inet static
address 192.168.132.51
netmask 255.255.255.0
bridge_ports bond1.20:2
bridge_stp off
bridge_fd 0
mtu 9000

auto vmbr20:3
iface vmbr20:3 inet static
address 192.168.133.51
netmask 255.255.255.0
bridge_ports bond1.20:3
bridge_stp off
bridge_fd 0
mtu 9000

Any input on cleaning that up?
 
This is what my interfaces file looks like:

Code:
auto lo
iface lo inet loopback

iface eth0 inet manual

auto eth2
iface eth2 inet static
        address  10.100.0.2
        netmask  255.255.255.0
        mtu 9000
auto eth3
iface eth3 inet static
        address  10.100.2.2
        netmask  255.255.255.0
        mtu 9000

iface eth1 inet manual

auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond_miimon 100
        bond_mode balance-rr
        bond-mode 802.3ad

auto vmbr0
iface vmbr0 inet static
        address  192.168.1.25
        netmask  255.255.255.0
        gateway  192.168.1.100
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

eth0 and eth1 are connected to one switch and eth2 and eth3 to another.

I have an MD3200 too, but with 1TB disks and only one controller, with this config:
if0 10.100.0.50
if1 10.100.0.51
if3 10.100.2.50
if4 10.100.2.51
All of them connect to the same switch as eth2 and eth3.
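With that layout, the iSCSI sessions end up looking roughly like this (a sketch; the portal IPs match the example above), and multipath then aggregates one path per session:
Code:
# discover the target through one portal on each storage subnet
iscsiadm -m discovery -t sendtargets -p 10.100.0.50
iscsiadm -m discovery -t sendtargets -p 10.100.2.50

# log in to all discovered portals; each login shows up as one path in "multipath -l"
iscsiadm -m node --login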

I have no experience with cLVM beyond a few lab tests, but as far as I understand, you need to change the locking method in lvm.conf, let the clvm daemon be controlled by the cluster software, and you're set.
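For what it's worth, the locking change being referred to is this setting in /etc/lvm/lvm.conf (treat this as a sketch to double-check against your LVM2 version):
Code:
# /etc/lvm/lvm.conf
global {
    # 1 = local file-based locking (the default), 3 = clustered locking via clvmd
    locking_type = 3
}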
 
That makes more sense with only two different subnets for the SAN. I currently have 4, but I was just following the Dell documentation when I did that, since I have never dealt with iSCSI before. Thanks for that; I will look more into the cLVM stuff.
 
