Proxmox 3.1 Cluster | HA | DRBD | KVM : question about HA and live migrations

x86fantini

Hi, first of all, thank you very much for keeping Proxmox open source. It is a precious resource for the community. Soon, very soon, I will contribute and buy a subscription :)

I have finally managed to:

1. Install Proxmox VE 3.1 on 2 identical servers
2. Install, configure and enable DRBD
3. Enable IPMI, configure fencing, and enable HA on the 2-node cluster
4. Test migration of my Windows 7 KVM guest and see the second node being restarted by fencing when I manually disconnect its LAN (a quick status-check sketch follows below)
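
A quick way to sanity-check all of this from the shell (assuming the stock Proxmox 3.1 cluster stack with cman/rgmanager and drbd8-utils; a rough sketch, not a definitive procedure) is:

Code:
# cluster membership and quorum as Proxmox sees it
pvecm status
# HA resource group status (rgmanager)
clustat
# fence domain membership
fence_tool ls
# DRBD state -- expect Connected, Primary/Primary, UpToDate/UpToDate
cat /proc/drbd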

Now, all this has been built to offer my client (an e-commerce website) high availability with DRBD, taking advantage of the Intel SSD RAID 1 disks and controller.

My goal is to have multiple KVM guests on Node1, with the VMs obviously stored on LVM on top of DRBD; then, in case of a failure of Node1, all VMs will be "migrated" to Node2 so the client can keep working. BUT, I have some questions:

a) during migration (because of a Node1 failure), will the VMs keep working, and by this I mean will the VMs stay in the running state (all VMs are Debian 7)?
b) I have an Adaptec 6405 controller and 2 x Intel SSDs in RAID 1 with all cache enabled in the RAID setup, plus BBU and zero-maintenance kit: what is the best cache setting to use in the VM so that the very last bytes are written to disk and nothing is lost during an HA migration?
c) my first intention was to go with OpenVZ containers, but I only discovered afterwards that on top of LVM/DRBD I can only store KVM (I'm aware of the hack, but I don't want to do it); will I possibly lose performance using KVM?

thank you again for all the support!
Simone
 
UPDATE

Setting the cache parameters during VM creation gave me roughly ±10% performance, nothing really fancy. For now I'm staying with virtio, qcow2, no cache.
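
For context, the cache mode ends up as a disk option in the VM config (/etc/pve/qemu-server/<vmid>.conf); the line below is only a hypothetical example with a placeholder storage name and VM ID, and note that on LVM-on-DRBD storage the image is raw rather than qcow2:

Code:
virtio0: local:101/vm-101-disk-1.qcow2,cache=none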

BUT, I do get a really big problem with writes.

My configuration:

2 x 256 GB Intel S3500 SSDs in RAID 1 on an Adaptec 6405 with 512 MB cache and the ZMM (zero-maintenance module) BBU connected
Two RAID volumes:
vol1: 30 GB, where the Proxmox installer did its own installation
vol2: 190 GB, where I followed this guide: http://pve.proxmox.com/wiki/DRBD (my intention is to assign this partition, /dev/sdb, to my KVM guests; see the resource sketch below)
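
For reference, the resource file from that wiki looks roughly like the sketch below. The hostnames (prxdrbd1/prxdrbd2), the 10.0.0.x addresses, the shared secret and the /dev/sdb1 backing partition are placeholders taken from this thread, so treat it as a sketch rather than a drop-in /etc/drbd.d/r0.res:

Code:
resource r0 {
        protocol C;
        startup {
                wfc-timeout 15;
                degr-wfc-timeout 60;
                become-primary-on both;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "my-secret";       # placeholder
                allow-two-primaries;             # needed for live migration
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        on prxdrbd1 {
                device /dev/drbd0;
                disk /dev/sdb1;
                address 10.0.0.105:7788;
                meta-disk internal;
        }
        on prxdrbd2 {
                device /dev/drbd0;
                disk /dev/sdb1;
                address 10.0.0.106:7788;
                meta-disk internal;
        }
}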

If I create a KVM guest on vol1 (30 GB), write speed is (almost) identical to the bare-metal server (250 MB/s).
If I create a KVM guest on vol2 (190 GB + DRBD + LVM), write speed is always 11 MB/s.

Can anyone please help?

thank you
Simone
 
I'm not an expert in HA or DRBD, but my first thoughts are:
a) are those two nodes connected at 100 Mb/s rather than gigabit? (adapters, switch, check the negotiated speed, etc.)
b) you can't have HA with only 2 nodes due to problems reaching quorum
regards
 
Hi mmenaz, thanks for the reply.

The 2 nodes are connected directly with a crossover cable, both on 1 Gb LAN. For question b), I just followed the Proxmox wiki, and yes, I have a two-node HA setup (working and tested), but when I write to the DRBD/LVM partition I get 10 MB/s using dd from inside the VM.

Maybe it's because of the 1 Gb LAN; I did not consider this at the beginning. My assumption about DRBD was that the sync happened AFTER the write was completed on the first node, and NOT in real time as it does now...

Can you guys please help me out?

thank you
Simone
 
Sorry if I insist, but 10 MB/s is exactly the maximum throughput you can obtain over a 100 Mb/s connection, so the coincidence looks very suspicious...
Try a pure connection speed test using iperf.
On one node run:
iperf -s
and on the other:
iperf -c 192.168.1.9 -d -t 60
(with the IP of the first node)
 
Here are the results:

root@prxdrbd2:~# iperf -c 10.0.0.105 -d -t 60
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.0.0.105, TCP port 5001
TCP window size: 90.4 KByte (default)
------------------------------------------------------------
[ 5] local 10.0.0.106 port 37205 connected with 10.0.0.105 port 5001
[ 4] local 10.0.0.106 port 5001 connected with 10.0.0.105 port 46543
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-60.1 sec 656 MBytes 91.5 Mbits/sec
[ 4] 0.0-60.3 sec 658 MBytes 91.6 Mbits/sec

My two-node cluster has:

1 x eth0
1 x eth1
1 x IPMI

DRBD runs on its own port, eth1, which is a gigabit port.

So, do you consider this normal? And if I upgrade to a 10 Gb LAN, will I go from 10 MB/s writes to 100 MB/s writes, correct?

Still, this doesn't get me to my goal, since I'm using SSD disks to boost MySQL and Magento performance... my bottleneck will always be DRBD :(


Any ideas for an HA cluster without a SAN/NAS/NFS, but with DRBD?

thank you very much
Simone
 
You ran the test with the IP associated with eth1, right? Math does not lie! You have a 100 Mb/s connection (do you see those "91.5 Mbits/sec"?).
Do you have a bad cable / not Cat 5e?
Do you have half duplex?
Is your NIC unable to handle crossover cables?
What speed are they negotiating?
etc.

see my: ethtool eth0
Code:
 ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                             100baseT/Half 100baseT/Full 
                                             1000baseT/Full 
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: Yes
        Speed: 1000Mb/s  <--------------------------- see this?
        Duplex: Full  <--------------------------------- see this?
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
                               drv probe ifdown ifup
        Link detected: yes
Try it on both sides.
Also understand the byte/bit distinction: it takes 8 bits to form a byte, so a 1 Gbit/s link gives you roughly 100 MB/s, while a 100 Mbit/s link gives about 10 MB/s (your current speed).
 
mmenaz, I really appreciate your effort!

a) I did have a Cat 5 cable; now it's Cat 5e
b) Is a crossover cable mandatory? I'm now using a "normal" Cat 5e cable; I thought eth1 supports auto MDI/MDI-X
c) here are the tests:
Code:
root@prxdrbd1:~# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: pumbg
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes



And the KVM dd test:
dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
results: 9.3 sec, 115 MB/s

Just with a cable!! :) Awesome...

Unfortunately I don't have enough room for a 10 Gb NIC (the slot is already used by my Adaptec RAID card), and the Intel motherboard chipset with HW RAID, the Intel C602J, is not supported by Debian/Proxmox... so I'm forced to stay like this...

I'm evaluating...

mmenaz, I really appreciate your effort!!
 
And the iperf results:


------------------------------------------------------------
Client connecting to 10.0.0.105, TCP port 5001
TCP window size: 133 KByte (default)
------------------------------------------------------------
[ 5] local 10.0.0.106 port 50186 connected with 10.0.0.105 port 5001
[ 4] local 10.0.0.106 port 5001 connected with 10.0.0.105 port 47105
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-60.0 sec 6.53 GBytes 934 Mbits/sec
[ 4] 0.0-60.0 sec 6.38 GBytes 913 Mbits/sec



interesting :)
 

Great idea!
If you can have two NICs dedicated to DRBD, connected NIC-to-NIC and bonded in balance-rr (round-robin) mode, you will double your data transfer speed. I have always configured DRBD this way on my production servers with 1 Gb/s NICs (two NICs bonded per DRBD volume). In balance-rr mode, up to 3 NICs per DRBD volume can help you get the best performance; with 4 NICs the performance dies. Tested with Intel PRO/1000 PT Dual Port Server Adapters + PVE 2.3 + iperf and DRBD 8.4.2.
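
On Debian/Proxmox, a minimal /etc/network/interfaces fragment for such a bond might look like the sketch below, assuming the ifenslave package is installed and eth1/eth2 are the two back-to-back links dedicated to DRBD (the address is just the DRBD IP used earlier in this thread):

Code:
# bond0 carries only DRBD traffic, round-robin over two direct links
auto bond0
iface bond0 inet static
        address 10.0.0.106
        netmask 255.255.255.0
        slaves eth1 eth2
        bond_miimon 100
        bond_mode balance-rr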

And if you want more performance:
1- Configure all the cache of the Adaptec RAID controller as write cache
2- The PVE hosts must have free memory to use as read cache
3- Disable the LVM write cache on the PVE hosts (so you don't lose data in case of an electrical failure or if a PVE host hangs)
4- Tune DRBD; see this link (a rough config sketch follows after this list):
http://www.drbd.org/users-guide/s-throughput-tuning.html
5- Use the latest version of DRBD (in theory 4 times faster; I have not tried it yet, but will soon. I have had DRBD 8.4.2 in production for 6 months with excellent results; my servers never switched off or hung). In this link Mr. Lars, lead DRBD developer, talks about this new version:
http://blogs.linbit.com/p/469/843-random-writes-faster/
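
For point 4, the options discussed in that tuning guide end up in the resource definition roughly as below (DRBD 8.3-style syntax as shipped with PVE 3.1; in 8.4 some options move to the disk section). The values are the guide's illustrative examples, not recommendations for this specific hardware:

Code:
resource r0 {
        net {
                max-buffers     8000;
                max-epoch-size  8000;
                sndbuf-size     512k;
        }
        syncer {
                al-extents      3389;
        }
}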

Good luck with your experiments
Cesar
 

Cesar, this is a great idea, indeed.

BUT, with my hardware configuration http://www.supermicro.com/products/system/1u/6017/sys-6017tr-tf.cfm I only have 1 (one) PCIe slot, and it is used by the Adaptec 6405 RAID card, so there is no extra slot for a 10 Gb NIC.

I only have 1 Ethernet port for the external IP, 1 Ethernet port for DRBD, and 1 IPMI port.
The only thing left now is to upgrade DRBD, since the tuning is already in place.

I hope the upgrade will be easy; I originally installed DRBD with "apt-get install drbd8-utils", but the LINBIT website says I can only use their repo if I pay... so I don't know :(

thank you all
Simone

P.S. What about setting protocol A? Maybe that would also help...
 

- Do you have only 1 PCIe slot in your computer? Because if you have more free PCIe slots, you can use Intel 1 Gb/s workstation NICs; those cards are PCIe x1 Gen 1, so you can eventually fill all the free slots with them.

- For SATA HDDs, an excellent option is to use 2 x 1 Gb/s NICs in balance-rr bonding for each DRBD volume, not necessarily 10 Gb/s (in this example you can get about 220 MB/s of data transfer).

- To use the latest version of DRBD on PVE, you can download it from the git site. In many threads on this forum you will see that it has to be downloaded and compiled from source; please see this link:
http://pve.proxmox.com/wiki/Build_DRBD_kernel_module
(This link talks about an old version, so it would be good to apply the same steps to the latest version. DKMS is there so that when you install new kernel versions, DRBD is automatically recompiled for those new kernels.)

- Protocol A is no use for high availability; it is only used for long-distance replication, since it is not synchronous. You must use protocol C to be sure that both nodes always have the same data.
 
Hi cesarpk, thanks for your reply.

As I wrote just above, I'm stuck keeping my NICs at 1 Gb; I do not have room to fit any other cards :( and since my disks are SSDs, I was forced to take this decision:

1. 2 x 256 GB Intel SSDs in RAID 1 on the Adaptec 6405 with BBU
2. 3 arrays: 1 x 20 GB for the Proxmox installation; 1 x 160 GB for the HA KVM guests (webserver, PHP); 1 x 40 GB for the Percona database, which is not in a KVM guest but directly on Node1, in master/slave replication with Node2

In this situation, even if DRBD only delivers 100 MB/s of write performance, my KVM machines will not need that much, I'm still able to live-migrate with Proxmox, and the MySQL database gets the full 300 MB/s write performance since it is installed on bare metal.

I hope this is fine; in any case the next server will definitely have 2 x 10 Gb NICs and room for at least 2 PCIe slots.
 
Hi Simone,

Could you share your working HA Cluster/ IPMI configuration for fencing?

As you are using two Proxmox nodes with LVM on top of DRBD in Primary/Primary, did you use a third quorum partition?

Thanks
 
Hi, sure, sorry for the late reply.
Here is an image of the cluster, made from 2 Supermicro servers with IPMI. And yes, a two-node cluster, with DRBD (2 x LVM partitions on top of DRBD, network-replicated between the servers):

http://magento.ecommerce.mi.it/hosting/pve_x86.png

I don't use IPMI fencing myself, because if the server has crashed and is powered off, the fence configuration will not work: the node that is still alive will not get any answer from the IPMI module, so the IPMI fence cannot be applied. To avoid this problem, the best solution is to have a PDU (also known as a power switch).

But if you don't want a PDU, you must configure a second fencing option (fence_ack_manual). That way, if your first fence method (IPMI) doesn't work, you can apply the second fence method manually with fence_ack_manual; this requires adding the necessary configuration to your cluster.conf file (a rough sketch follows below).
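
An untested sketch of what that might look like in cluster.conf is below; the node name, IPMI address and credentials are placeholders, and the naming of the manual fence device varies between cluster versions, so check it against the Proxmox fencing wiki before relying on it. The idea is that if the IPMI method fails, the administrator verifies the node is really powered off and then acknowledges the fence from the surviving node with fence_ack_manual.

Code:
<!-- fragment of /etc/pve/cluster.conf: IPMI fencing first, manual acknowledgement as fallback -->
<clusternode name="prxdrbd1" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="ipmi1"/>
    </method>
    <method name="2">
      <device name="human" nodename="prxdrbd1"/>
    </method>
  </fence>
</clusternode>
...
<fencedevices>
  <fencedevice agent="fence_ipmilan" name="ipmi1" ipaddr="10.0.1.11" login="ADMIN" passwd="ADMIN" lanplus="1" power_wait="5"/>
  <fencedevice agent="fence_manual" name="human"/>
</fencedevices>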

Best regards
Cesar
 
