DRBD Performance Problem

TheReelaatiiv

Member
Mar 29, 2012
Hello,

I have the following setup:

Two Proxmox 2.2 nodes, each with exactly the same HDDs in a software RAID 10. On top of the software RAID I configured DRBD, and on top of that LVM2.
The network configuration looks like this:
Server 1 and Server 2 are configured identically except for the IPs:
eth1 <--> Switch (Internet)
eth0 and eth2 are bonded to bond0 and connected directly to the other node with CAT 5e cables.
eth0 and eth1 are Realtek NICs, eth2 is an Intel NIC.

I created two LVM LVs named "ovz-glowstone" and "ovz-bedrock", which I mount on the nodes "glowstone" and "bedrock":
For example, on node "bedrock": on boot, activate the LV ovz-bedrock and mount it at /var/container/.
The same happens on node "glowstone" with "ovz-glowstone".

dd with oflag=direct shows me about 30 MB/s, which is far below what I expected.

I already tried the following to fix the problem (but replication is still very slow):

echo 127 > /proc/sys/net/ipv4/tcp_reordering
ifconfig bond0 mtu 2000 (an MTU of 4000 does not work because the Realtek NIC does not accept such a high value)
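
For reference, this is roughly how I would make those two tweaks persistent across reboots (just a sketch; the file locations are the standard Debian ones):

Code:
# /etc/sysctl.conf (or a file under /etc/sysctl.d/) -- applied at boot
net.ipv4.tcp_reordering = 127

# /etc/network/interfaces -- add this single line to the existing bond0 stanza
mtu 2000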

My DRBD configuration:

global_common.conf:
Code:
global {
    usage-count no;
    # minor-count dialog-refresh disable-ip-verification
}

common {
    protocol C;

    handlers {
        #pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        #pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        #local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root@dedilink.eu";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root@dedilink.eu";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }

    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }

    disk {
        # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
        # no-disk-drain no-md-flushes max-bio-bvecs
    }

    net {
        # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
        # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
        # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
    }

    syncer {
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
    }
}
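
I have not touched any of the commented tuning options yet. From what I have read in tuning guides, a tuned version would look roughly like the sketch below. The values are examples from those guides, not something I have verified on this hardware, and no-disk-barrier/no-disk-flushes are only safe with a battery-backed write cache, which plain software RAID does not have:

Code:
    net {
        sndbuf-size 512k;
        max-buffers 8000;
        max-epoch-size 8000;
    }

    disk {
        # only safe with a battery-backed / non-volatile write cache!
        no-disk-barrier;
        no-disk-flushes;
    }

    syncer {
        al-extents 3389;
    }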

r0.res:
Code:
resource r0 {
        protocol C;
        syncer {
                rate 2G;
        }
        startup {
                wfc-timeout 60;
                degr-wfc-timeout 60;
                become-primary-on both;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "*****";
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        on bedrock {
                device /dev/drbd0;
                disk /dev/md0p1;
                address 10.0.0.2:7788;
                meta-disk internal;
        }
        on glowstone {
                device /dev/drbd0;
                disk /dev/md0p1;
                address 10.0.0.3:7788;
                meta-disk internal;
        }
}

To explain the storage stack again:

/dev/sd[abcd] -MDRAID-> /dev/md0
/dev/md0p1 (partition type "Linux LVM") -DRBD-> /dev/drbd0
LVM uses /dev/drbd0 as its physical volume.
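
For completeness: so that LVM does not see the same physical volume twice (once through /dev/md0p1 and once through /dev/drbd0), my understanding is that a filter like this belongs in /etc/lvm/lvm.conf (a sketch, not copied from my config; adjust the device names):

Code:
# /etc/lvm/lvm.conf -- reject the DRBD backing device so LVM only scans the PV via /dev/drbd0
filter = [ "r|/dev/md0p1|", "a|.*|" ]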


And sorry for my bad English.
 
The network link between the nodes seems to work:

Code:
root@glowstone ~ # iperf -c 10.0.0.2 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size:   734 KByte (default)
------------------------------------------------------------
[  4] local 10.0.0.3 port 38305 connected with 10.0.0.2 port 5001
[  5] local 10.0.0.3 port 5001 connected with 10.0.0.2 port 35842
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.37 GBytes  1.18 Gbits/sec
[  5]  0.0-10.0 sec  1.71 GBytes  1.47 Gbits/sec
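
(Side note on the bonding: as far as I know DRBD replicates over a single TCP connection, so only balance-rr can spread that one stream over both links. The active mode can be checked like this:)

Code:
grep "Bonding Mode" /proc/net/bonding/bond0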

The disks themselves are also fast:

Code:
root@glowstone ~ # hdparm -Tt /dev/sd[abcd] /dev/md0

/dev/sda:
 Timing cached reads:   24662 MB in  2.00 seconds = 12345.51 MB/sec
 Timing buffered disk reads: 524 MB in  3.00 seconds = 174.49 MB/sec

/dev/sdb:
 Timing cached reads:   24876 MB in  2.00 seconds = 12451.90 MB/sec
 Timing buffered disk reads: 528 MB in  3.01 seconds = 175.56 MB/sec

/dev/sdc:
 Timing cached reads:   24662 MB in  2.00 seconds = 12345.36 MB/sec
 Timing buffered disk reads: 526 MB in  3.01 seconds = 174.82 MB/sec

/dev/sdd:
 Timing cached reads:   24930 MB in  2.00 seconds = 12479.81 MB/sec
 Timing buffered disk reads: 518 MB in  3.03 seconds = 170.98 MB/sec

/dev/md0:
 Timing cached reads:   24966 MB in  2.00 seconds = 12497.47 MB/sec
 Timing buffered disk reads: 888 MB in  3.00 seconds = 295.65 MB/sec

What is going wrong here?
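
For anyone wanting to double-check the DRBD side: the connection and disk state can be verified like this (what I would expect to see is cs:Connected, ro:Primary/Primary and ds:UpToDate/UpToDate):

Code:
cat /proc/drbd
drbdadm cstate r0    # connection state only
drbdadm dstate r0    # disk state only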

/Edit:

Tested on /dev/drbd0 and /dev/storage/ovz-glowstone, both also seem to be VERY fast:

Code:
root@glowstone ~ # hdparm -Tt /dev/drbd0

/dev/drbd0:
 Timing cached reads:   25350 MB in  2.00 seconds = 12689.29 MB/sec
 Timing buffered disk reads: 946 MB in  3.00 seconds = 314.93 MB/sec
root@glowstone ~ # hdparm -Tt /dev/storage/ovz-glowstone 

/dev/storage/ovz-glowstone:
 Timing cached reads:   24186 MB in  2.00 seconds = 12106.79 MB/sec
 Timing buffered disk reads: 748 MB in  3.01 seconds = 248.33 MB/sec

So why is dd as slow as the ~30 MB/s I mentioned above?

/Edit:

Sorry, my mistake: hdparm only shows read performance.
Write performance to the LV /dev/storage/ovz-bedrock:

Code:
root@bedrock ~ # dd if=/dev/zero of=/dev/storage/ovz-bedrock bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 25.9118 s, 41.4 MB/s

Writing to /dev/drbd0 would destroy my data and my customers would not be very happy :)
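
What I can do instead, without touching customer data, is carve a small scratch LV out of the same DRBD-backed VG and run the write test against that (a sketch; "storage" is my VG, the LV name "ddtest" is made up):

Code:
lvcreate -L 4G -n ddtest storage
dd if=/dev/zero of=/dev/storage/ddtest bs=1M count=2048 oflag=direct
lvremove -f /dev/storage/ddtest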
 
Your "problem" is that you have a lot of software layers between your applications and the physical disk.

apps->
- OVZ ->
- filesystem ->
- LVM ->
- DRBD ->
- RAID 10 ->
- Disk

Each of these layers adds complexity and costs performance. Remember that all of this is software-based, which means every layer competes for CPU time.
One way to improve your setup would be to replace the software RAID with a hardware RAID controller.
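
A quick way to see where the cost appears is to read the same amount of data through each layer with direct I/O and compare the numbers (a rough sketch using the device names from your post; reads are non-destructive):

Code:
dd if=/dev/md0 of=/dev/null bs=1M count=1024 iflag=direct
dd if=/dev/drbd0 of=/dev/null bs=1M count=1024 iflag=direct
dd if=/dev/storage/ovz-glowstone of=/dev/null bs=1M count=1024 iflag=direct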
 
I don't think it's the software RAID layer that decreases the speed.

I "degraded" the DRBD resource and tested the speed again:

Code:
root@glowstone /dev # dd if=/dev/zero of=/dev/storage/ovz-150 bs=5G count=1 oflag=direct
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 10.0859 s, 213 MB/s

/Edit:

The same command (with a different "of" target) on an Adaptec 6405E RAID controller in RAID 10 (the same level as the software RAID):
Code:
root@bedrock ~ # dd if=/dev/zero of=/dev/storage/ovz-bedrock bs=5G count=1 oflag=direct
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 12.3438 s, 174 MB/s

Do you see now that hardware RAID isn't faster than software RAID at the moment?

I've set the stripe size to 1 MB because I've read somewhere that a higher value is supposed to be much faster. Is that right?
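
(For a fair comparison, the chunk size of the software RAID can be checked like this, so both arrays are benchmarked with the same stripe/chunk size:)

Code:
cat /proc/mdstat
mdadm --detail /dev/md0 | grep -i chunk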
 
If you see DRBD performance issues, I suggest the DRBD user mailing list as the better place to get help. Properly configured, DRBD is really fast.