[SOLVED] Slow DRBD9 sync like 20mbit on 1GBit

tytanick

Hi, I have a strange issue.
All interfaces on my 2 nodes are 1 Gbit NICs, connected at 1 Gbit in an active-backup bond.
I am using the newest DRBD9 on Proxmox 4.2.
Syncing runs at about 20 Mbit/s (it should be around 800 Mbit/s).
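A quick way to rule out the raw network is an iperf throughput test between the replication IPs (a sketch, assuming iperf is installed on both nodes):
Code:
# on node1 (server side)
iperf -s
# on node2 (client side), against node1's replication address
iperf -c 10.99.99.11 -t 30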

Code:
root@node2:~# drbdadm status
.drbdctrl role:Primary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  node1 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate
  node3 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate
  node4 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate

vm-100-disk-1 role:Primary
  disk:UpToDate
  node1 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:32.95
  node4 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:30.78

Code:
PRC | sys    0.19s | user   0.27s |              | #proc    423 | #trun      1 | #tslpi   445 | #tslpu     0 | #zombie    0 | clones     8 |              | #exit      8 |
CPU | sys       1% | user      2% | irq       0% |              | idle   1598% | wait      0% |              | steal     0% | guest     0% | curf 2.53GHz | curscal 100% |
CPL | avg1    0.33 | avg5    0.34 |              | avg15   0.21 |              |              | csw    10090 | intr   12221 |              |              | numcpu    16 |
MEM | tot    29.4G | free   28.3G | cache 236.9M | dirty   0.2M | buff   24.9M |              | slab  109.1M |              |              |              |              |
SWP | tot     3.6G | free    3.6G |              |              |              |              |              |              |              | vmcom   1.4G | vmlim  18.3G |
LVM |     pve-root | busy      0% | read      12 | write     28 | KiB/r      4 |              | KiB/w      8 | MBr/s   0.00 | MBw/s   0.02 | avq     3.33 | avio 0.30 ms |
LVM | --disk--1_00 | busy      0% | read     771 | write      9 | KiB/r     97 |              | KiB/w      4 | MBr/s   7.32 | MBw/s   0.00 | avq     3.50 | avio 0.01 ms |
LVM | inpool_tmeta | busy      0% | read       0 | write      6 | KiB/r      0 |              | KiB/w      4 | MBr/s   0.00 | MBw/s   0.00 | avq     1.00 | avio 0.67 ms |
LVM | inpool_tdata | busy      0% | read      12 | write      9 | KiB/r      4 |              | KiB/w      4 | MBr/s   0.00 | MBw/s   0.00 | avq     6.00 | avio 0.19 ms |
LVM | inpool-tpool | busy      0% | read      12 | write      9 | KiB/r      4 |              | KiB/w      4 | MBr/s   0.00 | MBw/s   0.00 | avq     6.00 | avio 0.19 ms |
DSK |          sda | busy      0% | read      66 | write     20 | KiB/r      2 |              | KiB/w     13 | MBr/s   0.02 | MBw/s   0.03 | avq     1.00 | avio 0.19 ms |
DSK |          sdb | busy      0% | read      98 | write     15 | KiB/r      4 |              | KiB/w      4 | MBr/s   0.04 | MBw/s   0.01 | avq     2.67 | avio 0.11 ms |
NET | transport    | tcpi    5770 | tcpo   53878 | udpi     630 | udpo     575 | tcpao      0 | tcppo      0 | tcprs    191 | tcpie      0 | tcpor      0 | udpip      0 |
NET | network      | ipi     6403 | ipo     4256 | ipfrw      0 | deliv   6403 |              |              |              |              | icmpi      0 | icmpo      0 |
NET | bond0     6% | pcki    6356 | pcko   55278 | si  476 Kbps | so   65 Mbps | coll       0 | mlti      75 | erri       0 | erro       0 | drpi       2 | drpo       0 |
NET | eth0      6% | pcki    6354 | pcko   55278 | si  476 Kbps | so   65 Mbps | coll       0 | mlti      75 | erri       0 | erro       0 | drpi       0 | drpo       0 |

Code:
root@node3:~# drbdadm -V
DRBDADM_BUILDTAG=GIT-hash:\ c6e62702d5e4fb2cf6b3fa27e67cb0d4b399a30b\ debian/changelog\ debian/compat\ debian/control\ debian/control.ubuntu-precise\ debian/copyright\ debian/drbd-utils.config\ debian/drbd-utils.postrm\ debian/drbd-utils.prerm\ debian/rules\ debian/watch\ scripts/Makefile.in\ build\ by\ root@elsa\,\ 2016-04-15\ 06:37:06
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090002
DRBDADM_VERSION_CODE=0x080906
DRBDADM_VERSION=8.9.6

Code:
root@node2:~# pvecm status
Quorum information
------------------
Date:             Sun Jun 19 13:29:30 2016
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000002
Ring ID:          144
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.99.99.11
0x00000002          1 10.99.99.12 (local)
0x00000003          1 10.99.99.13
0x00000004          1 10.99.99.14

Code:
root@node2:~# hdparm -tT /dev/sdb

/dev/sdb:
Timing cached reads:   15736 MB in  2.00 seconds = 7875.08 MB/sec
Timing buffered disk reads: 680 MB in  3.00 seconds = 226.60 MB/sec
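Note that hdparm only measures reads, while a sync target is limited by its write speed. A rough direct-I/O write test (the test file path is just an example):
Code:
# 1 GiB sequential write, bypassing the page cache
dd if=/dev/zero of=/root/ddtest.bin bs=1M count=1024 oflag=direct conv=fsync
rm -f /root/ddtest.bin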

Code:
root@node2:~# cat /var/lib/drbd.d/drbdmanage_vm-100-disk-1.res
# This file was generated by drbdmanage(8), do not edit manually.
resource vm-100-disk-1 {
    template-file "/var/lib/drbd.d/drbdmanage_global_common.conf";

    net {
        allow-two-primaries yes;
        shared-secret "g3+wzRwYcz/+6z6bSMRd";
        cram-hmac-alg sha1;
    }
    connection-mesh {
        hosts node1 node2 node4;
    }
    on node1 {
        node-id 0;
        address 10.99.99.11:7000;
        volume 0 {
            device minor 100;
            disk /dev/null;
            disk {
                size 26214400k;
            }
            meta-disk internal;
        }
    }
    on node2 {
        node-id 1;
        address 10.99.99.12:7000;
        volume 0 {
            device minor 100;
            disk /dev/drbdpool/vm-100-disk-1_00;
            disk {
                size 26214400k;
            }
            meta-disk internal;
        }
    }
    on node4 {
        node-id 2;
        address 10.99.99.14:7000;
        volume 0 {
            device minor 100;
            disk /dev/null;
            disk {
                size 26214400k;
            }
            meta-disk internal;
        }
    }
}


I also see some dropped packets, is that normal?


Code:
root@node1:~# ifconfig bond0
bond0     Link encap:Ethernet  HWaddr 68:b5:99:6b:22:aa
          inet addr:10.99.99.11  Bcast:10.99.99.255  Mask:255.255.255.0
          inet6 addr: fe80::6ab5:99ff:fe6b:22aa/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:7863001 errors:0 dropped:215 overruns:0 frame:0
          TX packets:1787690 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:11447957122 (10.6 GiB)  TX bytes:1377167305 (1.2 GiB)



root@node2:~# ifconfig bond0
bond0     Link encap:Ethernet  HWaddr 68:b5:99:6c:8b:04
          inet addr:10.99.99.12  Bcast:10.99.99.255  Mask:255.255.255.0
          inet6 addr: fe80::6ab5:99ff:fe6c:8b04/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:2447779 errors:0 dropped:104 overruns:0 frame:0
          TX packets:14196852 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1432368852 (1.3 GiB)  TX bytes:20895394692 (19.4 GiB)
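A few hundred RX drops on an active-backup bond are often just frames that arrived on the inactive slave, but it can be checked; a quick sketch (counter names vary by NIC driver):
Code:
# which slave is currently active, and per-slave link state
cat /proc/net/bonding/bond0
# NIC-level error/drop counters for each slave
ethtool -S eth0 | grep -iE 'drop|err'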
 
Nope, I tried this and many other things.
An older 4.x kernel: still the same.
Bonding or no bonding: the same...
 
OK, so I thought it was buggy DRBD9, but after 3 days of lost time I finally figured it out :)
Before the fix I had 20 Mbit/s syncing; now I get 800 Mbit/s on a 1 Gbit link :)

Code:
NET | bond0    80% | pcki   13298 | pcko  133286 | si 3879 Kbps | so  807 Mbps | coll       0 | mlti      12 | erri       0 | erro       0 | drpi       0 | drpo       0 |
NET | eth1     80% | pcki   13298 | pcko  133286 | si 3879 Kbps | so  807 Mbps | coll       0 | mlti      12 | erri       0 | erro       0 | drpi       0 | drpo       0 |
NET | tap0      0% | pcki       4 | pcko       4 | si    0 Kbps | so   12 Kbps | coll       0 | mlti       0 | erri       0 | erro       0 | drpi       0 | drpo       0 |
NET | eth2

It was all about a stupid config file... and you actually need to edit the proper one.

First of all, if you have a working DRBD9 cluster: first stop all VMs, then disconnect all resources on both DRBD nodes:
Code:
drbdadm disconnect all

Then you need to edit the config (and this is what I was looking for for so long): /var/lib/drbd.d/drbdmanage_global_common.conf. Editing /etc/drbd.d/global_common.conf doesn't work at all :)
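To confirm which settings DRBD actually picked up, you can dump the effective configuration as drbdadm sees it:
Code:
# prints the merged configuration; your tuning options should show up here
drbdadm dump all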

Enter these settings into /var/lib/drbd.d/drbdmanage_global_common.conf ON BOTH NODES:
Code:
# this must be the content of /var/lib/drbd.d/drbdmanage_global_common.conf !!!
common {
    disk {
        on-io-error     detach;
        no-disk-flushes;
        no-disk-barrier;
        c-plan-ahead    10;
        c-fill-target   24M;
        c-min-rate      10M;
        c-max-rate      100M;
    }
    net {
        # max-epoch-size 20000;
        max-buffers     36k;
        sndbuf-size     1024k;
        rcvbuf-size     2048k;
    }
}

Then simply restart DRBD:
Code:
/etc/init.d/drbd restart

And that's it. Syncing should be at 800 Mbit/s :)
If you have 10 Gbit, just change c-max-rate to 1000M.
Play with those settings :)
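Instead of a full restart, the changed options can usually be applied on the fly, and you can verify what the kernel is actually using:
Code:
# re-read the configuration and apply changed options without downtime
drbdadm adjust all
# show the options the running resource actually uses
drbdsetup show vm-100-disk-1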
 
Not completely solved...
This file:
/var/lib/drbd.d/drbdmanage_global_common.conf
is rewritten empty after a reboot.
Where can I add those settings to make them permanent?
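Since drbdmanage regenerates that file, persistent tuning probably has to go through drbdmanage itself, which stores options in its control volume. A sketch of what that could look like; treat the exact flags as assumptions and verify with drbdmanage peer-device-options --help / drbdmanage net-options --help on your version:
Code:
# store the sync-controller settings in drbdmanage's control volume
# (--common is assumed to apply them to all resources; check --help)
drbdmanage peer-device-options --common --c-plan-ahead 10 --c-fill-target 24M --c-min-rate 10M --c-max-rate 100M
# store the net tuning the same way
drbdmanage net-options --common --max-buffers 36k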
 
Also, is this right? Why is the SSD so busy?
Something is wrong, I suppose.

Code:
LVM | --disk--1_00 | busy    100% | read      15 | write  14005 | KiB/r      4 |              | KiB/w      3 | MBr/s   0.01 | MBw/s   5.47 | avq   655.16 | avio 0.71 ms |
LVM | inpool_tdata | busy     91% | read      49 | write  14459 | KiB/r      4 |              | KiB/w      7 | MBr/s   0.02 | MBw/s  10.78 | avq   128.43 | avio 0.62 ms |
LVM | inpool-tpool | busy     91% | read      49 | write  14459 | KiB/r      4 |              | KiB/w      7 | MBr/s   0.02 | MBw/s  10.78 | avq   128.43 | avio 0.62 ms |
LVM | inpool_tmeta | busy     25% | read       0 | write    206 | KiB/r      0 |              | KiB/w      4 | MBr/s   0.00 | MBw/s   0.08 | avq     6.37 | avio 12.3 ms |
LVM |     pve-root | busy      8% | read      12 | write     13 | KiB/r      4 |              | KiB/w      5 | MBr/s   0.00 | MBw/s   0.01 | avq     1.54 | avio 31.7 ms |
DSK |          sdb | busy    100% | read     132 | write  13964 | KiB/r      4 |              | KiB/w      7 | MBr/s   0.05 | MBw/s  10.72 | avq   110.92 | avio 0.71 ms |
DSK |          sda | busy     11% | read      60 | write     13 | KiB/r      2 |              | KiB/w      6 | MBr/s   0.01 | MBw/s   0.01 | avq     1.21 | avio 15.2 ms |
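For what it's worth, 100% busy at only ~10 MB/s with 4-7 KiB average writes points at lots of small random I/O (thin-pool metadata plus DRBD's internal metadata), not at raw throughput. One way to watch it live, assuming sysstat is installed:
Code:
# extended per-device stats every second: request size, queue depth, utilisation
iostat -x sdb 1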
 
