
KVM on top of DRBD and out of sync: long term investigation results

Discussion in 'Proxmox VE: Installation and configuration' started by giner, Apr 7, 2014.

  1. giner

    giner Member

    Joined:
    Oct 14, 2009
    Messages:
    237
    Likes Received:
    0
    It is possible to use different modes for different virtual drives so we can move swap to another drive.
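
    For example, something like this in the VM config (a sketch only; the storage and volume names are placeholders):
    Code:
    # /etc/pve/qemu-server/100.conf (hypothetical example)
    virtio0: drbd-lvm:vm-100-disk-1,cache=writethrough   # OS disk on DRBD-backed LVM
    virtio1: local-lvm:vm-100-disk-2,cache=none          # separate swap disk on local storage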

    Could you share your benchmark results?
     
  2. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,215
    Likes Received:
    17
    directsync does NOT prevent this problem.

    I've recently noticed a larger number of inconsistencies on upgraded, faster hardware, so I enabled data-integrity-alg on a couple of nodes.
    A few hours later DRBD broke and split-brained due to the error:
    buffer modified by upper layers during write
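
    For reference, the way I enabled it is roughly this (a sketch only; I would not leave it on in production):
    Code:
    net {
        data-integrity-alg sha1;   # detects payload changes between submit and completion; diagnostic use only
    }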

    This is the precise problem that giner described in the first post.
    All of my VMs have directsync set for all of their VM disks, so it is obvious that directsync does NOT prevent the buffer from being modified while DRBD writes are in flight.

    writethrough, in my experience, does not perform well so I am not sure what I will do to prevent this problem.

    I wonder if DRBD 8.4 in the 3.10 kernel is less susceptible to this problem.

    For reference the cause of this problem is explained by Lars, one of the main DRBD developers, here:
    http://lists.linbit.com/pipermail/drbd-user/2014-February/020607.html
     
  3. giner

    giner Member

    Joined:
    Oct 14, 2009
    Messages:
    237
    Likes Received:
    0
    This is very unusual. I've never experienced this problem since I switched to cache=writethrough.
    Can you try to reproduce the issue with data-integrity-alg disabled? In theory, data-integrity-alg can produce false positives.

    Stanislav
     
  4. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    I still have the same problem too, mostly with slower storage (near-line SAS), and I am on the 3.10 kernel. In my case I am now trying to switch to writethrough and to lower the al-extents on the slower storage, because after a split-brain there is a lot of resyncing to do. The strange thing is that on the same host the very fast SAS storage on the same RAID controller, which is used more heavily, does not have any problems.
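
    Something like this is what I have in mind for the slow resource (a sketch only; the value is just an example):
    Code:
    disk {
        al-extents 257;   # lower activity-log extents for the slow near-line SAS resource
    }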
     
  5. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,215
    Likes Received:
    17
    Yes, we are seeing lots of out-of-sync blocks when we run a verify.
    I only enabled data-integrity-alg to confirm that it is being caused by the buffer being modified while DRBD writes are in flight.
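
    For anyone following along, the check is roughly this (resource name r0 is just a placeholder):
    Code:
    drbdadm verify r0     # start an online verify of resource r0
    cat /proc/drbd        # the oos: counter shows out-of-sync sectors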

    Testing if writethrough prevents this problem is now on my todo list.
     
  6. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    Hi to all

    A long time ago I was talking with Lars, and he told me that "data-integrity-alg" should only be used for testing purposes and never in a production environment, because modifications of the network packets typically occur ("upper layers" in my case refers to the hardware inside the same PC, since I use DRBD NIC-to-NIC). Since I removed that directive (data-integrity-alg), my problems are over.

    Moreover, these practices are always better:
    1) DRBD connections NIC-to-NIC (I use a lot of "balance-rr" bonding and jumbo frames to double the connection speed); a sketch follows after this list.
    2) Don't use a password for the replication (on a NIC-to-NIC connection nobody can see the transmission), and you also get more replication speed and less processor use.
    3) LVM on top of DRBD is the best way to get disk access speed.
    4) The PVE host should use "deadline" as its I/O scheduler, while a Linux guest should use "noop" (not optimized).
    5) Virtio-block as the disk driver in the guest (maybe not everyone knows this).
    6) In my particular case I have used DRBD version 8.4.4 for a long time, and soon I will have 8.4.5 on other PVE servers, and I have never had problems, considering that automatic verifications of the DRBD storage run once a week (I believe the latest DRBD versions are better: fewer bugs and better optimizations), and I have always heard Lars say the same to many people.
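
    A minimal sketch of point 1 on Debian/PVE (interface names and addresses are placeholders; set the MTU only after verifying that balance-rr works):
    Code:
    auto bond0
    iface bond0 inet static
        address 10.1.1.50
        netmask 255.255.255.0
        slaves eth2 eth3
        bond_mode balance-rr
        bond_miimon 100
        mtu 9000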

    Best regards
    Cesar
     
  7. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    Hi Cesar,

    thanks for your post. Could you help me? Here is one of my resource.conf files:
    drbdadm dump

    Now my question: is verify-alg similar to data-integrity-alg? How can I tune my config?

    Thanks in advance
    Mac
     
    #47 macday, Nov 23, 2014
    Last edited: Nov 23, 2014
  8. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    Hi macday

    These are my suggestions:

    1) Read everything at this link about the DRBD tuning recommendations (including the following web pages on the same tuning topic; these recommendations should be your bible):
    http://www.drbd.org/users-guide/s-throughput-tuning.html

    At first glance, I would correct the following (a consolidated sketch of these corrections follows after the list):
    Notes:
    a) I know this because I have read everything about the DRBD tuning recommendations
    b) I presume you are using a DRBD 8.4.x version
    c) I don't remember the exact syntax, so it is not included in my suggestions

    2) Where it says: sndbuf-size 512k;
    It should say: sndbuf-size 0; (DRBD will then calculate and adjust it automatically as needed)

    3) Where it says: max-buffers 128k; max-epoch-size 8000;
    It should say: max-buffers 8000; max-epoch-size 8000; (DRBD recommends that these values be equal)

    4) Where it says: disk-barrier no; disk-flushes no; md-flushes no;
    It should say: disk-flushes no; md-flushes no; (it is better to remove disk-barrier; and "disk-flushes no;" and "md-flushes no;" are only good when you have a hardware RAID controller with a BBU (battery backup unit), the goal being not to lose data if the server suddenly turns off or the VM crashes)

    5) Where it says: options { ...
    It should say: cpu-mask 0; (so that DRBD uses all cores and threads of your processors)

    6) Where it says: on-no-data-accessible suspend-io;
    It should say: on-no-data-accessible io-error; (that is the default), but if you add "on-io-error detach;" in the disk { } section, then in case of a disk failure DRBD will do the disk reads and writes on the other node, so your VM will keep working without problems on the same node (although the disk access speed may be lower)

    7) Where it says: "cram-hmac-alg" and "shared-secret"
    It should say: don't use those directives, and remember that the DRBD connections should be NIC-to-NIC without a switch in the middle (you will get more speed on the DRBD link; moreover, if your NICs support jumbo frames, it is better to raise the MTU to the maximum value the NICs support)

    8) Where it says: allow-two-primaries yes;
    It should say: I guess it should only say: allow-two-primaries;

    9) Where it says: disk-flushes no; disk-barrier no;
    It should say: I am not sure that this is the correct syntax; please check it against the documentation for your version

    Miscellaneous suggestions:
    10) Use only a NIC or NICs dedicated to DRBD
    11) Enable these options to get immediate alerts by email:
    split-brain "/usr/lib/drbd/notify-split-brain.sh some@emailAddress.com";
    out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh some@emailAddress.com";
    (the receiver of these messages does not necessarily have to be the root user, but you should check that it works)

    Whenever you can, use bonding in PVE with balance-rr exclusively for DRBD, preferably with two identical NICs (this can be configured from the PVE GUI, but the "mtu" only after you have verified that balance-rr works well, and only from the CLI); then you get these advantages:
    12) Double the speed of your DRBD link
    13) If one NIC of your balance-rr bond is disconnected, or one NIC fails, DRBD will keep working, but with less speed on the network link (and obviously, for the VMs, less disk access speed).
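
    Putting points 2 to 7 together, the relevant fragments would look roughly like this (a sketch only; please check the exact syntax against the documentation for your DRBD version):
    Code:
    net {
        sndbuf-size 0;        # let DRBD auto-tune the send buffer
        max-buffers 8000;
        max-epoch-size 8000;
        # no cram-hmac-alg / shared-secret on a direct NIC-to-NIC link
    }
    disk {
        on-io-error detach;
        disk-flushes no;      # only with a hardware RAID controller with BBU
        md-flushes no;        # only with a hardware RAID controller with BBU
    }
    options {
        cpu-mask 0;
    }
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh some@emailAddress.com";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh some@emailAddress.com";
    }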



    Good luck with your DRBD configurations

    Best regards
    Cesar
     
  9. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,215
    Likes Received:
    17
    The authentication happens only once during the initial connection, not on every single request, so it does not impact performance.
    Turning it off will not make DRBD faster but having it enabled makes your data a little safer.

    I can see maybe not using it if you are NIC-to-NIC, but if you are using a switch then authentication should be enabled.
     
  10. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    Hi e100, it is a pleasure to greet you again (you are the great master of masters).

    Many thanks for the clarification. Maybe I am wrong, but as I understand it, if we have a shared key in DRBD, any transmission will be encrypted. If it is not a bother, do you have a web link that talks about this topic?

    About DRBD and authentication when a switch is used in the middle, I agree with you, and if it is possible to avoid using a switch, that is better so as not to have another single point of failure.

    I take my leave wishing you even more successes than those you already enjoy.

    Best regards
    Cesar
     
    #50 cesarpk, Nov 24, 2014
    Last edited: Nov 24, 2014
  11. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,215
    Likes Received:
    17

    Source:
    http://www.drbd.org/users-guide/re-drbdconf.html

    I've been using DRBD since around 2005; I remember reading that the auth is performed only once in an example drbd.conf.
    The only references I can find are old example drbd.conf files; search Google for:
    "Authentication is only done once" DRBD

    Maybe that's not true anymore, but it sure would be silly and inefficient to perform the auth on every single request.
     
  12. giner

    giner Member

    Joined:
    Oct 14, 2009
    Messages:
    237
    Likes Received:
    0
    Hello,

    Just a remark: such a big resync-rate can cause I/O starvation and big issues afterwards.

    Stanislav
     
  13. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    Thanks Stanislav... but what value should I use on a 10GbE NIC-to-NIC connection?
     
  14. giner

    giner Member

    Joined:
    Oct 14, 2009
    Messages:
    237
    Likes Received:
    0
    This is more about the I/O subsystem than about the network. I would choose no more than 1/3 or even 1/4 of the maximum possible I/O throughput.
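
    For example, assuming an array that can sustain roughly 400 MB/s of writes (an assumed figure; adjust to your hardware), that would mean something like:
    Code:
    disk {
        resync-rate 100M;   # about 1/4 of an assumed ~400 MB/s array
    }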
     
  15. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    @macday:
    The DRBD team recommends using, as the synchronization speed, about a third of the speed of the disk or of the NIC, whichever of the two is slower; this way your replication system can use all the remaining bandwidth and disk speed.
    Please see this link:
    http://www.drbd.org/users-guide/s-configure-sync-rate.html
     
  16. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    @macday:
    Ah... I forgot to tell you that the 8.4.x versions of DRBD have options to change the replication and synchronization speed dynamically, on the fly; the goal is to get more replication speed when it is needed, and more synchronization speed when the replication speed is not needed. A requirement for this is to configure the maximum and minimum values of the synchronization controller.
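
    Roughly, the relevant knobs in the disk section look like this (the values are only examples; tune them to your hardware):
    Code:
    disk {
        c-plan-ahead 20;     # enable the dynamic resync-rate controller
        c-min-rate 25M;      # resync floor while application I/O is active
        c-max-rate 100M;     # resync ceiling when the link and disks are otherwise idle
    }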
     
  17. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    thanks to all... do you have an ideal config file structure and content (drbd common and resources)?
     
  18. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    It depends largely on the version you are using.
     
  19. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    I'm using kernel 3.10.5-pve with the DRBD tools and modules 8.4.3... thx
     
  20. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    You should also be careful to avoid data loss, so I use this strategy to get better performance and still avoid losing data:

    A) Give the PVE server plenty of RAM to assign to the VMs; since the OS of the VM can do its own read caching, there will be less disk activity, which in general terms means the VM will run faster.

    B) In KVM I use "directsync" for the virtual disk configuration, which does no host-side disk caching; since the VM is already caching, we don't need two cache layers (wasting RAM).

    C) Use "deadline" as the I/O scheduler for your PVE servers, and "noop" for your Linux guests (a recommendation from IBM).

    D) Today I am starting to test this option (don't do it if your system has little RAM):
    - Configure the PVE OS so that it does not use the swap disk: vm.swappiness=0 (a sketch of C and D follows below)
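
    A minimal sketch of points C and D (the device names sdb and vda are placeholders; check your distribution's paths):
    Code:
    # on the PVE host: deadline scheduler and no swapping (only with plenty of RAM)
    echo deadline > /sys/block/sdb/queue/scheduler
    echo "vm.swappiness = 0" >> /etc/sysctl.conf && sysctl -p

    # inside a Linux guest: noop scheduler for the virtio disk
    echo noop > /sys/block/vda/queue/scheduler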

    About DRBD:
    How you configure the disk section in DRBD depends on whether or not your PVE server has a RAID controller in writeback mode with a BBU; technically speaking, it comes down to enabling or disabling these DRBD options:

    disk { ...
    disk-flushes; md-flushes; disk-barrier no; # for HDDs, or hardware RAID controllers without writeback mode enabled on their logical disks
    disk-flushes no; md-flushes no; disk-barrier no; # only for hardware RAID controllers with writeback mode enabled on their logical disks
    ... }

    My configuration is almost never the same from one server to another, because I adjust the DRBD configuration depending on the RAID controller configuration (if the server has one), the type of array, whether its cache is enabled or disabled for the logical disks, the disk speed, and the network speed.

    Here is an example of a DRBD configuration for a "SATA" or "near-line SAS" disk at 7.2k RPM (in my case connected to a hardware RAID controller in RAID-1, but without any cache enabled for this logical disk, which in terms of access speed is almost the same as not having a RAID controller at all).

    The "global_common.conf" file:
    Code:
    global {
        usage-count no;
    }
    common {
        handlers {
            split-brain "/usr/lib/drbd/notify-split-brain.sh root";
            out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        }
        startup {
            wfc-timeout 30; degr-wfc-timeout 20; outdated-wfc-timeout 15;
        }
        options {
            cpu-mask 0;
        }
        disk {
        }
        net {
            sndbuf-size 0; unplug-watermark 16; max-buffers 8000; max-epoch-size 8000; verify-alg sha1;
        }
    }
    
    The "r1.res" file:
    Code:
    resource r1 {
        protocol C;
        startup {
            become-primary-on both;
        }
        disk {
            on-io-error detach; al-extents 1801; resync-rate 25M; c-plan-ahead 20; c-min-rate 25M; c-max-rate 60M; c-fill-target 128k;
            disk-flushes; md-flushes; disk-barrier no;
        }
        net {
            allow-two-primaries;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
        }
        volume 11 {
            device /dev/drbd11;
            disk /dev/sdb1;
            meta-disk internal;
        }
        on pve5 {
            address 10.1.1.50:7788;
        }
        on pve6 {
            address 10.1.1.51:7788;
        }
    }
    
    I hope this information helps you.

    Best regards
    Cesar

    Re-edited: Maybe you want to see this web link:
    http://blogs.linbit.com/p/469/843-random-writes-faster/
     
    #60 cesarpk, Nov 29, 2014
    Last edited: Nov 29, 2014
