
KVM on top of DRBD and out of sync: long term investigation results

Discussion in 'Proxmox VE: Installation and configuration' started by giner, Apr 7, 2014.

  1. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    e100, many thanks, I believe the same. If we have any doubt, we only need to run the DRBD verification twice: the first time with the secret phrase and encryption enabled, to see how long DRBD needs to complete the task, and the second time with them disabled, to compare.
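
    As a rough sketch of that test (the resource name "r0" and the commands are only an example, not something I have benchmarked here), one could time the online verify pass once with the integrity options enabled and once without:

    # Hypothetical resource "r0": start an online verify, watch it finish,
    # then compare the elapsed time between the two configurations.
    drbdadm verify r0
    watch -n 10 cat /proc/drbd      # progress is shown until the verify completes
    # DRBD normally reports a summary line such as "Online verify done (total ... sec ...)"
    # in the kernel log when the pass ends.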

    Best regards
    Cesar
     
    #61 cesarpk, Nov 29, 2014
    Last edited: Nov 29, 2014
  2. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    Many thanks Cesar... are these settings fast enough for my 10 GbE DRBD back-link connection, or could I use higher values (resync-rate, c-max-rate, etc.)?
     
  3. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    I don't know; it depends on the maximum write speed of your storage and the maximum speed of your DRBD network. I think that if you have 10 Gb/s NICs, the storage will be the slower of the two.

    For such a test, my habit is to always configure within DRBD a very high write speed for the first sync, "AND ALWAYS IN STATIC MODE" (a little higher than the maximum speed of my DRBD network link). Then, while the first DRBD sync is in progress, I can see the maximum sustained write speed that my setup supports (hardware, software, the whole system in general).

    For example, if I have configured DRBD with the static write speed at the maximum value and my maximum sustained write speed turned out to be 100 MB/s, then my DRBD setup in dynamic mode will finally have "c-max-rate 80M;" configured (15% to 20% below the maximum sustained write speed from the test), and as the minimum sync speed "c-min-rate 33M;" (33% of the maximum sustained write speed from the test).
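
    As a concrete sketch of this recipe (the resource name "r0" and the exact numbers are only examples for a measured 100 MB/s sustained write speed), the DRBD 8.4 disk options would look roughly like this:

    # Phase 1: first full sync at a fixed rate above the link speed,
    # used only to measure the real sustained write throughput.
    resource r0 {
      disk {
        resync-rate 1000M;
      }
    }

    # Phase 2: switch to the dynamic resync controller, with rates derived
    # from the measured 100 MB/s sustained write speed.
    resource r0 {
      disk {
        c-plan-ahead 20;    # enables the dynamic controller
        c-max-rate   80M;   # ~80% of the measured sustained write speed
        c-min-rate   33M;   # ~33% of the measured sustained write speed
      }
    }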

    I hope this information is useful for you.

    Best regards
    Cesar
     
    #63 cesarpk, Nov 30, 2014
    Last edited: Nov 30, 2014
  4. giner

    giner Member

    Joined:
    Oct 14, 2009
    Messages:
    237
    Likes Received:
    0
    Hi There,

    I have some updates regarding the issue.
    1) Upgrading to the latest DRBD 8.4 does not help; the issue is still easily reproducible.
    2) In most cases directsync prevented out-of-sync blocks for me, but I have a single VM where directsync did not help. However (!), switching to writethrough did, which means that 'writethrough' is the only cache mode that has not produced out-of-sync blocks for me so far. Here is a VM config that produces out-of-sync blocks with directsync (a sketch of the writethrough workaround follows it):
    # Installed OS Windows 7 Enterprise SP1 with all updates
    boot: c
    bootdisk: ide0
    cores: 1
    ide0: drbd-lvm-0:vm-123-disk-1,cache=directsync,backup=no,size=40G
    ide2: none,media=cdrom
    memory: 512
    name: zelpc00000001
    net0: rtl8139=AE:8E:CA:56:D8:BD,bridge=vmbr2
    onboot: 1
    ostype: wxp
    sockets: 1
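
    (For anyone who wants to test the same workaround: switching that disk to writethrough can be done with "qm set"; the VM ID 123 matches the config above.)

    # Workaround sketch: same disk, cache mode changed to writethrough
    qm set 123 --ide0 drbd-lvm-0:vm-123-disk-1,cache=writethrough,backup=no,size=40G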

    Best regards,
    Stanislav
     
  5. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    Hi all,

    I also still have some OOS problems after booting one of the Proxmox hosts... I switched back to default (no cache), but these problems were there with writethrough too. Now I have the latest updates with kernel 3.10-8 and pve-qemu 2.2... perhaps this is the cure :)

    mac
     
  6. RobFantini

    RobFantini Active Member
    Proxmox VE Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,282
    Likes Received:
    9
    Hello macday. How long have you been running that setup?
     
  7. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    Hi Rob,

    Do you mean how long until the OOS occurs? About 50 days. For now I have to reboot my cluster every month; after the reboot the OOS is resynced (about 1.5 GB per DRBD resource) and the resources change from Inconsistent to UpToDate. But I need a solution without rebooting and bringing DRBD down. I have a primary/primary setup with two different RAIDs: one DRBD resource is on a SAS RAID and one is on nearline SAS. I have already tuned some settings to sync to disk more often.
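
    (Not something I have verified on this cluster yet, but as a sketch of the usual way to clean up the flagged blocks without a reboot; the resource name "r0" is just an example.)

    # Find the out-of-sync blocks, then disconnect/reconnect the node whose copy
    # should be overwritten; on reconnect it becomes the sync target and only the
    # flagged blocks are resynced.
    drbdadm verify r0        # marks out-of-sync blocks (see "oos:" in /proc/drbd)
    drbdadm disconnect r0
    drbdadm connect r0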
     
  8. giner

    giner Member

    Joined:
    Oct 14, 2009
    Messages:
    237
    Likes Received:
    0
    Hi There,

    I have some updates regarding the issue.

    Recent tests show that all cache modes with O_DIRECT produce out-of-sync blocks, while modes without O_DIRECT do not. So both writethrough and writeback can be considered safe with respect to this issue.
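
    For reference (this summarises general QEMU behaviour, not a new result from my tests), the cache modes map to open flags roughly like this:

    # cache=none         -> O_DIRECT               (produced out-of-sync blocks here)
    # cache=directsync   -> O_DIRECT + O_DSYNC     (produced out-of-sync blocks here)
    # cache=writethrough -> O_DSYNC, no O_DIRECT   (no out-of-sync blocks observed)
    # cache=writeback    -> neither flag           (no out-of-sync blocks observed)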

    Best regards,
    Stanislav
     
  9. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    Thanks Stanislav,

    so should I try writeback mode with a fast RAID controller which also has writeback with a BBU?

    Best regards,
    Mac
     
  10. giner

    giner Member

    Joined:
    Oct 14, 2009
    Messages:
    237
    Likes Received:
    0
    Generally the writeback cache mode for QEMU is not safe and can cause data loss on power failure. My previous message was only about DRBD and out-of-sync blocks.
     
  11. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    I have a DRBD cluster, and I know about the unsafeness of writeback mode.
     
  12. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    Hi all,
    Just a quick update... no more OOS after 30 days of uptime with heavy writes and cache mode 'none' on all VMs. I think I found a good trick for my setup: do a sync and flush the page cache once a day (in the morning), as sketched below.
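
    (A sketch of how that could look as a cron job on the host; the file name and the time are just examples.)

    # /etc/cron.d/flush-page-cache  (hypothetical file)
    # Once a day at 06:00: flush dirty pages to disk, then drop the page cache.
    0 6 * * *  root  sync && echo 3 > /proc/sys/vm/drop_caches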

    Let me hear what you think about it.

    Cheers, Mac
     
  13. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    I shouted out too early :) I still had OOS, but the resync is fine after a reboot. What about DRBD 9, any experiences yet?
     
  14. giner

    giner Member

    Joined:
    Oct 14, 2009
    Messages:
    237
    Likes Received:
    0
  15. mir

    mir Well-Known Member
    Proxmox VE Subscriber

    Joined:
    Apr 14, 2012
    Messages:
    3,383
    Likes Received:
    81
    I wonder if using scsi-disk and virtio-scsi will make a difference?
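
    (For reference, a hypothetical variant of the VM 123 config posted earlier, switched from IDE to virtio-scsi; not tested here.)

    scsihw: virtio-scsi-pci
    scsi0: drbd-lvm-0:vm-123-disk-1,cache=directsync,backup=no,size=40G
    bootdisk: scsi0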
     
    #75 mir, May 27, 2015
    Last edited: May 27, 2015
  16. macday

    macday Member

    Joined:
    Mar 10, 2010
    Messages:
    408
    Likes Received:
    0
    Nope sorry, I already tried that :)
     
  17. RobFantini

    RobFantini Active Member
    Proxmox VE Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,282
    Likes Received:
    9
    Stanislav: On our production systems we are using DRBD on ZFS, with a zvol for each KVM; cache is set to writeback. In your post to pve-devel you mention " - if block device is open without O_DIRECT - the issue never happen ". Is there a cache setting in PVE to prevent the issue?

    PS: We've had VM stability issues using DRBD, Ceph and non-shared storage. System time always goes out of sync when VMs are unstable, so I use a script to check how the time is doing on data-entry systems; if the time is off, key users get a red-alert email. The stability issues here occur when there is heavy network traffic, like backups on multiple systems to NFS. So I wonder if this issue occurs with any storage, not just DRBD? Sometime soon I'll test the script you posted to see whether the issue occurs with ZFS zvols.
     
  18. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    769
    Likes Received:
    2
    Hi macday,

    Are you still having OOS problems in DRBD?
    ... maybe I can help you
     
