KVM on top of DRBD and out of sync: long term investigation results

Discussion in 'Proxmox VE: Installation and configuration' started by giner, Apr 7, 2014.

  1. giner

    > A) If what you say is correct, why does the VM start faster when I do things in this order?
    I'm not sure if it is tunable.

    > B) Why don't I get OOS in DRBD when the verification of the replicated volumes finishes? (This test runs automatically once a week.)
    Depends on the OS type (Windows? Linux?), OS configuration (write cache without barriers? swap partition?), and OS usage (does it use swap, or is it free most of the time?).
     
  2. e100

    I am not an IO expert nor do I know everything there is to know about IO, but this is what I believe is happening and why it makes a difference.

    With cache=none this can happen:
    1. Guest issues write
    2. DRBD sends write to remote host
    3. Guest modifies the write (it has not issued sync yet)
    4. DRBD writes the modified block locally
    Now you are out of sync.

    With cache=directsync it works like this:
    1. Guest issues write
    2. DRBD sends write to remote host and local disks
    3. Guest modifies the write (has not issued sync yet)
    4. DRBD sends write to remote host and local disks

    O_DSYNC is the difference here.
    With O_DSYNC each write is flushed and written the moment it happens, leaving no window of opportunity for the guest to modify the write.

    Only directsync and writethrough offer O_DSYNC.
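
    To make this concrete, here is a minimal C sketch (my illustration, not QEMU code; the device path is an example and DRBD protocol C is assumed) showing why O_DSYNC closes the window: write() does not return until the data is stable, so the guest cannot modify a buffer that DRBD is still replicating.

    Code:
    /* Sketch: O_DSYNC closes the modify-in-flight window. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[512];
        int fd = open("/dev/drbd0", O_WRONLY | O_DSYNC); /* example path */
        if (fd < 0)
            return 1;

        memset(buf, 'A', sizeof buf);
        /* Returns only once the data is on stable storage (and, with
           DRBD protocol C, acknowledged by the peer). */
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf)
            return 1;

        /* Safe: the previous write has completed, so modifying the buffer
           cannot leave different bytes on the two DRBD nodes. */
        memset(buf, 'B', sizeof buf);
        write(fd, buf, sizeof buf);

        close(fd);
        return 0;
    }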
     
  3. cesarpk

    Hi e100

    In this PVE wiki page, "Performance Tweaks":
    https://pve.proxmox.com/wiki/Performance_Tweaks

    I see that it literally says:

    some interesting articles:
    barriers : http://monolight.cc/2011/06/barriers-caches-filesystems/
    cache mode and fsync : http://www.ilsistemista.net/index.p...s-on-red-hat-enterprise-linux-62.html?start=2

    And on the website about "cache mode and fsync", I see an explanatory graph, and below it, it literally says:


    • if the cache policy is set to “writeback”, data will be cached in the host-side pagecache (red arrow);
    • if the cache policy is set to “none”, data will be immediately flushed to the physical disk/controller cache (gray arrow);
    • if the cache policy is set to “writethrough”, data will be immediately flushed to the physical disk platters (blue arrow).

    But you said that with "none" I will have a write cache in the VM, right?

    If so, I understand that the information on this website is wrong and contradicts the observed behavior, right?
     
    #24 cesarpk, Apr 9, 2014
    Last edited: Apr 9, 2014
  4. e100

    The SUSE page you linked to really shows the differences:
    none = O_DIRECT
    writethrough = O_DSYNC
    directsync = O_DIRECT + O_DSYNC

    With O_DIRECT, data is copied to the IO device directly from the user-space buffer, bypassing the cache. It does not guarantee that the operation is synchronous.
    With O_DSYNC the data is written synchronously.
    Combine the two and you bypass the buffer cache and perform synchronous IO. (The safest form of IO possible, AFAIK.)
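
    For completeness, a minimal sketch of that combination from userspace (my example, not from any project; O_DIRECT requires sector-aligned buffers and transfer sizes, hence posix_memalign):

    Code:
    /* Sketch: O_DIRECT + O_DSYNC, the combination cache=directsync requests. */
    #define _GNU_SOURCE            /* exposes O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        /* O_DIRECT needs aligned memory; 4096 covers common sector sizes. */
        if (posix_memalign(&buf, 4096, 4096) != 0)
            return 1;
        memset(buf, 0, 4096);

        int fd = open("/dev/drbd0", O_WRONLY | O_DIRECT | O_DSYNC); /* example path */
        if (fd < 0)
            return 1;

        /* Bypasses the host page cache (O_DIRECT) and returns only once
           the data is stable (O_DSYNC). */
        ssize_t n = write(fd, buf, 4096);

        close(fd);
        free(buf);
        return n == 4096 ? 0 : 1;
    }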
     
    #25 e100, Apr 9, 2014
    Last edited: Apr 9, 2014
  5. cesarpk

    Hi e100, but now I am very confused about directsync:

    Why do you think it is the safest form of I/O possible?
    If with O_DIRECT the VM needs to do fsync, why does directsync use both technologies, and what is the advantage?
     
    #27 cesarpk, Apr 9, 2014
    Last edited: Apr 9, 2014
  6. e100

    Directsync gets the data to the disk synchronously, and since it bypasses the host buffer cache it should get the data to the disk faster than writethrough.

    Less time for data to get lost, and the guest not getting confirmation of the write until the data is on permanent storage, seems like the safest method possible.

    The guest OS can (and does) implement its own read cache, and can benefit from doing writethrough-style caching itself. It makes no sense and provides no benefit to have a read cache of the same block data within the host AND the guest. Caching the data twice just wastes RAM, RAM bandwidth and CPU cycles. If a read cache is needed, add more RAM to the guest so it can use it for caching.
    That is why I am switching to directsync.

    Writethrough has never shown any benefit whenever I have tested it, and is often slower at reading and writing than cache=none or directsync. But I have only tested on high-performance IO systems; it might be beneficial on slower IO subsystems, but I doubt it. Allowing the guest to cache block data using the RAM allocated to it is likely to provide the best IO performance in most situations.
     
  7. cesarpk

    Thanks e100 for your answer, but the PVE wiki ( https://pve.proxmox.com/wiki/Performance_Tweaks ) says:

    cache=writethrough
    - host does read cache
    - guest disk cache mode is writethrough
    - Writethrough makes an fsync for each write. So it's the most secure cache mode; you can't lose data. It's also the slowest.

    And it doesn't tell us about the read cache.

    In this SUSE link ( https://www.suse.com/documentation/sles11/book_kvm/data/sect1_1_chapter_book_kvm.html ), if you read the documentation on writethrough, it talks about the write cache but tells us nothing about the read cache. Then why do you say that with writethrough we have two read caches? (I understand that the VM has its own read cache.)

    Many thanks for sharing your knowledge and experience with us (to me, they are worth gold).

    Best regards
    Cesar
     
    #29 cesarpk, Apr 10, 2014
    Last edited: Apr 10, 2014
  8. e100

    The whole point of writethrough is to cache the writes in RAM, so if you read the blocks you just wrote, the data can come from RAM instead of having to be read from the disk.

    If the guest and host are both caching data, it will get cached twice.
     
  9. e100

    Spirit's suggestion of cache=none is not helpful here; giner reports that when he used cache=none he got inconsistencies on DRBD.
    Only cache=writethrough or cache=directsync did not cause inconsistencies.

    He also suggests adding more RAM to the VM for read performance, the same suggestion I have made here.
    Let the guest do the read caching, not the host. That rules out writethrough, leaving only one solution: directsync.
     
  10. mir

    My observation is also that cache=none or cache=directsync provides the best performance if your storage layer has a decent cache. E.g. don't use cache settings which involve the host cache. If you use the host cache you will not use your controller's cache.
     
  11. cesarpk

    Many thanks e100, directsync will be my best option if I have a RAID controller with WB and BBU enabled.

    But if I have a single SATA HDD to use with DRBD, without a RAID controller in the middle, and I want the host/VM not to use RAM as a write cache, and not to consider the data written to disk while it is really still in the DRBD buffer, how should I configure DRBD and the cache of the VM?

    Best regards
    Cesar
     
    #34 cesarpk, Apr 14, 2014
    Last edited: Apr 14, 2014
  12. giner

    I suppose the best option here is writeback with barriers. So, if my VM were Linux, I would create one virtual HDD with ext4 (barriers are enabled by default) and another HDD with a swap partition. The first HDD can be attached with cache=writeback, and the second one must be attached with directsync/writethrough. At the same time we have to be sure that all layers between the VM and the physical drive support barriers: VM, LVM (if used), DRBD, physical drive, anything else (if used).
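
    As a sketch, the VM config could then look something like this (hypothetical VM ID, storage and volume names; illustrative only, adjust to your setup):

    Code:
    # hypothetical excerpt of /etc/pve/qemu-server/100.conf
    # first disk: ext4 inside the guest, barriers enabled
    virtio0: local:vm-100-disk-1,cache=writeback
    # second disk: holds the guest swap partition
    virtio1: local:vm-100-disk-2,cache=directsync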
     
    #35 giner, Apr 14, 2014
    Last edited: Apr 14, 2014
  13. cesarpk

    Many thanks giner for your answer, but I have some doubts. If you look at this link:
    http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=/liaat/liaatbpkvmguestcache.htm

    You will see that it says:

    • none With caching mode set to none, the host page cache is disabled, but the disk write cache is enabled for the guest. In this mode, the write performance in the guest is optimal because write operations bypass the host page cache and go directly to the disk write cache. If the disk write cache is battery-backed, or if the applications or storage stack in the guest transfer data properly (either through fsync operations or file system barriers), then data integrity can be ensured. However, because the host page cache is disabled, the read performance in the guest would not be as good as in the modes where the host page cache is enabled, such as writethrough mode.

    Then, I understand that:
    disk write cache = buffer of the RAID controller or buffer of the HDD
    host page cache = buffer in host RAM

    On the other hand, I understand that:
    1. We can conclude that the writes go to the buffer of the HDD, and as this buffer is a layer below DRBD, obviously the data will always be replicated.
    2. This web link literally tells us about cache=none: "If the disk write cache is battery-backed, or if the applications or storage stack in the guest transfer data properly (either through fsync operations or file system barriers), then data integrity can be ensured." I understand that the data will be guaranteed on the host itself thanks to the BBU.
    3. For this reason, I never get OOS in my verification of the DRBD volumes, which runs automatically once a week.

    If I missed anything, please comment on my errors.

    Best regards
    Cesar
     
  14. giner

    Cesar,

    Sorry, I didn't get your point. What is the question?

    Best regards,
    Stanislav
     
  15. cesarpk

    Excuse me please; I believe that "cache=none" is a good configuration for the DRBD volumes. If I am wrong, please let me know.

    Best regards
    Cesar

    Edited to add: this is always based on the IBM report in the link I posted in the previous post.
     
    #38 cesarpk, Apr 14, 2014
    Last edited: Apr 14, 2014
  16. giner

    1. You asked about the configuration when we have a single HDD without RAID, so that is what I suggested.
    2. No mode except directsync and writethrough should be used if we can't be sure that the upper layer won't try to submit new data before the old data is committed. So ext4 with barriers enabled will work with any of the caching modes without any problems, but a swap partition won't.
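
    To illustrate the ordering requirement in userspace terms (a simplified analogy of my own; real filesystem barriers are implemented with flush/FUA requests inside the kernel):

    Code:
    /* Analogy: never submit new data until the old data is committed. */
    #include <unistd.h>

    int ordered_write(int fd,
                      const void *old_data, size_t old_len, off_t old_off,
                      const void *new_data, size_t new_len, off_t new_off)
    {
        if (pwrite(fd, old_data, old_len, old_off) != (ssize_t)old_len)
            return -1;
        /* The "barrier": without it, a cache below us may reorder the two
           writes - exactly the window that bites swap but not ext4. */
        if (fdatasync(fd) != 0)
            return -1;
        if (pwrite(fd, new_data, new_len, new_off) != (ssize_t)new_len)
            return -1;
        return 0;
    }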
     
  17. e100

    Having swap inconsistent would be bad for live migrations.

    Cesar, there are two issues here:
    1. Data safety/consistency
    2. Performance

    To ensure safety/consistency, when using DRBD, the only options you can use are directsync or writethrough.
    I believe that performance will be best with directsync no matter what the storage is (single disk, RAID array or whatever).
    However I have only performed benchmarks on RAID arrays.

    If someone wants to benefit from a read cache, in my experience the best performance comes when the guest does the caching, not the host.
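
    For anyone wanting to try it, the cache mode is set per disk, e.g. (hypothetical VM ID and volume name; adjust to your setup):

    Code:
    qm set 100 --virtio0 local:vm-100-disk-1,cache=directsync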
     
