KVM on top of DRBD and out of sync: long-term investigation results

> A) If what you say is correct, why does the VM start faster when I do things in this order?
I'm not sure if it is tunable.

> B) Why don't I get OOS in DRBD when the verification of the replicated volumes finishes? (This test runs automatically once a week.)
It depends on the OS type (Windows? Linux?), OS configuration (write cache without barriers? swap partition?), and OS usage (does it use swap, or is it free most of the time?).
 
I am not an IO expert nor do I know everything there is to know about IO, but this is what I believe is happening and why it makes a difference.

With cache=none this can happen:
1. Guest issues write
2. DRBD sends write to remote host
3. Guest modifies the write (it has not issued sync yet)
4. DRBD writes the modified block locally
Now you are out of sync

With cache=directsync it works like this:
1. Guest issues write
2. DRBD sends write to remote host and local disks
3. Guest modifies the write (has not issued sync yet)
4. DRBD sends write to remote host and local disks

O_DSYNC is the difference here.
With O_DSYNC each write is flushed and written the moment it happens, leaving no window of opportunity for the guest to modify the write.

Only directsync and writethrough offer O_DSYNC.
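
For anyone who wants to see this difference from the host side, here is a rough sketch using GNU dd (the file name, block size, and count are just placeholders, and the target must be on a real filesystem, not tmpfs): oflag=direct corresponds to O_DIRECT only, while oflag=direct,dsync adds O_DSYNC, which is roughly the cache=none versus cache=directsync situation described above.

  # O_DIRECT only: the data leaves the user-space buffer, but completion of each
  # write does not mean it is already on stable storage (roughly cache=none)
  dd if=/dev/zero of=./testfile bs=4k count=1000 oflag=direct

  # O_DIRECT + O_DSYNC: every single write is also synchronised before dd
  # continues (roughly cache=directsync); expect this run to be clearly slower
  dd if=/dev/zero of=./testfile bs=4k count=1000 oflag=direct,dsync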
 
Hi e100

On the PVE wiki page "Performance Tweaks":
https://pve.proxmox.com/wiki/Performance_Tweaks

I see that it literally says:

some interesting articles:
barriers : http://monolight.cc/2011/06/barriers-caches-filesystems/
cache mode and fsync : http://www.ilsistemista.net/index.p...s-on-red-hat-enterprise-linux-62.html?start=2

And on the "cache mode and fsync" website, I see an explanatory graph, and below it the page literally says:


  • if the cache policy is set to “writeback”, data will be cached in the host-side pagecache (red arrow);
  • if the cache policy is set to “none”, data will be immediately flushed to the physical disk/controller cache (gray arrow);
  • if the cache policy is set to “writethrough”, data will be immediately flushed to the physical disk platters (blue arrow).

But you said that with "none" I will have a write cache in the VM, right?

If your answer is yes, I understand that the information on this website is wrong and contradicts the actual behavior, right?
 
The SUSE page you linked to really shows the differences:
none = O_DIRECT
writethrough = O_DSYNC
directsync = O_DIRECT + O_DSYNC

With O_DIRECT data is copied to the IO device directly from the user-space buffer bypassing the cache. It does not guarantee that the operation is synchronous.
With O_DSYNC the data is written synchronously.
Combine the two and you bypass the buffer cache and perform synchronous IO (the safest form of IO possible, AFAIK).
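
For reference, this is roughly how those three combinations are expressed on the QEMU/KVM command line (a sketch only; /dev/drbd0 and the virtio interface are placeholders, and on Proxmox the -drive line is generated for you from the VM configuration):

  # cache=none: O_DIRECT, host page cache bypassed, no O_DSYNC
  -drive file=/dev/drbd0,if=virtio,cache=none
  # cache=writethrough: O_DSYNC, host page cache still used for reads
  -drive file=/dev/drbd0,if=virtio,cache=writethrough
  # cache=directsync: O_DIRECT + O_DSYNC, bypass the host cache and sync every write
  -drive file=/dev/drbd0,if=virtio,cache=directsync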
 
> The SUSE page you linked to really shows the differences:
> none = O_DIRECT
> writethrough = O_DSYNC
> directsync = O_DIRECT + O_DSYNC
>
> With O_DIRECT data is copied to the IO device directly from the user-space buffer bypassing the cache. It does not guarantee that the operation is synchronous.
> With O_DSYNC the data is written synchronously.
> Combine the two and you bypass the buffer cache and perform synchronous IO (the safest form of IO possible, AFAIK).

Hi e100, but now I am very confused about directsync:

Why do you think that it is the safest form of I/O possible?
If with O_DIRECT the VM needs to do fsync, why does directsync use both flags, and what is the advantage?
 
Directsync gets the data to the disk synchronously, and since it bypasses the host buffer cache it should get the data to the disk faster than writethrough.

Less time for data to get lost and the guest not getting confirmation of the write until the data is on permanent storage seems like the safest method possible.

The guest OS can (and does) implement its own read cache and benefit from using writethrough IO itself. It makes no sense and provides no benefit to have a read cache of the same block data in the host AND the guest. Caching the data twice just wastes RAM, RAM bandwidth, and CPU cycles. If a read cache is needed, add more RAM to the guest so it can use it for caching.
That is why I am switching to directsync.

Writethrough has never shown any benefit whenever I have tested it, and it is often slower at both reading and writing than cache=none or directsync. But I have only tested on high-performance IO systems; it might be beneficial on slower IO subsystems, but I doubt it. Allowing the guest to cache block data using the RAM allocated to it is likely to provide the best IO performance in most situations.
 
> The guest OS can (and does) implement its own read cache and benefit from using writethrough IO itself. It makes no sense and provides no benefit to have a read cache of the same block data in the host AND the guest. Caching the data twice just wastes RAM, RAM bandwidth, and CPU cycles. If a read cache is needed, add more RAM to the guest so it can use it for caching.
> That is why I am switching to directsync.
Thanks, e100, for your answer, but the PVE Wiki ( https://pve.proxmox.com/wiki/Performance_Tweaks ) says:

cache=writethrough
- host does read cache
- guest disk cache mode is writethrough
- Writethrough issues an fsync for each write. So it's the most secure cache mode; you can't lose data. It's also the slowest.

And it doesn't say anything more about the read cache.

In this SUSE link ( https://www.suse.com/documentation/sles11/book_kvm/data/sect1_1_chapter_book_kvm.html ), the documentation for writethrough talks about the write cache but says nothing about the read cache. So why do you say that with writethrough we have two read caches? (I understand that the VM has its own read cache.)

Many thanks for sharing your knowledge and experience with us (to me, it is worth gold).

Best regards
Cesar
 
Spirit's suggestion of cache=none is not helpful here; giner reports that when he used cache=none he got inconsistencies on DRBD.
Only cache=writethrough or cache=directsync did not cause inconsistencies.

He also suggests adding more RAM to the VM for read performance, the same suggestion I have made here.
Let the guest do the read caching, not the host. That rules out writethrough, leaving only one solution: directsync.
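
On Proxmox that is just the cache option on the virtual disk. A minimal sketch, assuming VM 100 with a virtio disk named vm-100-disk-1 on a DRBD-backed storage called drbd-storage (all of these names are placeholders for your own setup):

  # reattach the existing disk of VM 100 with cache=directsync
  qm set 100 --virtio0 drbd-storage:vm-100-disk-1,cache=directsync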
 
My observation is also that cache=none or cache=directsync provides the best performance if your storage layer has a decent cache. E.g. don't use cache settings which involve the host cache. If you use the host cache you will not use your controller's cache.
 
> Spirit's suggestion of cache=none is not helpful here; giner reports that when he used cache=none he got inconsistencies on DRBD.
> Only cache=writethrough or cache=directsync did not cause inconsistencies.
>
> He also suggests adding more RAM to the VM for read performance, the same suggestion I have made here.
> Let the guest do the read caching, not the host. That rules out writethrough, leaving only one solution: directsync.

Many thanks, e100; directsync will be my best option if I have a RAID controller with WB and BBU enabled.

But if I have a single SATA HDD to use with DRBD and no RAID controller in the middle, and I want the host/VM not to use RAM as a write cache, and not to consider the data written to disk when it is really still in DRBD's buffer, how should I configure DRBD and the cache of the VM?

Best regards
Cesar
 
> Many thanks, e100; directsync will be my best option if I have a RAID controller with WB and BBU enabled.
>
> But if I have a single SATA HDD to use with DRBD and no RAID controller in the middle, and I want the VM not to use RAM as a write cache, and not to consider the data written to disk when it is really still in DRBD's buffer, how should I configure DRBD and the cache of the VM?
>
> Best regards
> Cesar

I suppose the best option here is writeback with barriers. So, if my VM were Linux I would create one virtual HDD with ext4 (barriers are enabled by default) and another HDD with a swap partition. The first HDD can be attached with cache=writeback and the second one must be attached with directsync/writethrough. At the same time we have to be sure that all layers between the VM and the physical drive support barriers: VM, LVM (if used), DRBD, the physical drive, and anything else (if used).
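
A sketch of how that two-disk layout could look on the Proxmox side, assuming VM 101 on a storage called drbd-storage (the VM ID, storage name, and disk names are placeholders): the disk carrying the ext4 filesystem is attached with writeback, the disk carrying only swap is attached with directsync.

  # disk holding the ext4 root filesystem (barriers enabled by default): writeback is acceptable
  qm set 101 --virtio0 drbd-storage:vm-101-disk-1,cache=writeback
  # disk holding only the swap partition: keep every write synchronous
  qm set 101 --virtio1 drbd-storage:vm-101-disk-2,cache=directsync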
 
> I suppose the best option here is writeback with barriers. So, if my VM were Linux I would create one virtual HDD with ext4 (barriers are enabled by default) and another HDD with a swap partition. The first HDD can be attached with cache=writeback and the second one must be attached with directsync/writethrough. At the same time we have to be sure that all layers between the VM and the physical drive support barriers: VM, LVM (if used), DRBD, the physical drive, and anything else (if used).

Many thanks, giner, for your answer, but I have some doubts. If you look at this link:
http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=/liaat/liaatbpkvmguestcache.htm

you will see that it says:

  • none With caching mode set to none, the host page cache is disabled, but the disk write cache is enabled for the guest. In this mode, the write performance in the guest is optimal because write operations bypass the host page cache and go directly to the disk write cache. If the disk write cache is battery-backed, or if the applications or storage stack in the guest transfer data properly (either through fsync operations or file system barriers), then data integrity can be ensured. However, because the host page cache is disabled, the read performance in the guest would not be as good as in the modes where the host page cache is enabled, such as writethrough mode.

Then I understand that:
disk write cache = the buffer of the RAID controller or the buffer of the HDD
host page cache = the buffer in host RAM
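
For what it is worth, a quick way to check from the host whether that disk write cache is actually enabled on a plain SATA drive is hdparm (assuming /dev/sdb is the DRBD backing disk; adjust the device name for your setup):

  # report the current state of the drive's volatile write cache
  hdparm -W /dev/sdb
  # disable it if there is no BBU and the stack above cannot be trusted to flush properly
  hdparm -W0 /dev/sdb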

On the other hand, I understand that:
1- We can conclude that the writes go to the HDD's buffer, and as this buffer is a layer below DRBD, the data will obviously always be replicated.
2- And this web page literally says about cache=none: "If the disk write cache is battery-backed, or if the applications or storage stack in the guest transfer data properly (either through fsync operations or file system barriers), then data integrity can be ensured." I understand that the data will be guaranteed on the same host thanks to the BBU.
3- For this reason, I never get "OOS" in DRBD in my verification of the DRBD volumes that runs automatically once a week.

If I missed anything, please comment on my errors.

Best regards
Cesar
 
Cesar,
Sorry, I didn't get your point. What is the question?

Excuse me please; I believe that "cache=none" is a good configuration for the DRBD volumes. If I am wrong, please let me know.

Best regards
cesar

Re-edited: this is always based on the IBM report in the link that I put in the previous post.
 
1. You asked about the configuration when we have a single HDD without RAID, so that is what I suggested.
2. Modes other than directsync and writethrough shouldn't be used if we can't be sure that the upper layer won't try to submit new data before old data is committed. So ext4 with barriers enabled will work with any of the caching modes without problems, but a swap partition won't.
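
Inside a Linux guest, that split could look like the following /etc/fstab sketch (device names are placeholders): the ext4 filesystem keeps barriers explicitly enabled on the first virtual disk (the one attached with cache=writeback), while swap lives on the second virtual disk, the one attached with directsync or writethrough.

  # /etc/fstab inside the guest
  /dev/vda1  /     ext4  defaults,barrier=1  0 1
  /dev/vdb1  none  swap  sw                  0 0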
 
> So ext4 with barriers enabled will work with any of the caching modes without problems, but a swap partition won't.

Having swap inconsistent would be bad for live migrations.

Cesar, there are two issues here:
1. Data safety/consistency
2. Performance

To ensure safety/consistency, when using DRBD, the only options you can use are directsync or writethrough.
I believe that performance will be best with directsync no matter what the storage is (single disk, RAID array, or whatever).
However I have only performed benchmarks on RAID arrays.

If someone wants to benefit from a read cache, in my experience the best performance will be if the guest does the caching, not the host.
 
