Having inconsistent swap would be bad for live migrations.
It is possible to use different cache modes for different virtual drives, so we can move swap to another drive with its own mode.
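As a minimal sketch of that (the VM ID, storage names, and volume names here are assumptions, not taken from this thread), Proxmox lets you set the cache mode per disk:

# hypothetical VM 100: the OS disk keeps directsync, while swap lives on
# its own disk on a separate DRBD-backed storage with a different cache mode
qm set 100 --virtio0 drbd-a:vm-100-disk-1,cache=directsync
qm set 100 --virtio1 drbd-b:vm-100-disk-2,cache=writethrough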
Could you share your benchmark results?
Having inconsistent swap would be bad for live migrations.
directsync does NOT prevent this problem.
I've recently noticed a larger number of inconsistencies on upgraded, faster hardware, so I enabled data-integrity-alg on a couple of nodes.
A few hours later DRBD broke and split-brained with the error:
buffer modified by upper layers during write
This is the precise problem that giner described in the first post.
All of my VMs have directsync set for all of their VM disks, so it is obvious that directsync does NOT prevent the buffer from being modified while DRBD writes are in flight.
writethrough, in my experience, does not perform well, so I am not sure what I will do to prevent this problem.
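If anyone wants to compare cache modes themselves, here is one quick way from inside a guest with fio (the file path, size, and job count are arbitrary example choices, not from any benchmark in this thread):

# random 4k writes with O_DIRECT, bypassing the guest page cache
fio --name=cachetest --filename=/root/fio.test --size=1G \
    --rw=randwrite --bs=4k --direct=1 --numjobs=4 --runtime=60 --group_reporting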
I wonder if DRBD 8.4 in the 3.10 kernel is less susceptible to this problem.
For reference the cause of this problem is explained by Lars, one of the main DRBD developers, here:
http://lists.linbit.com/pipermail/drbd-user/2014-February/020607.html
Hi all,
A long time ago I was talking with Lars, and he told me that data-integrity-alg should only be used for testing purposes and never in a production environment, because network packets typically do get modified ("upper layers" in my case refers to hardware inside the same PC, since I run DRBD NIC-to-NIC). Since I removed that directive (data-integrity-alg), my problems are over.
Moreover, it is always better to follow these practices:
1) Run DRBD over direct NIC-to-NIC connections (I use a lot of bonding in "balance-rr" mode with "jumbo frames" to double the connection speed; see the sketch after this list).
2) Don't use a password (shared secret) for the replication (on a NIC-to-NIC connection nobody can see the transmission), and you also get more replication speed and less processor usage.
3) LVM on top of DRBD is the best way to get fast disk access.
4) The PVE host should use "deadline" as its I/O scheduler, while a Linux guest should use "noop" (no extra optimization); see the note after this list.
5) Use virtio-blk as the disk driver in the guest (maybe not everyone knows this).
6) In my particular case, I have been using DRBD 8.4.4 for a long time, and soon I will have 8.4.5 on other PVE servers. I have never had problems, given that automatic verification of the DRBD storages runs once a week (I believe the latest DRBD versions are better: fewer bugs and better optimizations), and I have always heard Lars tell many people the same.
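For point 1, a minimal sketch of a balance-rr bond with jumbo frames in Debian/PVE style (the NIC names eth1/eth2 and the address are assumptions):

# /etc/network/interfaces (fragment)
auto bond0
iface bond0 inet static
    address 10.0.100.1
    netmask 255.255.255.0
    bond-slaves eth1 eth2
    bond-mode balance-rr
    bond-miimon 100
    mtu 9000    # jumbo frames; both NIC-to-NIC peers must use the same MTU

For point 4, the scheduler can be changed per device at runtime, e.g. echo deadline > /sys/block/sdb/queue/scheduler (the device name is an assumption).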
Best regards
Cesar
# /etc/drbd.conf
common {
    net {
        protocol C;
        sndbuf-size 512k;
        max-buffers 128k;
        max-epoch-size 8000;
    }
    disk {
        disk-barrier no;
        disk-flushes no;
        md-flushes no;
        resync-rate 500M;
        al-extents 3389;
    }
}
# resource r0 on fbo-vm-02: not ignored, not stacked
# defined at /etc/drbd.d/r0.res:1
resource r0 {
    on fbo-vm-01 {
        device /dev/drbd0 minor 0;
        disk /dev/sdb1;
        meta-disk internal;
        address ipv4 10.0.100.1:7788;
    }
    on fbo-vm-02 {
        device /dev/drbd0 minor 0;
        disk /dev/sdb1;
        meta-disk internal;
        address ipv4 10.0.100.2:7788;
    }
    options {
        on-no-data-accessible suspend-io;
    }
    net {
        protocol C;
        max-buffers 8000;
        max-epoch-size 8000;
        unplug-watermark 16;
        sndbuf-size 0;
        cram-hmac-alg sha1;
        verify-alg md5;
        shared-secret my-secret;
        allow-two-primaries yes;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    disk {
        disk-flushes no;
        disk-barrier no;
    }
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
        become-primary-on both;
    }
}
# resource r1 on fbo-vm-02: not ignored, not stacked
# defined at /etc/drbd.d/r1.res:1
resource r1 {
    on fbo-vm-01 {
        device /dev/drbd1 minor 1;
        disk /dev/sdc1;
        meta-disk internal;
        address ipv4 10.0.101.1:7788;
    }
    on fbo-vm-02 {
        device /dev/drbd1 minor 1;
        disk /dev/sdc1;
        meta-disk internal;
        address ipv4 10.0.101.2:7788;
    }
    options {
        on-no-data-accessible suspend-io;
    }
    net {
        protocol C;
        max-buffers 8000;
        max-epoch-size 8000;
        unplug-watermark 16;
        sndbuf-size 0;
        cram-hmac-alg sha1;
        verify-alg md5;
        shared-secret my-secret;
        allow-two-primaries yes;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    disk {
        disk-flushes no;
        disk-barrier no;
    }
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
        become-primary-on both;
    }
}
2) Don't use a password (shared secret) for the replication (on a NIC-to-NIC connection nobody can see the transmission), and you also get more replication speed and less processor usage.
The authentication happens only once, during the initial connection, not on every request, so it does not impact performance.
Turning it off will not make DRBD faster, but having it enabled makes your data a little safer.
I can see maybe not using it NIC-to-NIC, but if you are going through a switch then authentication should be enabled.
You need to specify the HMAC algorithm to enable peer authentication at all. You are strongly encouraged to use peer authentication.
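For reference, enabling peer authentication takes just the two net options already present in the configs above; a minimal fragment:

net {
    cram-hmac-alg sha1;      # HMAC algorithm for the challenge-response handshake
    shared-secret my-secret; # must match on both peers, max 64 characters
}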
...
resync-rate 500M;
...
@macday:
Ah, I forgot to tell you that the 8.4.x versions of DRBD have options to dynamically change the replication and synchronization speed on the fly. The goal is to get more replication speed when it is needed, and more synchronization speed when the replication bandwidth is free. A prerequisite for this is configuring the minimum and maximum values for the synchronization controller.
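In 8.4 config terms that is the dynamic resync controller; a sketch using the same disk options that appear in the config further down (the numbers are examples to tune for your link, not recommendations):

disk {
    c-plan-ahead 20;     # > 0 enables the dynamic controller
    c-min-rate 25M;      # resync floor while application I/O competes
    c-max-rate 60M;      # resync ceiling
    c-fill-target 128k;  # target amount of in-flight resync data
}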
Thanks to all... do you have an ideal config file structure and content (DRBD common and resources)?
I'm using kernel 3.10.5-pve with the DRBD tools and module 8.4.3. Thx
global {
    usage-count no;
}
common {
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    }
    startup {
        wfc-timeout 30;
        degr-wfc-timeout 20;
        outdated-wfc-timeout 15;
    }
    options {
        cpu-mask 0;
    }
    disk {
    }
    net {
        sndbuf-size 0;
        unplug-watermark 16;
        max-buffers 8000;
        max-epoch-size 8000;
        verify-alg sha1;
    }
}
resource r1 {
    protocol C;
    startup {
        become-primary-on both;
    }
    disk {
        on-io-error detach;
        al-extents 1801;
        resync-rate 25M;
        c-plan-ahead 20;
        c-min-rate 25M;
        c-max-rate 60M;
        c-fill-target 128k;
        disk-flushes;
        md-flushes;
        disk-barrier no;
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    volume 11 {
        device /dev/drbd11;
        disk /dev/sdb1;
        meta-disk internal;
    }
    on pve5 {
        address 10.1.1.50:7788;
    }
    on pve6 {
        address 10.1.1.51:7788;
    }
}
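After editing, you can sanity-check the file and apply the changes to a running resource without downtime (assuming the resource name r1 from above):

drbdadm dump all    # parse check; prints the effective configuration
drbdadm adjust r1   # apply changed options to the running resource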