Hello everyone,
This is urgent: my servers are in production.
I have a serious problem and need help.
My scenario:
- Two ASUS P8H77-M PRO workstations with Intel Core i7, Proxmox VE 2.3, DRBD 8.3.13, and LVM on top of DRBD
- 2 Realtek RTL8111/8168 PCI-E NICs (1 Gb/s each) in a round-robin bond, used only for DRBD
After a while, /proc/drbd shows this:
shell# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
ns:237256 nr:307093 dw:307093 dr:690264 al:0 bm:321 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
ns:0 nr:467984 dw:467984 dr:537932 al:0 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
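
For anyone who wants to check this state from a script, here is a minimal sketch in Python. It assumes the DRBD 8.3 /proc/drbd layout shown above; the names parse_drbd_status and not_connected are hypothetical and are not part of DRBD. Note that it flags every state other than Connected, which also includes the resync states.

```python
import re

# Sample status lines mirroring the /proc/drbd output above.
SAMPLE = """\
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
"""

# Matches the per-device status line (assumed DRBD 8.3 layout).
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)",
    re.MULTILINE,
)


def parse_drbd_status(text):
    """Return {minor: {"cs": ..., "ro": ..., "ds": ...}} for each device."""
    return {
        int(m.group("minor")): {key: m.group(key) for key in ("cs", "ro", "ds")}
        for m in STATUS_RE.finditer(text)
    }


def not_connected(status):
    """Return the sorted minors whose connection state is not Connected."""
    return sorted(minor for minor, s in status.items() if s["cs"] != "Connected")


if __name__ == "__main__":
    status = parse_drbd_status(SAMPLE)
    for minor in not_connected(status):
        s = status[minor]
        print(f"drbd{minor}: cs={s['cs']} ro={s['ro']} ds={s['ds']}")
```

On a live node the same functions could be fed the contents of /proc/drbd instead of SAMPLE.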
This is my configuration:
File global_common.conf:
global {
    usage-count no;
}
common {
    protocol C;
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    }
    startup {
    }
    disk {
        on-io-error detach;
    }
    net {
        sndbuf-size 0;
        no-tcp-cork;
        unplug-watermark 16;
        max-buffers 8000;
        max-epoch-size 8000;
        data-integrity-alg sha1;
    }
    syncer {
        rate 75M;
        al-extents 3389;
        cpu-mask 0;
        verify-alg "sha1";
    }
}
File r0.res:
resource r0 {
protocol C;
startup {
wfc-timeout 15;
degr-wfc-timeout 60;
become-primary-on both;
}
net {
allow-two-primaries;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
on kvm5 {
device /dev/drbd0;
disk /dev/sda3;
address 10.2.2.50:7788;
meta-disk internal;
}
on kvm6 {
device /dev/drbd0;
disk /dev/sda3;
address 10.2.2.51:7788;
meta-disk internal;
}
}
File r1.res:
resource r1 {
protocol C;
startup {
wfc-timeout 15;
degr-wfc-timeout 60;
become-primary-on both;
}
net {
allow-two-primaries;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
on kvm5 {
device /dev/drbd1;
disk /dev/sdb3;
address 10.2.2.50:7789;
meta-disk internal;
}
on kvm6 {
device /dev/drbd1;
disk /dev/sdb3;
address 10.2.2.51:7789;
meta-disk internal;
}
}
Note:
I use "data-integrity-alg sha1;" in the net section because data integrity is very important to me.
These are my Logs:
Log on node A (kvm5):
Jun 14 08:07:28 kvm5 kernel: dlm: connecting to 4
Jun 14 08:50:12 kvm5 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 21158352s +4096
Jun 14 08:50:12 kvm5 kernel: block drbd0: sock was reset by peer
Jun 14 08:50:12 kvm5 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jun 14 08:50:12 kvm5 kernel: block drbd0: short read expecting header on sock: r=-104
Jun 14 08:50:12 kvm5 kernel: block drbd0: meta connection shut down by peer.
Jun 14 08:50:12 kvm5 kernel: block drbd0: new current UUID 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
Jun 14 08:50:12 kvm5 kernel: block drbd0: asender terminated
Jun 14 08:50:12 kvm5 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:12 kvm5 kernel: block drbd0: Connection closed
Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver terminated
Jun 14 08:50:12 kvm5 kernel: block drbd0: Restarting receiver thread
Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver (re)started
Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 14 08:50:13 kvm5 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 14 08:50:13 kvm5 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1847])
Jun 14 08:50:13 kvm5 kernel: block drbd0: data-integrity-alg: sha1
Jun 14 08:50:13 kvm5 kernel: block drbd0: drbd_sync_handshake:
Jun 14 08:50:13 kvm5 kernel: block drbd0: self 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
Jun 14 08:50:13 kvm5 kernel: block drbd0: peer CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
Jun 14 08:50:13 kvm5 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Jun 14 08:50:13 kvm5 kernel: block drbd0: error receiving ReportState, l: 4!
Jun 14 08:50:13 kvm5 kernel: block drbd0: asender terminated
Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:13 kvm5 kernel: block drbd0: Connection closed
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jun 14 08:50:13 kvm5 kernel: block drbd0: receiver terminated
Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating receiver thread
Log on node B (kvm6):
Jun 14 08:07:28 kvm6 kernel: dlm: Using TCP for communications
Jun 14 08:07:28 kvm6 kernel: dlm: got connection from 3
Jun 14 08:50:12 kvm6 kernel: block drbd0: Digest integrity check FAILED: 21158352s +4096
Jun 14 08:50:12 kvm6 kernel: block drbd0: error receiving Data, l: 4140!
Jun 14 08:50:12 kvm6 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Jun 14 08:50:12 kvm6 kernel: block drbd0: new current UUID CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
Jun 14 08:50:12 kvm6 kernel: block drbd0: asender terminated
Jun 14 08:50:12 kvm6 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:12 kvm6 kernel: block drbd0: Connection closed
Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( ProtocolError -> Unconnected )
Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver terminated
Jun 14 08:50:12 kvm6 kernel: block drbd0: Restarting receiver thread
Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver (re)started
Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 14 08:50:13 kvm6 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 14 08:50:13 kvm6 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1857])
Jun 14 08:50:13 kvm6 kernel: block drbd0: data-integrity-alg: sha1
Jun 14 08:50:13 kvm6 kernel: block drbd0: drbd_sync_handshake:
Jun 14 08:50:13 kvm6 kernel: block drbd0: self CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
Jun 14 08:50:13 kvm6 kernel: block drbd0: peer 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
Jun 14 08:50:13 kvm6 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm6 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 14 08:50:13 kvm6 kernel: block drbd0: meta connection shut down by peer.
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
Jun 14 08:50:13 kvm6 kernel: block drbd0: asender terminated
Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( NetworkFailure -> Disconnecting )
Jun 14 08:50:13 kvm6 kernel: block drbd0: error receiving ReportState, l: 4!
Jun 14 08:50:13 kvm6 kernel: block drbd0: Connection closed
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jun 14 08:50:13 kvm6 kernel: block drbd0: receiver terminated
Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating receiver thread
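
To make the sequence in these logs easier to follow, here is a minimal Python sketch that pulls the connection-state transitions out of the kernel messages. It assumes the "conn( A -> B )" message format seen above; the names conn_path and has_split_brain are hypothetical helpers, not DRBD tools.

```python
import re

# A few lines copied from the kvm5 log above.
LOG = """\
Jun 14 08:50:12 kvm5 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( Disconnecting -> StandAlone )
"""

# Captures the old and new connection state of each transition message.
CONN_RE = re.compile(r"conn\( (\w+) -> (\w+) \)")


def conn_path(log_text):
    """Return the ordered list of connection states the device went through."""
    states = []
    for old, new in CONN_RE.findall(log_text):
        if not states:
            states.append(old)  # starting state, taken from the first transition
        states.append(new)
    return states


def has_split_brain(log_text):
    """True if the log contains the kernel's split-brain message."""
    return "Split-Brain detected" in log_text


if __name__ == "__main__":
    print(" -> ".join(conn_path(LOG)))
    print("split brain reported:", has_split_brain(LOG))
```

Run against the excerpt, the path starts at Connected and ends at StandAlone, matching what /proc/drbd reports for drbd0.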
I will be extremely grateful to anyone who can help me.
Best regards
Cesar