DRBD stops VM until end of write

proxtest

I have installed a Proxmox cluster on 2 Lenovo RD340 servers, with a Debian install and MD-RAID1.
I also have a NAS quorum connected via iSCSI.

The DRBD link runs over the integrated eth1 at 1 Gbit; Proxmox runs on eth0.

Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
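
For reference, a minimal sketch of how such a dedicated replication interface might look in /etc/network/interfaces on Debian; the 10.1.5.x addresses are taken from the DRBD resource below, and the /24 netmask is an assumption:

# /etc/network/interfaces (node1) - dedicated DRBD replication link
auto eth1
iface eth1 inet static
    address 10.1.5.31      # node2 would use 10.1.5.32
    netmask 255.255.255.0  # assumed /24, adjust to the real subnet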

I have 1 VM running Ubuntu 12.04 LTS with Kerio (35 users) installed inside.
Kerio writes a lot to the disks, and the whole VM stops working until the write on DRBD is done. :-(

101.conf:
scsi0: VM_Disk_drbdvg0:vm-101-disk-1,cache=directsync,size=75G <- system
scsi1: VM_Disk_drbdvg0:vm-101-disk-2,size=550G <- Kerio
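
(These disk lines can also be set from the CLI with qm; a sketch using the same values as in 101.conf above:)

# same settings applied via the Proxmox CLI
qm set 101 --scsi0 VM_Disk_drbdvg0:vm-101-disk-1,cache=directsync,size=75G
qm set 101 --scsi1 VM_Disk_drbdvg0:vm-101-disk-2,size=550G
qm config 101   # verify the resulting VM configuration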


resource r1 {
    protocol C;
    startup {
        wfc-timeout 0; # non-zero wfc-timeout can be dangerous (http://forum.proxmox.com/threads/3465-Is-it-safe-to-use-wfc-timeout-in-DRBD-configuration)
        degr-wfc-timeout 60;
        become-primary-on both;
    }
    net {
        sndbuf-size 512k;
        cram-hmac-alg sha1;
        shared-secret "XXXXX";
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        #data-integrity-alg crc32c; # enable only for testing, disable for production use (check man drbd.conf, section "NOTES ON DATA INTEGRITY")
    }
    on node1 {
        device /dev/drbd0;
        disk /dev/md2;
        address 10.1.5.31:7788;
        meta-disk internal;
    }
    on node2 {
        device /dev/drbd0;
        disk /dev/md2;
        address 10.1.5.32:7788;
        meta-disk internal;
    }
    disk {
        # no-disk-barrier and no-disk-flushes should be applied only to systems with non-volatile (battery-backed) controller caches.
        # Follow links for more information:
        # http://www.drbd.org/users-guide-8.3/s-throughput-tuning.html#s-tune-disable-barriers
        # http://www.drbd.org/users-guide/s-throughput-tuning.html#s-tune-disable-barriers
        # no-disk-barrier;
        # no-disk-flushes;
        # no-md-flushes;
    }
}
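
While such a stall is happening, it helps to watch DRBD and the backing MD device at the same time to see where the IO is queueing; a sketch with standard tools (iostat needs the sysstat package):

# on the Proxmox node, during a stall
cat /proc/drbd              # DRBD connection state, sync activity, pending counters
watch -n1 cat /proc/drbd    # the same, refreshed every second
iostat -x 1 /dev/md2        # utilisation and await of the backing MD RAID1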

Unfortunately I can't change to protocol A in a primary/primary installation. So any hints on how to keep the system running while writing to DRBD?
Or can Proxmox work with a primary/secondary setup?

The system is unusable at the moment: if I get one 2.5 MB email to 3 recipients, it writes 30 MB to DRBD (nmon). :-(
When the system stops, it doesn't write the system state in Kerio either, and users can't do anything. 11:28 until 11:31.
(see attachment kerio_stop11:28.png)

I can do a ps -ax or something, but I get no answer from df -h or mount or anything else that needs disk access! :-(
Any idea?

Thanks
 
Replying to myself :)

Additional information: the black window is nmon on the Proxmox node and shows the VM writing at nearly 4 MB/s; for as long as the write takes, you can see on the right side that Kerio stops writing its system state. :-(
Access to the mailboxes also stops and users are getting timeouts. :-(

A very bad situation!

Any hints?
 

Attachment: nmon2.png
meta-disk internal; causes random IO.
The metadata is stored at the end of the underlying volume.
The VM writes a block to the beginning of the volume, and DRBD then amplifies that by updating the metadata at the end of the volume.

You mention MD RAID1, so software RAID on a couple of disks.
Those are mechanical disks, not SSDs?
If mechanical, random IO is already limited.

Now imagine if your VM is also producing random IO...

I've run DRBD on MD RAID before; it works OK, but I never had great random IO performance.

So that is my first guess: your storage is too slow for the workload presented to it.
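
For illustration, external metadata would move those metadata updates off the data device. A sketch of the per-node section with indexed external metadata; the spare partition /dev/sdX1 is hypothetical, and changing the metadata location means re-creating the metadata with drbdadm create-md:

on node1 {
    device    /dev/drbd0;
    disk      /dev/md2;
    address   10.1.5.31:7788;
    meta-disk /dev/sdX1 [0];   # external, indexed metadata on a separate (hypothetical) partition
}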
 
meta-disk internal; causes random IO.
The metadata is stored at the end of the underlying volume.
...
So that is my first guess: your storage is too slow for the workload presented to it.

Yes, but hdparm tells me it reads 130 MB/s, so the load with 35 users shouldn't be too high; it's only a mail server!
But if somebody deletes 20 mails, the machine stops responding for 2 minutes!
I don't understand why the machine doesn't just slow down - it stops completely until the write is finished.
Maybe Kerio is a little stupid and writes too much for a few simple mails, but I can't change that little crappy thing. :-(

The problem only occurs when writing something; I will try
no-disk-flushes;
no-md-flushes;
no-disk-barrier;

as soon as I can reboot.
How big can I configure the buffer size? I have 16 GB in every node and 8 GB inside the VM. Can I use a 64 MB buffer?
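
For reference, the buffer-related net options look roughly like this in drbd.conf terms; per the drbd.conf man page, sndbuf-size 0 means kernel auto-tuning since 8.3.9, so tens of megabytes are probably not needed (a sketch, not tested here):

net {
    sndbuf-size 0;         # 0 = auto-tune (DRBD 8.3.9+); otherwise a few hundred KB up to a few MB
    max-buffers 8000;      # commonly cited tuning values for GbE replication
    max-epoch-size 8000;
}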

But there is no chance Proxmox can handle a primary/secondary setup? I only run the mail server in this cluster - no need for additional VMs!

Thanks for the comment!
 
OK, I changed some configs and now it runs much better. I still get a high load when something happens and Kerio writes unbelievably much for only a few mails, but it doesn't stop completely anymore!

global { usage-count no; }
common {
    syncer { rate 95M; verify-alg md5; al-extents 3389; }   # NEW: al-extents
    handlers { out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root"; }
}

resource r1 {
    protocol C;
    startup {
        wfc-timeout 0; # non-zero wfc-timeout can be dangerous (http://forum.proxmox.com/threads/3465-Is-it-safe-to-use-wfc-timeout-in-DRBD-configuration)
        degr-wfc-timeout 60;
        become-primary-on both;
    }
    net {
        sndbuf-size 512k;
        cram-hmac-alg sha1;
        shared-secret "XXXXX";
        max-buffers 8000;       # NEW
        max-epoch-size 8000;    # NEW
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        #data-integrity-alg crc32c; # enable only for testing, disable for production use (check man drbd.conf, section "NOTES ON DATA INTEGRITY")
    }
    on node1 {
        device /dev/drbd0;
        disk /dev/md2;
        address 10.1.5.31:7788;
        meta-disk internal;
    }
    on node2 {
        device /dev/drbd0;
        disk /dev/md2;
        address 10.1.5.32:7788;
        meta-disk internal;
    }
    disk {
        # no-disk-barrier and no-disk-flushes should be applied only to systems with non-volatile (battery-backed) controller caches.
        # Follow links for more information:
        # http://www.drbd.org/users-guide-8.3/s-throughput-tuning.html#s-tune-disable-barriers
        # http://www.drbd.org/users-guide/s-throughput-tuning.html#s-tune-disable-barriers
        no-disk-flushes;    # NEW
        no-md-flushes;      # NEW
        no-disk-barrier;    # NEW
    }
}
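
(If only net/disk/syncer options changed, the config can usually be applied without a full reboot; a sketch, assuming the same drbd.conf has been copied to both nodes first:)

# run on each node after the config is identical on both
drbdadm adjust r1    # applies changed options to the running resource
cat /proc/drbd       # verify both nodes stay Connected, Primary/Primary, UpToDate/UpToDate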



Attachment: nmon3.png

So I will see what happens once all customers are online.

The Lenovo RD340 is a good solution if the speed is OK; fencing and HA are working great!

Stay tuned.... :)
 
Yes, but hdparm tells me it reads 130 MB/s

That speed is sequential reading.
Random IO speed will be much lower than that on a mechanical disk.

On my Areca RAID card I can hit read speeds of 2000 MB/s sequential, but 4k random writes can be as low as 200 MB/s, and even slower from within a VM (because KVM does not have threaded IO, thus limiting the number of IOPS).

When performing random IO, the drive's speed (bandwidth) is much less important than its latency (seek speed).
This is why solid-state disks are so much faster at random IO: they have virtually no latency and as such can perform more IO operations in the same time than a high-latency mechanical disk.
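
To put numbers on that difference, a quick 4k random-write test with fio directly on the node is useful; a sketch - adjust the target path, and only run it against a scratch file, not a device that is in use:

# 4k random writes, direct IO, 60 seconds, against a scratch file on the DRBD-backed storage
fio --name=randwrite --filename=/path/to/scratch/testfile --rw=randwrite --bs=4k \
    --size=1G --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based \
    --group_reporting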

A RAID card with battery-backed write cache, in my experience, makes a noticeable impact on DRBD performance.
Being able to combine the random writes in the write cache into a more sequential stream of writes to the disks really helps.
Especially if your writes do not exceed the capacity of the write cache and the card's ability to flush the cache to disk.

Email servers, in my experience, produce a surprising amount of random IO:
log entries on email receipt, delivery into the mailbox, users authenticating to check email.
Just the life of a single email message involves lots of IO:
Email written into the queue
Email read from queue
Email written into mailbox
Email deleted from queue
Email read from mailbox
Email moved to trash folder in mailbox
Trash folder purged in mailbox
 
e100 said: Especially if your writes do not exceed the capacity of the write cache and the card's ability to flush the cache to disk. ...
That's exactly the problem! I have never seen so much data written on mail servers before.

I ran a server with vserver, heartbeat and DRBD for about 4 years with more than 400 mailboxes, and there was never a problem like this. Postfix is a very fast and stable mail system, and I never saw such extreme writes there.
Kerio looks different, and since the last update others also complain about the speed. I don't know what this crappy thing is doing, but it writes a lot to disk.

I have enjoyed my SSD for 5 years too, and the best part: it's still working without any problem. :) (I lost 2 HDs before.)
But 4 x 500 GB SSDs are a little bit expensive, eh? And how long would they be usable in a heavy-load system?

Now it doesn't look bad; the load goes up, but at the moment mail is slow yet usable. :)

Thanks!
 
