DRBD too Slow

vitor costa

Active Member
Oct 28, 2009
I have a DRBD cluster with Intel Xeon servers: a PERC controller in one node and a SASR controller in the other node.
The nodes are connected over a dedicated network through a gigabit switch.

The syncer rate parameter is set to 30M.

When resync operations occur, network usage is very high (10-30 MB/s); that is fine and the resync is fast.

But in normal operation (when a VM does intensive disk writes), the operation is very slow and no significant network usage appears in monitoring (below 1 MB/s).

I am using the default DRBD configuration from the Proxmox wiki. I think some write-cache configuration is needed on the secondary node, but the DRBD documentation is very hard to decipher... I tried changing some parameters but did not notice any difference.
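For what it's worth, one way to see where the stall happens during a slow VM write is to watch the DRBD state and the disk queues on both nodes at the same time, and to check the raw link speed separately. A rough sketch (device names are from my setup, the addresses are the ones in the drbd.conf below; iperf has to be installed):
Code:
# on both nodes, in separate terminals, while a VM is doing heavy writes
watch -n1 cat /proc/drbd        # ds:UpToDate/UpToDate, ns/nr counters, oos should stay 0
iostat -x sdb drbd0 1           # compare await/%util of the backing disk vs the drbd device

# rough check of pure network throughput between the nodes
iperf -s                        # on one node
iperf -c 192.168.0.2            # on the other node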


Here is my drbd.conf:

Code:
global { usage-count no; }
common { syncer { rate 30M; } }
resource r0 {
        protocol C;
        startup {
                wfc-timeout  15;     # wfc-timeout can be dangerous (http://forum.proxmox.com/threads/3465-Is-it-safe-to-use-wfc-timeout-in-DRBD-configuration)
                degr-wfc-timeout 60;
                become-primary-on both;
        }
        net {
                max-buffers 8000;
                max-epoch-size 8000;
                sndbuf-size 512k;
                cram-hmac-alg sha1;
                shared-secret "my-secret";
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        on pm-cond {
                device /dev/drbd0;
                disk /dev/sdb1;
                address 192.168.0.2:7788;
                meta-disk internal;
        }
        on pm-dp {
                device /dev/drbd0;
                disk /dev/sda1;
                address 192.168.0.3:7788;
                meta-disk internal;
        }
}
 
I'm also trying to learn DRBD. I have used http://www.drbd.org/users-guide/p-performance.html for tweaking performance. Take a look at the sections "Measuring throughput" and "Measuring latency". If you use LVM you probably want to make a separate partition for these tests in order not to overwrite any production data. Using these tests you can verify whether adding certain options helps in your setup. You can try some of the following parameters:
Code:
syncer {
        al-extents 3389;
}
disk {
        no-disk-flushes;
        no-md-flushes;
        no-disk-barrier;
}
net {
        sndbuf-size 0;
        no-tcp-cork;
        unplug-watermark 16;
}

Many of the parameters depend on your RAID controller, but it should be quite easy to find them in the manual and test what works best for you. I guess you have a BBU on the RAID controllers of both servers?
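For reference, the tests in those two guide sections boil down to direct dd writes against the DRBD device or a scratch LV on top of it. A rough sketch in that spirit (TEST_DEV is an assumption; point it at a throw-away LV or partition whose contents may be destroyed):
Code:
#!/bin/bash
# Rough throughput/latency check along the lines of the DRBD users guide
# ("Measuring throughput" / "Measuring latency"). TEST_DEV is an assumption:
# it must be a scratch device, its contents will be overwritten!
TEST_DEV=/dev/drbd0vg/test

# throughput: a few large sequential direct writes (512 MB each)
for i in 1 2 3; do
        dd if=/dev/zero of=$TEST_DEV bs=512M count=1 oflag=direct
done

# latency: 1000 small synchronous writes; the time per write is what hurts fsyncs
dd if=/dev/zero of=$TEST_DEV bs=512 count=1000 oflag=direct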
 
Hi,
thanks for the info!
I improved the speed from 75 MB/s up to 98 MB/s. Only the latency doesn't change...

Udo
 
The sections "Measuring throughput" and "Measuring latency" show nice test scripts. But how do I run these tests on a running, production Proxmox cluster?

Udo,

if you have done these tests, please post your scripts and procedures here.
 
Hi,
I ran the tests on a running cluster, but with the DRBD volume group disabled (I copied all DRBD LVs to local storage beforehand).

Udo
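On a production cluster the safer variant is probably to carve out a small scratch LV on the DRBD-backed volume group and run the benchmarks only against that, leaving the production LVs untouched. A sketch (the VG and LV names are assumptions based on the output later in this thread):
Code:
# create a throw-away test LV on the DRBD-backed VG (names are assumptions)
lvcreate -L 30G -n test drbd0vg

# raw dd tests can run directly against /dev/drbd0vg/test;
# for pveperf, put a filesystem on it and mount it somewhere temporary
mkfs.ext3 /dev/drbd0vg/test
mount /dev/drbd0vg/test /mnt
pveperf /mnt

# clean up afterwards
umount /mnt
lvremove /dev/drbd0vg/test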
 
Hi,
I have now replaced the 1 Gbit Intel NIC with a 10 Gbit Solarflare NIC and improved the speed to normal RAID speed.
The NICs are directly connected.
Code:
# ./drbd_speed_test.sh
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 2.66578 s, 201 MB/s
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 1.78301 s, 301 MB/s
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 2.02004 s, 266 MB/s
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 1.94908 s, 275 MB/s
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 2.16976 s, 247 MB/s

# from now on without drbd-sync
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 1.9796 s, 271 MB/s
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 2.46363 s, 218 MB/s
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 1.61586 s, 332 MB/s
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 1.97908 s, 271 MB/s
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 2.00115 s, 268 MB/s
pveperf also looks good, but the fsyncs are bad:
Code:
pveperf /mnt
CPU BOGOMIPS:      29474.61
REGEX/SECOND:      1190395
HD SIZE:           29.53 GB (/dev/mapper/drbd0vg-test)
BUFFERED READS:    357.82 MB/sec
AVERAGE SEEK TIME: 4.00 ms
FSYNCS/SECOND:     92.56
DNS EXT:           96.91 ms
DNS INT:           0.51 ms
The latency doesn't change (I guess the fsyncs show this as well):
Code:
./drbd_latenz.sh
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.156105 s, 3.3 MB/s
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.0210024 s, 24.4 MB/s
If someone has a hint on how to improve the latency, let me know.
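For what it's worth, those numbers can be read as per-write latency: 1000 synchronous 512-byte writes in 0.156 s is roughly 156 µs per write, versus about 21 µs in the second run (presumably the backing device, as in the guide's latency comparison), so roughly 135 µs of replication overhead per synchronous write. A tiny helper just to do that arithmetic (values copied from the output above):
Code:
# per-write latency from the dd latency runs above
awk 'BEGIN {
        drbd  = 0.156105 / 1000    # seconds per 512-byte O_DIRECT write via DRBD
        local = 0.0210024 / 1000   # seconds per write in the second run (assumed backing device)
        printf "drbd: %.0f us/write  backing: %.0f us/write  overhead: %.0f us/write\n", drbd*1e6, local*1e6, (drbd-local)*1e6
}'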

The config:
Code:
global { usage-count no; }
common { syncer { al-extents 3389; rate 150M; } }
resource r0 {
        protocol C;
        startup {
                wfc-timeout  15;
                degr-wfc-timeout 60;
                become-primary-on both;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "xxxlITkIvWBjUPf0xxmpSpWpxxx";
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
                sndbuf-size 0;
                no-tcp-cork;
                unplug-watermark 16;
                max-buffers 8000;
                max-epoch-size 8000;

        }
        disk {
                no-disk-flushes;
                no-md-flushes;
                no-disk-barrier;
        }
        on proxmox1 {
                device /dev/drbd0;
                disk /dev/sdc1;
                address 172.20.2.11:7788;
                meta-disk internal;
        }
        on proxmox2 {
                device /dev/drbd0;
                disk /dev/sdc1;
                address 172.20.2.12:7788;
                meta-disk internal;
        }
}
Udo
 
Last edited:
Latency is around the same as what I get with 10GbE. However, I'm able to get around 900 fsyncs/s.
Have you tried setting your sync rate lower? I don't know if it changes anything, but the manual recommends about 1/3 of the bandwidth of the system's bottleneck, which I guess is the disk subsystem here.
What does pveperf look like if you disconnect DRBD?
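As a worked example of that rule of thumb: if the slowest component is the disk subsystem at roughly 300 MB/s (about what the dd runs above show), one third of that would suggest a syncer rate of around 100M instead of 150M. The exact figure is only an illustration, something like:
Code:
# assuming the disk subsystem (~300 MB/s sustained) is the bottleneck:
# 300 MB/s * 1/3 = ~100 MB/s
common { syncer { al-extents 3389; rate 100M; } }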
 
Hi,
I don't assume the sync rate will change anything, but I will try it (I have to move some VMs off the storage first).
The fsyncs change dramatically when DRBD is disconnected:
Code:
drbd connected:
CPU BOGOMIPS:      29474.69
REGEX/SECOND:      1178651
HD SIZE:           246.08 GB (/dev/mapper/drbd0vg-test)
BUFFERED READS:    232.56 MB/sec
AVERAGE SEEK TIME: 6.38 ms
FSYNCS/SECOND:     98.67
DNS EXT:           71.98 ms
DNS INT:           0.53 ms

drbd disconnected:
pveperf /mnt
CPU BOGOMIPS:      29474.69
REGEX/SECOND:      1146623
HD SIZE:           246.08 GB (/dev/mapper/drbd0vg-test)
BUFFERED READS:    334.23 MB/sec
AVERAGE SEEK TIME: 5.17 ms
FSYNCS/SECOND:     4732.99
DNS EXT:           150.90 ms
DNS INT:           0.51 ms

Udo
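That drop is plausible with protocol C: every fsync has to wait until the write is acknowledged by the peer over the network as well as by the local disk, so the round-trip time of the replication link is added to every synchronous write. A quick way to see how much the link itself contributes (the address is the one from the config above):
Code:
# round-trip time of the replication link; with protocol C this latency is added
# to every synchronous write, so even a few hundred microseconds hurts fsyncs/s
ping -c 100 -q 172.20.2.12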
 

Hi Udo,

Let me ask three questions about tuning DRBD for network latency:
1. With your tuning (shown above), how much RAM does it consume?
2. For DRBD with 10 Gb/s NICs, is the same configuration recommended?
3. If that is not the correct setup for 10 Gb/s NICs, could you show me how to set it up?

I would be very grateful if you could clear up my doubts.

Best regards
Cesar
 
Hi Udo,

Let me ask three questions about tuning DRBD for network latency:
1. With your tuning (shown above), how much RAM does it consume?
Hi cesarpk,
I don't know - DRBD is a kernel module and its memory consumption is not visible with top. With lsmod I can see the size of the module, but not the RAM usage while it is working...
But I don't have the feeling that DRBD takes much RAM.
2. For DRBD with 10 Gb/s NICs, is the same configuration recommended?
The config above is a 10 GbE config (with Solarflare 10 GbE NICs).

DRBD runs very stably with 10 GbE. With Dolphin NICs I sometimes had issues (driver related), but with 10 GbE not a single one in over a year.
For latency reasons InfiniBand is very interesting (I have done some tests, but not in production - e100 has good experiences with InfiniBand).

Udo
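Regarding the RAM question, a rough way to get a feel for it: lsmod shows the size of the module itself, and the data buffers per resource are bounded by settings like max-buffers (which, as far as I know, is counted in 4 KiB pages, so max-buffers 8000 would be on the order of 30 MB per resource - treat that as an estimate, not an official figure):
Code:
# size of the drbd kernel module itself
lsmod | grep drbd

# rough upper bound for the data buffers of one resource with max-buffers 8000
echo $((8000 * 4096 / 1024 / 1024)) MB        # ~31 MB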
 
Thank you very much Udo, you and e100 are my super teachers :p, and your words are very important to me.

What about the Dolphin brand? On this website it literally says that no problems have been reported (maybe it's just a marketing strategy, or it is another board model?):
http://ww.dolphinics.no/download/IX_4_1_X_LINUX_DOC/apds01.html

Last question (a peculiar one): why do you and e100 generally answer most of the questions in this forum? I particularly think that this forum would not be the same without you, and thanks to you it is quite active. In any case... thanks for your patience in teaching.

Best regards to my Masters
Cesar
 