MTU 9000

bsilva

Guest
Hi all,

I have been testing Proxmox (1.7, build 5323) for the last couple of weeks, and so far I am very happy with it.
I am considering using it in production on a couple of systems I manage (to replace CentOS/Xen hosts: I am having problems setting up some Oracle guests with HugePages...).

I am just writing to share a small problem I had with this test environment, and how I solved it.

I have two network cards on this system, one that I use for regular traffic and the other for iSCSI traffic. As I believe it would help, I went on to try setting MTU 9000 on that storage network.

Having found this post, I changed my interfaces file accordingly. But as soon as I brought up a guest connected to that bridge, even after configuring the guest network card (inside the guest) with MTU 9000, the bridge would fall back to MTU 1500, and bigger packets would fail to pass.

Then I noticed the tap<guestid>xxxx interface on the host was being set up with the wrong MTU.

So I changed my /var/lib/qemu-server/bridge-vlan to something that would force the MTU on the new interface to the MTU of the bridge it will be connected to.

It seems to be working fine.
There must be a better way of coding this, and it could probably use some validation in case it can't get the MTU, but I will leave that to some more skilled folks.

changes:
Code:
#!/usr/bin/perl -w

use strict;

my $iface = shift;

die "no interface specified\n" if !$iface;

die "got strange interface name '$iface'\n"
    if $iface !~ m/^tap(\d+)i(\d+)(d\d+)?$/;

my $vmid = $1;
my $vlan = $2;

my $bridge = "vmbr$vlan";
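# changed: read the MTU of the bridge the new tap interface will join
my $bridgeMTU = `/sbin/ifconfig $bridge | /usr/bin/tr " " "\n" | /bin/grep MTU | /bin/sed "s/MTU://"`;
chomp $bridgeMTU;    # drop the trailing newline from the command output

# changed: bring the tap interface up with the same MTU as the bridge
system ("/sbin/ifconfig $iface 0.0.0.0 promisc up mtu $bridgeMTU") == 0 ||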
    die "interface activation failed\n";

system ("/usr/sbin/brctl addif $bridge $iface") == 0 ||
    die "can't add interface to bridge\n";

exit 0;
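
As for the validation mentioned above, here is a rough sketch of one way to do it in shell. It assumes the kernel exposes the bridge MTU as /sys/class/net/<bridge>/mtu, so there is no need to parse ifconfig output. The helper name get_mtu and the SYSFS_NET override are made up for illustration and are not part of Proxmox.

```shell
# Hypothetical helper: print the MTU of an interface, or fail with a message.
# SYSFS_NET is an assumption added only so the lookup directory can be
# overridden; on a real host the default /sys/class/net is used.
get_mtu() {
    iface=$1
    file="${SYSFS_NET:-/sys/class/net}/$iface/mtu"

    # Fail early if the interface does not exist or the file is unreadable.
    if [ ! -r "$file" ]; then
        echo "cannot read MTU of '$iface'" >&2
        return 1
    fi

    mtu=$(cat "$file")

    # Accept only a non-empty string of digits.
    case $mtu in
        ''|*[!0-9]*)
            echo "unexpected MTU value '$mtu' for '$iface'" >&2
            return 1
            ;;
    esac

    echo "$mtu"
}
```

A calling script could then do something like `mtu=$(get_mtu "$bridge") || exit 1` before bringing the tap interface up, so a missing bridge stops the script instead of passing an empty MTU along.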

Best regards,

Bruno Silva
 

Lokytech

Active Member
Sep 29, 2010
44
8
28
France
up.

I'm really interested in this, but I am no expert on MTU.

I know that Proxmox communicates with the SAN/NAS using the MTU of the vmbr0 interface, so with Gigabit Ethernet we can speed things up by setting the MTU above 1500.
But in my case, my VMs do not communicate with a SAN/NAS but with my router and then the internet. So if I change the MTU of the VM bridge interface but not inside the VM, Proxmox will have to assemble packets to match the MTU of the bridge interface. And if I set the MTU to 9000 inside the VM, my router will have to resize them. So I would just be moving the load. That is my dilemma, if I understand everything correctly. If so, I can change the MTU everywhere and leave the work to the router.

Do you agree?
 

bsilva

Guest
Hi,

If you leave the VM NIC at mtu=1500, the packets the VM generates will never be bigger than that, so all the fragmentation work will be done in the VM (outbound traffic). It also depends on the traffic type and the firewall settings along the way. If you let ICMP through, TCP will be able to do Path MTU Discovery, and your VM may already be sending packets smaller than its "local" MTU. For instance, if you are using a VPN, or your ISP modem uses PPPoE, you may already have smaller packet sizes to some destinations (outside your local link).

It will not be a problem to leave the VM interface at 1500 and set the bridge to 9000. That particular VM will not take advantage of the bigger MTU, but other VMs may, and the host may as well (talking to a SAN or NAS, for instance).
It is just like a switch that allows "jumbo frames" (and you will need one that does for all of this to work): that does not mean all the machines connected to it will automatically send bigger packets.

So if you also leave the internal interface of your internet router (the "default gateway") at 1500, it will work as before, even if you increase the host interface and the bridge MTU to 9000.

Bridges do not deal with fragmentation (a bridge is like a layer 2 switch); that is a job for routers or end-points (layer 3). Setting the MTU on the bridge just sets the maximum size of the packets it will let through. If you try to send a packet bigger than the MTU via a bridge, it will be silently dropped and never reach its destination.
Proxmox (your host) will not have to do any work here, unless your VM is on a different subnet than your internet router and your host is that VM's "default gateway" (working as a router as well).

But you will have communication problems if you leave that VM's MTU at 1500 and, for instance, try to use your SAN or NAS from the VM (web management interface, or SMB/NFS): if you already set the SAN/NAS to mtu=9000, it will reply with bigger packets, and your VM will not process them. Path MTU Discovery will not work here, as there is no routing involved. (All this assumes you are still on IPv4; IPv6 works differently.)

The same will happen if you try to use the Proxmox web management interface from the VM (with the VM MTU at 1500 and the host interface at 9000), or if you try to access the internet from the host (apt-get?) with the default gateway at 1500 and your host interface at 9000.

Basically, you will have to set the same MTU on all the interfaces connected to your "local network" if you want them all to talk to each other. This includes the internal interface of your internet router (the "default gateway")!
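
One way to check whether a given MTU really works end to end is to ping with fragmentation prohibited and a payload that exactly fills one packet. A small sketch, assuming IPv4 without header options and the Linux iputils ping; the target address is a placeholder, and ping_payload is just an illustrative helper name:

```shell
# ICMP echo payload that exactly fills one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Placeholder address (documentation range); replace it with a host on the
# jumbo-frame network.
target=192.0.2.10

# Print the command to run. "-M do" forbids fragmentation, so the ping only
# succeeds if every hop on the local link accepts the full-size packet.
echo "ping -M do -c 3 -s $(ping_payload 9000) $target"
```

If this ping fails while the same command with the payload for MTU 1500 succeeds, something on the path (switch, bridge, tap interface or the other end) is still at the smaller MTU.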

My suggestion is that you should have a second "network" for that bigger MTU (new switch, or use VLANs: if your switch supports vlan tagging, you may even get away with only one interface on the host).

Be aware that bigger packets do not automatically mean better performance. Some applications may actually perform worse (VoIP, for instance). You should run performance tests with several packet sizes every time you change a setting.

I recommend iperf, atop and iftop (iptraf is also nice).
 

Lokytech

I have to say: thank you very much.

Just one last question, to be sure I have understood:
For example, I have a Cisco ASA 5510 as my router. If I set the MTU of the internal interface of my internet router to 9000, the packet size is still 1500, because all inbound (internet) packets are 1500, isn't it?
And for outbound packets, the router will resize every packet before sending it over the internet?
 

bsilva

Guest
Yes, with most ISPs the MTU of your link to them will usually be 1500 or smaller (a PPPoE modem in the middle, for instance). And all your outbound traffic will go out in packets no bigger than the MTU of the external interface: the router will fragment bigger packets if they are routed from the internal network.
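
To get a feeling for what that fragmentation costs, here is a small arithmetic sketch. It assumes plain IPv4 with a 20-byte header and no options, and that the packet is allowed to be fragmented (DF bit not set); the helper name fragments is only for illustration.

```shell
# Number of IPv4 fragments needed to carry one packet of PKT bytes over a
# link with the given MTU (both sizes include the 20-byte IPv4 header).
fragments() {
    pkt=$1
    mtu=$2
    payload=$(( pkt - 20 ))
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_frag=$(( (mtu - 20) / 8 * 8 ))
    # Round up to whole fragments.
    echo $(( (payload + per_frag - 1) / per_frag ))
}

fragments 9000 1500   # one full jumbo packet leaving through a 1500 MTU link
fragments 1500 1500   # a packet that already fits
```

A full 9000-byte packet ends up as 7 fragments on a 1500 MTU link, each with its own header, which is why jumbo frames on the internal side bring nothing for traffic that is headed to the internet anyway.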

Again, not all traffic will benefit from larger MTUs. For instance, see the stats of this particular machine I manage. This machine is a Xen server with 6 VMs, and eth1 is only used by the host to access the SAN (iSCSI).

As you can see, even for this kind of traffic (storage), a lot of the packets are 127 bytes or smaller. (Note: not all network cards/drivers will give you this kind of detailed stats.)


Code:
[root@realxen1 ~]# uptime 
 21:52:48 up 94 days,  5:41,  1 user,  load average: 0.07, 0.08, 0.08
[root@realxen1 ~]# netstat -i
Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0       1500   0 23308996      0      0      0   168359      0      0      0 BMRU
eth1       9000   0 362008805      0      0      0 389903923      0      0      0 BMRU
lo        16436   0    27732      0      0      0    27732      0      0      0 LRU
peth0      1500   0 454112821      0      0      0 476284608      0      0      0 BORU
realftp1   1500   0 11113070      0      0      0 33013258      0    221      0 BORU
imgsync1   1500   0  3183080      0      0      0 26205587      0      0      0 BORU
imgsync2   1500   0  4416174      0      0      0 28009790      0      0      0 BORU
realts3    1500   0 53032039      0      0      0 80324361      0      0      0 BORU
realts4    1500   0 44264065      0      0      0 60921426      0      0      0 BORU
realweb1   1500   0  8610971      0      0      0 30444803      0      0      0 BORU
tap0       1500   0        0      0      0      0 21773274      0      0      0 BMRU
tap1       1500   0        0      0      0      0 23124165      0      0      0 BMRU
tap2       1500   0        0      0      0      0 23124117      0      0      0 BMRU
vif0.0     1500   0   168360      0      0      0 23308996      0      0      0 BORU
xenbr0     1500   0 23111148      0      0      0        0      0      0      0 BORU
[root@realxen1 ~]# ethtool -S eth1
NIC statistics:
     rx_bytes: 195084142533
     rx_error_bytes: 0
     tx_bytes: 1000739229308
     tx_error_bytes: 0
     rx_ucast_packets: 361999071
     rx_mcast_packets: 7202
     rx_bcast_packets: 2811
     tx_ucast_packets: 389904165
     tx_mcast_packets: 6
     tx_bcast_packets: 5
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 10027
     rx_65_to_127_byte_packets: [B]322.142.860[/B]
     rx_128_to_255_byte_packets: 431889
     rx_256_to_511_byte_packets: 206202
     rx_512_to_1023_byte_packets: 267448
     rx_1024_to_1522_byte_packets: 97372
     rx_1523_to_9022_byte_packets: [B]38.853.286[/B]
     tx_64_byte_packets: 57
     tx_65_to_127_byte_packets: [B]231.427.715[/B]
     tx_128_to_255_byte_packets: 12535
     tx_256_to_511_byte_packets: 49641
     tx_512_to_1023_byte_packets: 4788366
     tx_1024_to_1522_byte_packets: 766123
     tx_1523_to_9022_byte_packets: [B]152.859.739[/B]
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 0
     tx_xoff_frames: 0
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 15931
     rx_ftq_discards: 0
     rx_discards: 0
     rx_fw_discards: 0
[root@realxen1 ~]# ethtool -i eth1
driver: bnx2
version: 2.0.2
firmware-version: 1.9.6
bus-info: 0000:05:00.0
[root@realxen1 ~]# lspci | grep 05:00.0
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
[root@realxen1 ~]#

I added the dots and the bold for clarity.
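
To put those counters in proportion, a quick sketch that turns two of them into percentages. The numbers are copied from the ethtool output above; the tx total is tx_ucast_packets + tx_mcast_packets + tx_bcast_packets. The helper name share is only for illustration.

```shell
# share PART TOTAL: print PART as a percentage of TOTAL, with one decimal.
share() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * p / t }'
}

# Total transmitted frames: unicast + multicast + broadcast.
tx_total=$(( 389904165 + 6 + 5 ))

share 231427715 "$tx_total"   # tx frames of 65 to 127 bytes
share 152859739 "$tx_total"   # tx frames of 1523 to 9022 bytes (jumbo)
```

On this host that gives roughly 59% small frames against 39% jumbo frames on the transmit side, even though the interface carries only iSCSI traffic.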

You may actually get bigger benefits from TSO (TCP segmentation offload): this way you make the network card do the work ("splice the packets") and free the CPU from doing it.

For instance, the Broadcom cards I am using support it, but it is not enabled by default, so I add an "ethtool -K eth1 tso on" to the end of my /etc/rc.local file.

You can check your current settings with ethtool:

Code:
[root@realxen1 ~]# ethtool -k eth1
Offload parameters for eth1:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
generic-receive-offload: on
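
If you would rather have rc.local switch TSO on only when it is actually off, something along these lines could work. It is a sketch: tso_state and ensure_tso are made-up helper names, and the parsing assumes the label is either "tcp segmentation offload" as in the output above or "tcp-segmentation-offload" as printed by newer ethtool versions.

```shell
# Read "ethtool -k" output on stdin and print the TSO state ("on" or "off").
tso_state() {
    awk -F': *' '/^[ \t]*tcp[- ]segmentation[- ]offload:/ {
        split($2, w, " ")   # keep only the first word, e.g. "off" from "off [fixed]"
        print w[1]
        exit
    }'
}

# Hypothetical wrapper: enable TSO on an interface only if it is not already on.
ensure_tso() {
    if [ "$(ethtool -k "$1" | tso_state)" != "on" ]; then
        ethtool -K "$1" tso on
    fi
}
```

In rc.local this would be called as `ensure_tso eth1`.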
 
Last edited by a moderator:

bsilva

Guest
Hi all,

Here I am again with the jumbo frames stuff...

This time I tried out OpenVZ containers. Same issue: a bridge will go back to mtu=1500 if a container uses it.
I am using the latest free version at this time (proxmox-ve-2.6.32: 3.1-111).
The following patch fixes it. Enjoy!

Code:
diff -uprN ori/usr/lib/vzctl/scripts/vps-functions patched/usr/lib/vzctl/scripts/vps-functions
--- ori/usr/lib/vzctl/scripts/vps-functions    2013-09-27 14:11:08.173706048 +0100
+++ patched/usr/lib/vzctl/scripts/vps-functions    2013-09-27 14:11:11.585705901 +0100
@@ -172,6 +172,8 @@ vzadjustmacs()
 vzconfbridge()
 {
     if [ "x$BRIDGE" != "x" ]; then
+        BRIDGEMTU=$(cat /sys/class/net/$BRIDGE/mtu)
+        ifconfig $HNAME mtu $BRIDGEMTU
         brctl addif $BRIDGE $HNAME >/dev/null 2>&1
     fi
 }
diff -uprN ori/usr/sbin/vznetaddbr patched/usr/sbin/vznetaddbr
--- ori/usr/sbin/vznetaddbr    2013-09-27 14:11:08.173706048 +0100
+++ patched/usr/sbin/vznetaddbr    2013-09-27 14:11:11.585705901 +0100
@@ -34,6 +34,8 @@ for iface in $NETIFLIST; do
     ip addr add 0.0.0.0/0 dev "$host_ifname"
     echo 1 >"/proc/sys/net/ipv4/conf/$host_ifname/proxy_arp"
     echo 1 >"/proc/sys/net/ipv4/conf/$host_ifname/forwarding"
+    bridgemtu=$(cat /sys/class/net/$bridge/mtu)
+    ifconfig $host_ifname mtu $bridgemtu
     brctl addif "$bridge" "$host_ifname"
 
     break
 
