Visor Dropping Transmit Packets from VZ containers.

makton

Member
Dec 7, 2009
I have a strange issue that may or may not be a problem. I am seeing the Visor dropping transmit packets on the veth connections, as shown below:

veth106.0 Link encap:Ethernet HWaddr 00:18:51:37:d6:6b
inet6 addr: fe80::218:51ff:fe37:d66b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:710240189 errors:0 dropped:0 overruns:0 frame:0
TX packets:681952673 errors:0 dropped:510612574 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:351026804692 (326.9 GiB) TX bytes:377446889575 (351.5 GiB)

veth116.0 Link encap:Ethernet HWaddr 00:18:51:f3:04:1a
inet6 addr: fe80::218:51ff:fef3:41a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:32302440 errors:0 dropped:0 overruns:0 frame:0
TX packets:32020072 errors:0 dropped:378982877 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:10483546883 (9.7 GiB) TX bytes:9877701935 (9.1 GiB)

All the visors are having this issue with the OpenVZ containers; the KVMs don't seem to be affected. Is this a normal occurrence and nothing to worry about, or is it a problem that needs more attention? Everything is at defaults, so it is no surprise all containers are doing the same thing.
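
For reference, this is roughly how the counters can be watched over time to see if they keep climbing (just a quick sketch, using the interface names from the output above):

Code:
# loop that prints the TX drop counters every minute
# (interface names are the veths shown above)
while true; do
    date
    for i in veth106.0 veth116.0; do
        ifconfig "$i" | grep -E "^$i|dropped"
    done
    sleep 60
done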
 
Please always post which version you run:

Code:
pveversion -v

and your hardware details.
 
pve-manager: 1.5-8 (pve-manager/1.5/4674)
running kernel: 2.6.18-2-pve
proxmox-ve-2.6.18: 1.5-5
pve-kernel-2.6.18-2-pve: 2.6.18-5
pve-kernel-2.6.18-1-pve: 2.6.18-4
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-10
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm-2.6.18: 0.9.1-5
hyp-prox00:~#

And these are HP160s, with two Gigabit Ethernet controllers.

hyp-prox00:~# lspci
00:00.0 Host bridge: Intel Corporation 5000Z Chipset Memory Controller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev b1)
00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev b1)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 4 (rev b1)
00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 5 (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
01:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
01:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out Controller (rev 03)
01:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out Processor (rev 03)
01:04.4 USB Controller: Hewlett-Packard Company Proliant iLO2 virtual USB controller
01:04.6 IPMI SMIC interface: Hewlett-Packard Company Proliant iLO2 virtual UART
02:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c3)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
04:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
04:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
05:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
05:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E2 (rev 01)
0c:01.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
0c:02.0 Ethernet controller: Intel Corporation 82545GM Gigabit Ethernet Controller (rev 04)
12:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev b4)
13:04.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge (rev b2)
13:08.0 RAID bus controller: Hewlett-Packard Company Smart Array E200i (SAS Controller)

The specs above are for the server with the dropped packets. All the other visors are doing the same thing and run the same versions. Again, this only seems to affect the OpenVZ containers' Ethernet interfaces, as seen from the visor.
 
Can you test with 2.6.24, any difference? (apt-get install proxmox-ve-2.6.24)
 
Considering you are asking me to try another kernel, you are not seeing the same issue. At this point I am unable to change kernels, as I would need to reboot the Visors and these are production. What are you expecting? I chose the 2.6.18 kernel as it is supposed to be the most stable of the kernels. Are you going away from 2.6.18?

The VMs are working, but I still find this strange and wanted to know whether it is normal or not, as I'm not seeing any problems with the VMs. Can this be verified?
 
Considering you are asking me to try another kernel, you are not seeing the same issue.

no

At this point I am unable to change kernels, as I would need to reboot the Visors and these are production. What are you expecting?

a fast solution for your problem (maybe it is just a driver problem).

I chose the 2.6.18 kernel as it is supposed to be the most stable of the kernels. Are you going away from 2.6.18?

sometime in the future, yes.

The VMs are working, but I still find this strange and wanted to know whether it is normal or not, as I'm not seeing any problems with the VMs. Can this be verified?

This is not normal. But maybe that depends on the VMs.
 
The first and last parts are what I needed, Dietmar. Tomorrow is our maintenance window; I'll see about getting the kernels changed and rebooting the visors then, and will update you with the results. I also need to update you on the setup: the containers are not physically on the visors, they are actually on the NetApp SAN and are mounted on the visors over NFS, so I have all the VMs (VZ and KVM) on the SAN (Linkers are great). You can imagine the Ethernet traffic we have on these visors, considering two of them are mail servers. The setup has been working great. I just noticed the packet drops while doing routine checks, and they don't seem to be hindering the VMs or visors. I'll still give this a shot.
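
For the curious, the mounts look roughly like this (a sketch only; the hostname, export path and mount point below are placeholders, not my real ones):

Code:
# mount the NetApp export on the visor
# (netapp01, /vol/vz and /var/lib/vz are placeholders)
mount -t nfs -o rw,hard,intr,tcp netapp01:/vol/vz /var/lib/vz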

Thanks Dietmar
 
Well, I attempted to move to the 2.6.24-11 kernel, and that made for a horrible day. I have a ton of Ethernet traffic on my two mail servers, and after I upgraded to the new kernel I saw long delays when telnetting to port 25 on the mail servers. With the upgrade I had no packets dropping, but I lost a ton of performance and had serious SMTP delays, which did take down the mail (not a good thing). My mail VMs are now on visors with 2.6.18 and are running fine, except for the dropped packets. What is the difference in the routing and VZ setup between 2.6.18 and 2.6.24? Again: massive SMTP delays with the 2.6.24 kernel and no packet drops, versus no SMTP delays with the 2.6.18 kernel and packet drops.
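
For reference, the delay is easy to see from any other box, assuming netcat is installed (a rough sketch; the hostname is a placeholder):

Code:
# time how long the connect plus SMTP banner takes
# (mail01.example.com is a placeholder for the real mail server)
time ( echo "QUIT" | nc -w 60 mail01.example.com 25 )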

I'll live with the packet drops.

Edit: Yesterday was a really rough day due to 2.6.24 failing against the mail system, and I still need to clean up the mess. If you guys are planning to get rid of 2.6.18, then this delay issue with 2.6.24 needs to be resolved. The mail servers I run can reach upwards of 1 gig of data transfer every 30 minutes (as seen by my network monitoring), and I have two of them. One server is on one visor and the other is on another.
 
Here is a bit more. I have a visor which holds most of my development servers and one production server (not that important). I'll give the versions of the visor with the packets dropping and the veth of the mail server.

This is the visor with the dropping packets:
hyp-prox00:~# pveversion -v
pve-manager: 1.5-9 (pve-manager/1.5/4728)
running kernel: 2.6.18-2-pve
proxmox-ve-2.6.18: 1.5-5
pve-kernel-2.6.24-11-pve: 2.6.24-23 -- haven't cleaned this up here yet
pve-kernel-2.6.18-2-pve: 2.6.18-5
pve-kernel-2.6.18-1-pve: 2.6.18-4 -- same with this
qemu-server: 1.1-14
pve-firmware: 1.0-5
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.23-1pve11
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm-2.6.18: 0.9.1-5 (I see this is different from my other one)

The port for the mail server, as seen from the visor:
hyp-prox00:~# ifconfig veth106.0
veth106.0 Link encap:Ethernet HWaddr 00:0e:0c:d6:5e:a4
inet6 addr: fe80::20e:cff:fed6:5ea4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:15267046 errors:0 dropped:0 overruns:0 frame:0
TX packets:15852689 errors:0 dropped:12604195 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6819588131 (6.3 GiB) TX bytes:12065804612 (11.2 GiB)


OK, here is the 2.6.24 server, using a DNS server as the comparison:
hyp-prox02:~# pveversion -v
pve-manager: 1.5-9 (pve-manager/1.5/4728)
running kernel: 2.6.24-11-pve
proxmox-ve-2.6.24: 1.5-23
pve-kernel-2.6.24-11-pve: 2.6.24-23
pve-kernel-2.6.18-2-pve: 2.6.18-5
pve-kernel-2.6.18-1-pve: 2.6.18-4
qemu-server: 1.1-14
pve-firmware: 1.0-5
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.23-1pve11
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.4-1

Here is the port:
veth114.0 Link encap:Ethernet HWaddr 00:18:51:ec:cb:d9
inet6 addr: fe80::218:51ff:feec:cbd9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:43806412 errors:0 dropped:0 overruns:0 frame:0
TX packets:48081354 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3402232302 (3.1 GiB) TX bytes:5711171041 (5.3 GiB)

So, you were correct in stating that going to 2.6.24 would fix the packet drops. I started up my DEV mail server on the 2.6.24 visor and it seemed to be working fine; I had fast response times and could telnet to localhost on port 25 within the VM. I ran this test for about two hours today.

The next step was for the DEV mail server to take on load from one of the two mail servers, so I prepped it and swapped the IPs. Good response times lasted 2 to 3 minutes (about the time it takes to pick up the full load), and then it started hitting SMTP delays of more than a minute. At that point I needed to revert back to my normal mail servers on the 2.6.18 kernel.

Any idea as to what's going on here?
 
OK, here is another addition on the 2.6.18 kernel. Kind of a rough morning, as one of the Visors locked up. The transmit drops are actually packets being sent to the VM from the Visor. Like I stated earlier, this is fixed with the 2.6.24 kernel, but the 2.6.24 kernel fails under load. The load failure looks to my VMs exactly as if the visor were set to 100 Mb half duplex while the router is running 100 Mb full duplex. I need to force the visors to use full duplex or I get carrier faults and errors. I don't receive these with 2.6.24, but the reaction of the VMs is exactly the same, leading me to believe that the kernel is putting a duplex/speed setting on the veth ports that doesn't match that of the visor. What do you think? Have you been able to reproduce this? Again, this is only seen under load.
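
For what it's worth, this is roughly what I mean by forcing duplex on the physical ports (a sketch; the speed/duplex values are just examples and depend on what the switch port is set to):

Code:
# check what the physical NIC negotiated with the switch
ethtool eth0
# pin it to 100 Mbit full duplex to match the router
# (example values only -- match whatever the switch port uses)
ethtool -s eth0 speed 100 duplex full autoneg off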
 
What do you think? Have you been able to reproduce this? Again, this is only seen under load.

No, I can't reproduce it. Maybe it's related to the NIC driver; can you please test using another network card? The list you posted contains Broadcom and Intel cards; which ones do you use? Maybe you can test using only one card type?
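
E.g. something like this shows which driver each interface is bound to (just a sketch):

Code:
# show the driver behind each interface
ethtool -i eth0
ethtool -i eth1
# or read it straight from sysfs
readlink /sys/class/net/eth0/device/driver
readlink /sys/class/net/eth1/device/driver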
 
I'm seeing that one of my visors is different from the others. That visor is using the following:

03:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet [14e4:164c] (rev 12) = eth0
0c:01.0 Ethernet controller [0200]: Intel Corporation 82541PI Gigabit Ethernet Controller [8086:107c] (rev 05) = eth1

The other visors are exactly the same and are using the following Ethernet controllers (of course the MACs are different):

03:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet [14e4:164c] (rev 12) = eth0
05:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet [14e4:164c] (rev 12) = eth1

The identical servers are 1U HPs and the Ethernet controllers are integrated, so I am unable to put anything else in them. The one HP is using both the Intel and the Broadcom. I am not having issues getting in and out of the visor itself, and I don't see failures on these ports with either of the two kernels; the failure is between the visors and the VZ containers.

When I was using the 2.6.24 kernel, the visor (as a computer in itself) had no issues, and none of the KVMs (only three) had any issues. The fault was isolated to the VZ containers. The same was true for the dropped packets to the same containers when using 2.6.18.

It feels like, when using the 2.6.18 kernel, we are seeing multicast or broadcast packets being dropped for the container because the container doesn't need them (possibly being for another IP). It seems that OpenVZ attempted to fix this and adjusted how the routing functions in the 2.6.24 kernel. That would require more processing, which might be why I'm only seeing these issues under load, as the containers having the issue are the email servers and the log server, all of which carry a good load.
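
One way to check that theory would be to look at what is actually hitting a container's veth (a sketch; veth106.0 is the mail server's host-side interface from the output above):

Code:
# dump a couple hundred frames off the container's host-side veth
tcpdump -nei veth106.0 -c 200
# or narrow it down to broadcast/multicast only
tcpdump -nei veth106.0 -c 200 'broadcast or multicast'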

I'm thinking it might not be a bad idea to create KVMs and try them under load on the 2.6.24 kernel, or even the 2.6.32 kernel, to see how isolated this is. Make sense?
 
This might be a DUH, but I have to ask. I was looking up some stuff regarding a problem I was seeing on port 25 of one of my mail servers and ran across this thread.

http://forum.proxmox.com/threads/1183-bridge-and-forwarding-routing?highlight=make+a+MAC

I decided to look at the bridges, and this is the bridge setup on one of the visors:
hyp-prox06:/etc/vz/conf# brctl show
bridge name     bridge id               STP enabled     interfaces
vmbr0           8000.001e0b603324       no              eth0.80
vmbr1           8000.001e0b60331a       no              eth1
                                                        veth103.0
                                                        veth111.0
                                                        veth117.0
vmbr100         8000.001e0b603324       no              eth0.100
vmbr20          8000.001e0b603324       no              eth0.20
vmbr40          8000.001e0b603324       no              eth0.40
vmbr60          8000.001e0b603324       no              eth0.60

All the visors are the same with STP off... Should this be on, and could this be why the virtual ethernets are dropping packets?

Well, I have more. The dropped packets are packets coming from our anti-spam solution's logging. The log traffic from the anti-spam box goes to the logging VM on one of the visors through vmbr1. The massive number of packet drops is on the other VZ containers that sit on the same bridge, on any of the visors. The anti-spam server is set to send its log data only to the IP address of the log server, but for some reason the data is being sent to all the containers on this bridge across all the visors. I'm trying to figure out how to stop this and have the log data go only to the logging server. This looks like a fault of the visors at this time. Any ideas?
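
This is roughly how it can be confirmed on a container that should never see the log traffic at all (a sketch; I'm assuming the logging runs over syslog/UDP 514, so adjust the port to whatever the anti-spam box really uses; veth103.0 is one of the veths from the vmbr1 listing above):

Code:
# capture on a veth belonging to a container that should never receive the
# anti-spam log stream; if these frames show up here, the traffic is being
# flooded (udp port 514 is an assumption -- use the real logging port)
tcpdump -nei veth103.0 -c 50 'udp port 514'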
 
All the visors are the same with STP off... Should this be on, and could this be why the virtual ethernets are dropping packets?

No, I don't think you need that (but it should be relatively easy to test/verify that).
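
E.g. toggling STP per bridge is a one-liner if you want to try it (a sketch, using vmbr1 from your listing):

Code:
# enable STP on the bridge, check the result, then turn it back off
brctl stp vmbr1 on
brctl show
brctl stp vmbr1 off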
 
Sorry it's been a while since my last update. I've been working on the issue from the source to the VM, and we found the root cause of the dropped packets. The log server's (the VM's) external Ethernet interface was only receiving data and never transmitting, so its MAC was aging out of the MAC address (forwarding) table of the switch to which all the visors and other DMZ servers are connected. With the MAC gone from that table, the switch would start flooding those frames out of all ports. To solve this, I have that interface send a single ping about every two minutes. This keeps the MAC address in the switch's table and directs the packets straight to the VM, so now I am not seeing any dropped packets on any of the visors.
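
For anyone hitting the same thing, the workaround is just a periodic ping from the log VM, e.g. via cron (a sketch; the interface name and target IP are placeholders, anything on the same VLAN such as the gateway will do):

Code:
# /etc/cron.d/keepalive-ping -- runs inside the log VM
# ping something on the same VLAN every 2 minutes so the switch keeps the
# VM's MAC in its forwarding table (eth1 and 192.0.2.1 are placeholders)
*/2 * * * *   root   ping -c 1 -I eth1 192.0.2.1 >/dev/null 2>&1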

This is also possibly why the kernel upgrade hindered the high-load VMs, since the entire switch was being spammed. I have not tried going back to the 2.6.24 kernel yet, but I have a feeling it will be fine now. This was a real pain to figure out, but it's finally solved and the performance is much better now.

Thanks guys for helping me out, and sorry for being a pain.

Mike
 
