multicast problem after upgrade from 1.8 to 1.9

lozair · Feb 27, 2012

Hi,
We run Oracle Application Server in a KVM VM on Proxmox.
This VM is connected to a bridge on the Proxmox host (interface bonding + VLAN tagging).
The VM must receive multicast packets in order to work with a central database server.

Everything worked fine with Proxmox 1.8, but since the migration to 1.9 the VM no longer receives multicast packets.

Looking at the traffic with tcpdump, we can see the following:

The multicast packet is sent by a server (not a Proxmox VM; it is a physical server on the same LAN as the Proxmox VMs).
The multicast packet is delivered by our physical switches to the Proxmox bridge (we can capture it on the bridge on the Proxmox host).
The multicast packet is never forwarded to the VM network interface (we can't capture it on the guest's tap interface).

I have googled a lot, but nothing has solved my problem.

Here is my bridge config:
bridge name     bridge id               STP enabled     interfaces
vmbr0           8000.a4badb0d8627       no              eth0
vmbr3           8000.a4badb0d8629       no              bond0.3 tap212i3d0

tap212i3d0 is the guest interface.

Any advice or help would be great.

Thanks for your help

dietmar · Feb 28, 2012

Does it help if you set

net.ipv4.conf.all.rp_filter = 2

in /etc/sysctl.d/vzctl.conf (please reboot the system to activate those settings).

lozair · Feb 28, 2012

Same problem.
We have rebooted the system, but still nothing arrives on the guest interface.

We have disabled iptables; ebtables is not installed.
We have tuned the bridge with the following parameters, following the bridge howto:

# Test bridge/multicast
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
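
To confirm after a reboot that these three settings are really active, the values can be read back from /proc/sys. The following is only a minimal sketch: the helper names (sysctl_path, read_sysctl, check_bridge_nf) are made up for illustration, and the dot-to-slash mapping is assumed to be sufficient for these three keys only.

```python
from pathlib import Path

# The three keys set in the sysctl fragment above.
BRIDGE_NF_KEYS = (
    "net.bridge.bridge-nf-call-ip6tables",
    "net.bridge.bridge-nf-call-iptables",
    "net.bridge.bridge-nf-call-arptables",
)


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a sysctl key to its file under /proc/sys (dots become slashes).

    This simple mapping holds for the keys above. Keys that embed an
    interface name containing a dot (for example bond0.3) would need
    special handling that this sketch does not attempt.
    """
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key: str, root: str = "/proc/sys"):
    """Return the current value as text, or None if missing or unreadable."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


def check_bridge_nf(root: str = "/proc/sys") -> dict:
    """Read back all three bridge netfilter settings."""
    return {key: read_sysctl(key, root) for key in BRIDGE_NF_KEYS}


if __name__ == "__main__":
    for key, value in check_bridge_nf().items():
        print(f"{key} = {value}")
```

A value of None here means the key could not be read at all, for example because the bridge module is not loaded.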

There is also some strange behaviour: when I run sysctl -a to list the kernel parameters, the following messages appear:
error: permission denied on key 'vm.compact_memory'
error: permission denied on key 'net.ipv4.route.flush'
error: permission denied on key 'net.ipv4.route.flush'

We have also tested on an old Xen host (it uses a bridge, as Proxmox does): if I create a guest on the same LAN as the multicast sender, the guest receives the multicast message.

lozair · Feb 28, 2012

More information about my problem.

To rule out a problem in the guest, we changed the network card driver from virtio to rtlxxx, but nothing changed.

We ran some tests and found a strange behaviour.
Consider the following:
SENDER: the multicast packet sender
CLIENT: the multicast client (the Proxmox guest VM)

On SENDER, we send packets with netcat on port 14021: nc -u -p 14021 228.5.6.7 14021
All is fine: we receive the packet on the CLIENT, so the bridge forwards it.
We can repeat this test for a few minutes and it keeps working.

On SENDER, we send packets with the Oracle rwdiag tool.
The packet characteristics are the same as for the netcat packet; only the data in the packet is different.
At the bridge level, we receive the packet.
At the guest level, we don't receive any packet.

On SENDER, we again send packets with netcat on port 14021: nc -u -p 14021 228.5.6.7 14021
At the bridge level, we receive the packet.
At the guest level, we don't receive any packet.

We wait ~5 minutes.

On SENDER, we again send packets with netcat on port 14021: nc -u -p 14021 228.5.6.7 14021
At the bridge level, we receive the packet.
At the guest level, all is OK!

If we send again with the Oracle rwdiag client, the bridge is broken again for 5 minutes.

Another piece of information: the bridge does not forward multicast packets for 228.5.6.7 during those 5 minutes, but at the same time I can send packets to another multicast address and they are forwarded by the bridge.
Example :
nc -u -p 14021 228.5.6.7 14021 : packet doesn't reach the guest
nc -u -p 14021 228.5.6.6 14021 : packet reaches the guest

It seems the bridge blocks, or cannot handle, this multicast packet.
I'm looking into what this 5-minute delay/timeout could correspond to, but have found nothing so far.
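
For anyone who wants to repeat the netcat part of this test without netcat, here is a rough Python sketch of both sides. The group 228.5.6.7 and port 14021 are the ones from the test above; the function names, the default payload and the TTL of 1 are assumptions for illustration. It does not reproduce whatever the Oracle rwdiag tool does differently.

```python
import socket

GROUP = "228.5.6.7"  # multicast group used in the test above
PORT = 14021         # UDP source and destination port used in the test above


def membership_request(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def send_probe(group: str = GROUP, port: int = PORT,
               payload: bytes = b"probe", ttl: int = 1) -> None:
    """Run on SENDER: send one datagram to the group, from source port `port`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))  # same effect as nc -p 14021
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))


def listen(group: str = GROUP, port: int = PORT, timeout: float = 10.0):
    """Run on CLIENT: join the group and wait for one datagram.

    Returns (payload, sender_address), or None if nothing arrives in time.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(65535)
        except socket.timeout:
            return None
```

Calling listen() in the guest and send_probe() on the sender, once for 228.5.6.7 and once for 228.5.6.6, gives the same comparison as the two nc lines above.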

thanks for your help

lozair · Feb 29, 2012

To try to resolve the problem, we disabled bonding at the Proxmox host level.
We also disabled Multi Link Trunk at the switch level.
Still the same problem.

lozair · Feb 29, 2012

Perhaps it is a problem at the kernel level.
We run the 2.6.32-6-pve kernel.
Can anyone confirm that I can revert to the 2.6.32-4-pve kernel just by rebooting the server, without modifying other pve packages?
I'm running only KVM guests.

Thanks

lozair · Feb 29, 2012

We have "identified" the problem.

Everything works fine with the 2.6.32-4-pve kernel.
We tested with the 2.6.32-6-pve kernel and it no longer works.
We tested with the new 2.6.32-7-pve kernel and it no longer works either.

Two questions about this problem:

Is there any problem with "downgrading" the whole Proxmox cluster to the 2.6.32-4-pve kernel?
What can we do to resolve this issue?

Regards

tom · Feb 29, 2012

2.6.32-4-pve is not maintained anymore, so you should debug the issue with the latest kernel.

dietmar · Feb 29, 2012

lozair said:
What can we do to resolve this issue?

You need to find a way to reproduce the bug, so that we can debug that here.

lozair · Feb 29, 2012

My problem is that the error appears only with the Oracle rwdiag tool.
It is always reproducible with this tool, but you can't install all the Oracle products just to reproduce this test.

I captured the data sent by the rwdiag tool and resent it via netcat using the same IP addresses and UDP source/destination ports.
If I resend the data with netcat, everything works fine.

So the problem is at the Ethernet/IP/UDP packet level and not in the data, which is quite logical.

I will try to compare the packet structure at the Ethernet/IP/UDP level to identify differences, or see if I can package the rwdiag tool.

dietmar · Feb 29, 2012

Maybe it uses some record-route features on tcp level?

lozair · Mar 1, 2012

Comparing the packets, I can't see any differences.
Only the IP ID field differs: it is always set to 0 by the Oracle tool and is a random number with netcat.

We used the tcpreplay tool to replay the packet captured from the Oracle tool.
In this case the IP ID field is set to 0 and everything works fine: the KVM guest receives the packet and the Oracle client replies correctly.
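
A header comparison like this can also be scripted. The sketch below decodes the fixed 20-byte IPv4 header and lists the fields that differ between two packets. It assumes each input starts at the IP header (for a capture taken on Ethernet, the first 14 bytes of the frame would have to be stripped first), and the function names are invented for this example.

```python
import struct


def ipv4_fields(packet: bytes) -> dict:
    """Decode the fixed 20-byte IPv4 header at the start of `packet`."""
    if len(packet) < 20:
        raise ValueError("need at least 20 bytes of IPv4 header")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,          # above 5 means IP options are present
        "tos": tos,
        "total_length": total_len,
        "id": ident,                    # the IP ID field discussed above
        "flags": flags_frag >> 13,      # 3 bits: reserved, DF, MF
        "frag_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,              # 17 is UDP
        "checksum": checksum,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }


def diff_headers(a: bytes, b: bytes, ignore=("checksum",)) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every differing field.

    The checksum is ignored by default because it changes whenever any
    other header field changes.
    """
    fa, fb = ipv4_fields(a), ipv4_fields(b)
    return {k: (fa[k], fb[k]) for k in fa if k not in ignore and fa[k] != fb[k]}
```

Applied to one packet from the Oracle tool and one from netcat, a result containing only the id key would match what was observed by eye.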

lozair · Mar 1, 2012

OK, we have extracted the libs and scripts needed to run the tool standalone.
We tested this extract from another server and the result is the same.
The guest doesn't receive the packet and is blocked for five minutes.

The problem is now reproducible.
I can send you this archive so that you can test it and see whether you encounter the same problem.

What is the procedure?

Thanks for your help

dietmar · Mar 1, 2012

lozair said:
What is the procedure?

The bug seems to be OpenVZ kernel related. So I guess it's best to file a bug at bugzilla.openvz.org

lozair · Mar 1, 2012

dietmar said:
The bug seems to be OpenVZ kernel related. So I guess it's best to file a bug at bugzilla.openvz.org

OK, I will try to describe the problem.

Could you tell me which version of the OpenVZ kernel is used in pve2.6.32-4, -6 and -7?

Thanks

dietmar · Mar 1, 2012

lozair said:
Could you tell me which version of the OpenVZ kernel is used in pve2.6.32-4, -6 and -7?

# zless /usr/share/doc/pve-kernel-2.6.32-7-pve/changelog.Debian.gz

lozair · Mar 1, 2012

Thanks.
Could you tell me whether the following is correct:
pve2.6.32-4 is not OpenVZ patched
the two others are OpenVZ patched
Is that why you think it's an OpenVZ patch problem?

tom · Mar 1, 2012

2.6.32-4 is a Debian Squeeze based kernel with OpenVZ
2.6.32-6/7 is a RHEL based kernel with OpenVZ

all details in the /usr/share/doc/pve-kernel-2.6.32....

lozair · Mar 1, 2012

Thanks.

A bug report was submitted on openvz bugzilla.

http://bugzilla.openvz.org/show_bug.cgi?id=2201

dietmar · Mar 1, 2012

What is the MAC address of the KVM VM?
