multicast problem after upgrade from 1.8 to 1.9

lozair

Member
Nov 4, 2008
Hi,
We run Oracle Application Server in a KVM VM on Proxmox.
This VM is connected to a bridge on the Proxmox host (interface bonding + VLAN tagging).
The VM needs to receive multicast packets in order to work with a central database server.

Everything worked fine with Proxmox 1.8, but since migrating to 1.9 the VM no longer receives multicast packets.

Looking at the traffic with tcpdump, we can see the following:
  1. The multicast packet is sent by a server (not a Proxmox VM; it is a physical server on the same LAN as the Proxmox VMs).
  2. The multicast packet is delivered by our physical switches to the Proxmox bridge (we can capture it on the bridge on the Proxmox host).
  3. The multicast packet is never forwarded to the VM's network interface (we can't capture it on the guest's tap interface).

I have googled a lot, but nothing has resolved my problem.

Here is my bridge config:
bridge name     bridge id               STP enabled     interfaces
vmbr0           8000.a4badb0d8627       no              eth0
vmbr3           8000.a4badb0d8629       no              bond0.3
                                                        tap212i3d0

The tap212i3d0 is the guest interface.

Any advice/help would be great

Thks for your help
 
Does it help if you set

net.ipv4.conf.all.rp_filter = 2

in /etc/sysctl.d/vzctl.conf (please reboot the system to activate the setting).
 
Same problem.
We have rebooted the system, but still nothing arrives on the guest interface.

We have disabled iptables; ebtables is not installed.
We have tuned the bridge with these parameters, following the bridge howto:

# Test bridge/multicast
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

Something strange: when I run sysctl -a to list the kernel parameters, the following messages appear:
error: permission denied on key 'vm.compact_memory'
error: permission denied on key 'net.ipv4.route.flush'
error: permission denied on key 'net.ipv4.route.flush'


We have tested on an old Xen host (it uses a bridge, like Proxmox): if I create a guest on the same LAN as the multicast sender, the guest receives the multicast messages.
 
More information about my problem.

To rule out a problem in the guest, we changed the network card driver from virtio to rtlxxx, but nothing changed.


We ran some tests and found some strange behaviour.
Consider the following:
SENDER: the multicast packet sender
CLIENT: the multicast client (the Proxmox guest VM)

On SENDER, we send packets with netcat on port 14021: nc -u -p 14021 228.5.6.7 14021
All is fine: we receive the packets on the CLIENT, so the bridge forwards them.
We can repeat this test for a few minutes and everything keeps working.
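
For anyone who wants to watch for these datagrams inside the guest without tcpdump, here is a minimal Python receiver sketch. It assumes the group 228.5.6.7 and port 14021 from the netcat command above. The function names are made up for illustration, and the join uses the default interface, which may not be the right one on a multi-homed guest.

```python
import socket

GROUP = "228.5.6.7"  # group address used in the netcat test
PORT = 14021         # UDP port used in the netcat test


def build_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Create a UDP socket bound to the port and joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group makes the guest kernel announce its membership via IGMP.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    return sock


def listen(sock: socket.socket, max_packets: int = 10) -> None:
    """Print one line per received datagram, then return."""
    for _ in range(max_packets):
        data, sender = sock.recvfrom(65535)
        print(f"{len(data)} bytes from {sender[0]}:{sender[1]}")
```

Calling listen(open_receiver()) in the guest while SENDER runs the netcat command should print one line per datagram that actually reaches the guest.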

On SENDER, we send packets with the Oracle rwdiag tool.
The packet characteristics are the same as for the netcat packet; only the data in the packet is different.
At the bridge level, we receive the packet.
At the guest level, we don't receive any packet.

On SENDER, we send packets with netcat again on port 14021: nc -u -p 14021 228.5.6.7 14021
At the bridge level, we receive the packet.
At the guest level, we still don't receive any packet.

We wait ~5 minutes.

On SENDER, we send packets with netcat again on port 14021: nc -u -p 14021 228.5.6.7 14021
At the bridge level, we receive the packet.
At the guest level, all is OK again!

If we send again with the Oracle rwdiag client, the bridge is broken again for 5 minutes.
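
The length of the blackout could be measured rather than estimated by sending a probe at a fixed interval and recording the arrival times in the guest. Below is a small sketch of the analysis step only. The timestamps are invented for illustration and are not measurements from this setup, and find_gaps is a hypothetical helper name.

```python
from typing import List, Tuple


def find_gaps(arrivals: List[float], threshold: float) -> List[Tuple[float, float]]:
    """Return (start, length) for every pause between arrivals longer than threshold seconds."""
    gaps = []
    for previous, current in zip(arrivals, arrivals[1:]):
        length = current - previous
        if length > threshold:
            gaps.append((previous, length))
    return gaps


# Made-up arrival times in seconds: a probe every 10 s,
# with nothing received between t=20 and t=320.
sample = [0.0, 10.0, 20.0, 320.0, 330.0]
print(find_gaps(sample, threshold=15.0))  # prints [(20.0, 300.0)]
```

A precise gap length would make it easier to match the ~5 minutes against a specific timer value.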

Another observation: the bridge stops forwarding packets for multicast address 228.5.6.7 for 5 minutes, but during that time packets sent to another multicast address are still forwarded to the guest.
Example:
nc -u -p 14021 228.5.6.7 14021 : packet doesn't reach the guest
nc -u -p 14021 228.5.6.6 14021 : packet reaches the guest
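
As a side note, the two group addresses map to different Ethernet multicast MACs (01:00:5e plus the low 23 bits of the IP address, per RFC 1112), so they are distinct groups at layer 2 as well. The sketch below only shows that mapping; it is an illustration and does not prove where the blocking state is kept.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast address to its Ethernet MAC (01:00:5e + low 23 bits)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    # Emit the six octets from most to least significant.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


for group in ("228.5.6.7", "228.5.6.6"):
    print(group, "->", multicast_mac(group))
# 228.5.6.7 -> 01:00:5e:05:06:07
# 228.5.6.6 -> 01:00:5e:05:06:06
```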


It seems the bridge blocks this multicast packet or can't respond to it.
I'm trying to find out what this 5-minute delay/timeout could correspond to, but no result so far.

thanks for your help
 
To try to resolve the problem, we disabled bonding on the Proxmox host.
We also disabled Multi-Link Trunking on the switch.
Still the same problem...
 
Perhaps it is a problem at the kernel level.
We run the 2.6.32-6-pve kernel.
Can anyone confirm that I can revert to the 2.6.32-4-pve kernel just by rebooting the server, without modifying other pve packages?
I'm running only KVM guests.

Thks
 
We have "identified" the problem.

Everything works fine with the 2.6.32-4-pve kernel.
With the 2.6.32-6-pve kernel it no longer works.
With the new 2.6.32-7-pve kernel it doesn't work either.

Two questions about this problem:
  1. Is there any problem with "downgrading" the whole Proxmox cluster to the 2.6.32-4-pve kernel?
  2. What can we do to resolve this issue?

Regards
 
2.6.32-4-pve is not maintained anymore, so you should debug the issue with the latest kernel.
 
My problem is that the error appears only with the Oracle rwdiag tool.
It's always reproducible with this tool, but you can't install all the Oracle products to reproduce the test :)

I have captured the data sent by the rwdiag tool and resent it via netcat using the same IP addresses and UDP source/destination ports.
When I resend the data with netcat, everything works fine :(
So the problem is at the Ethernet/IP/UDP packet level and not in the data, which is quite logical.

I will try to compare the packet structure at the Ethernet/IP/UDP level to identify differences, or see if I can package the rwdiag tool...
 
Comparing the packets, I can't see any differences.
Only the IP ID field differs: it is always set to 0 by the Oracle tool and is a random number with netcat.
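
This kind of comparison can also be done programmatically on the raw IPv4 headers taken from a capture. A minimal sketch follows, assuming 20-byte headers without IP options. The field names and make_header are hypothetical helpers, and the source address is a documentation-range placeholder, not an address from this network.

```python
import struct

FIELDS = ("ver_ihl", "tos", "total_len", "ident", "flags_frag",
          "ttl", "proto", "checksum", "src", "dst")
IPV4_FMT = "!BBHHHBBH4s4s"  # fixed 20-byte IPv4 header, network byte order


def parse_ipv4(header: bytes) -> dict:
    """Split the first 20 bytes of an IPv4 header into named fields."""
    return dict(zip(FIELDS, struct.unpack(IPV4_FMT, header[:20])))


def diff_fields(a: bytes, b: bytes) -> list:
    """Return the names of the header fields whose values differ."""
    pa, pb = parse_ipv4(a), parse_ipv4(b)
    return [name for name in FIELDS if pa[name] != pb[name]]


def make_header(ident: int) -> bytes:
    """Build a bare IPv4/UDP header for illustration (checksum left at 0)."""
    return struct.pack(IPV4_FMT, 0x45, 0, 20, ident, 0, 1, 17, 0,
                       bytes([192, 0, 2, 1]),   # placeholder source address
                       bytes([228, 5, 6, 7]))   # group address from the tests


# One header with ID 0 (as seen from the Oracle tool), one with a nonzero ID.
print(diff_fields(make_header(0), make_header(0x1234)))  # prints ['ident']
```

On real captures the checksum field would normally show up as different too, since it covers the ID.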

We used the tcpreplay tool to replay the packet captured from the Oracle tool.
In this case the IP ID field is set to 0 and everything works fine: the KVM guest receives the packet and the Oracle client replies correctly.
 
OK, we have extracted the libs and scripts needed to run the tool standalone.
We tested this extract from another server and the result is the same.
The guest doesn't receive the packet and is blocked for five minutes.

So the problem is now reproducible.
I can send this archive so that you can test it and see if you encounter the same problem.

What is the procedure?

Thks for your help
 
The bug seems to be OpenVZ kernel related, so I guess it's best to file a bug at bugzilla.openvz.org

OK, I will try to describe the problem.

Could you tell me which version of the OpenVZ kernel is used in pve2.6.32-4, -6 and -7?


Thks
 
Thanks,
Could you tell me if the following is correct:
pve2.6.32-4 is not OpenVZ patched
the two others are OpenVZ patched
Is that why you think it's an OpenVZ patch problem?
 
2.6.32-4 is Debian Squeeze based kernel with OpenVZ
2.6.32-6/7 is RHEL based kernel with OpenVZ

all details in the /usr/share/doc/pve-kernel-2.6.32....
 