VM randomly going offline and traffic caught by host

Jun 11, 2019
Hi,

I'm having a network issue on a host running Proxmox VE 6.4-13.

Very often one of the VMs (a QEMU virtual machine) appears to be offline; in reality, its traffic is not forwarded to the VM but is delivered to the host machine instead.

My situation:

- The datacenter where the machine is hosted assigned 7 IP addresses, of which only four can be used (the others are for the gateway, the iLO port, ...). These addresses are 82.195.231.243, 82.195.231.244, 82.195.231.245 and 82.195.231.246. The network is attached to the eno1 port of the host; there is no redundancy, so the second Ethernet port is not connected. 82.195.231.243 is assigned to the Proxmox web interface (basically the host machine); the remaining ones are mapped to different VMs.

- A Linux bridge, vmbr0, has been created and dispatches all traffic from the WAN; every VM that is exposed to the WAN has one Ethernet port attached to vmbr0.

- The issue affects virtual machine #102, whose net0 port is connected to vmbr0 and whose IP is 82.195.231.245; this VM runs Debian 11.

When VM #102 appears offline:
  • no traffic arrives at that machine;
  • running nmap against the IP address assigned to VM #102 (82.195.231.245) shows the ports that are open on the host machine! (Double-checked by connecting via SSH: the host machine answered.) So the traffic is not routed to the VM but is caught by the host machine;
  • when running ping 1.1.1.1 on the VM #102 console, the first five or six packets are lost, then replies are received; after that the VM is back online and answers on its IP; the VM then goes offline again after a variable time (ranging from a few minutes to several hours);
  • running brctl showstp vmbr0 shows all ports in the forwarding state;
  • no strange or indicative messages appear in dmesg.

This issue appeared after I converted VM #102 from an LXC container to a QEMU VM. (It was not a real conversion: I simply created a new QEMU VM, assigned it the indicated IP address, and deleted the old LXC container.)

Any clue to diagnose the issue is appreciated.
 
That sounds like an issue with either a duplicate IP or duplicate MAC address.
Especially if it works for some time after pinging from inside VM 102 to the outside.
 
I have now changed the MAC address to a new random one, and we will see. The IP addresses have been assigned by the hosting provider and I cannot change them. I have just 12 machines on this (single) host, so it's quite easy to keep track of internally assigned IP addresses. Just to test, I shut down the malfunctioning VM and tried to ping it... no answer. We'll see what happens in the meantime!
 
No luck: a few hours after changing the NIC's MAC address, the issue reappeared.

Can you suggest a checklist? Otherwise the only option I have is to recreate the VM from scratch and hope...
 
If possible, please capture the traffic with tcpdump:
tcpdump -env -s0 -w <host>.pcap
Run this on the host.

Be aware though that this will produce a very large capture file if it takes that long for the issue to appear.
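
If the full capture turns out to be too large, one option is to record only ARP frames on the bridge, since a duplicate IP or MAC address should show up there. A minimal sketch, assuming the bridge is vmbr0 as described above; the function name capture_arp and the output file name are made up for illustration:

```shell
# Hypothetical helper: capture only ARP frames on one interface.
# Usage (as root): capture_arp [interface] [output-file]
capture_arp() {
    iface=${1:-vmbr0}                # bridge name taken from this thread
    outfile=${2:-arp-$iface.pcap}    # illustrative file name
    # -i: interface, -s 0: full frames, -w: write raw packets to a file;
    # the trailing "arp" is a capture filter that keeps only ARP frames.
    tcpdump -i "$iface" -s 0 -w "$outfile" arp
}
```

Stop it with Ctrl+C once the VM has gone offline again. Because only ARP is recorded, any problem outside of ARP would not be visible in such a capture.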
 
I'm organizing the test.
And once I have the tcpdump file, what should I do with it? Send it to you for analysis, or something else?
 
You can then check it with Wireshark yourself, or attach the file here.
Look at the ARP traffic and the IP addresses.

If you have traffic you don't want to release publicly, you can also upload the file somewhere and send me the link at m.limbeck@proxmox.com.
 
I'm not concerned about the traffic itself, it's all encrypted, but the file is quite big: more than 4 MB already gzipped, and the forum engine rejects it. How big is your mailbox?
 
If it is less than 20MB it should be fine, I think.
But you could also upload it somewhere and simply provide a download link, either here or via mail.
 
It seems there are duplicate IPs configured, as can be seen when filtering your trace with `arp`.
Packets No. 3372 and 3373 are one example of this: two different MAC addresses answer the same ARP request.
This happens for 82.195.231.246, 82.195.231.245 and 82.195.231.244:
Code:
3371    131.561139    JuniperN_df:e4:c7    Broadcast    ARP    60    Who has 82.195.231.246? Tell 82.195.231.241
3372    131.561143    HewlettP_4c:f0:3c    JuniperN_df:e4:c7    ARP    42    82.195.231.246 is at 94:40:c9:4c:f0:3c
3373    131.561278    9a:0e:e4:75:82:d2    JuniperN_df:e4:c7    ARP    42    82.195.231.246 is at 9a:0e:e4:75:82:d2
5622    230.960270    JuniperN_df:e4:c7    Broadcast    ARP    60    Who has 82.195.231.245? Tell 82.195.231.241
5623    230.960274    HewlettP_4c:f0:3c    JuniperN_df:e4:c7    ARP    42    82.195.231.245 is at 94:40:c9:4c:f0:3c
5624    230.960501    b2:d1:17:36:58:06    JuniperN_df:e4:c7    ARP    60    82.195.231.245 is at b2:d1:17:36:58:06
6477    314.461279    JuniperN_df:e4:c7    Broadcast    ARP    60    Who has 82.195.231.244? Tell 82.195.231.241
6478    314.461284    HewlettP_4c:f0:3c    JuniperN_df:e4:c7    ARP    42    82.195.231.244 is at 94:40:c9:4c:f0:3c
6479    314.461486    3a:9d:ae:41:e6:56    JuniperN_df:e4:c7    ARP    42    82.195.231.244 is at 3a:9d:ae:41:e6:56
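
To check a longer capture for the same pattern, the ARP replies can be grouped by IP address. A rough sketch, assuming the packet list has been exported as text in the same column layout as above (the function name find_duplicate_arp is made up for illustration):

```shell
# Hypothetical helper: print every IP address claimed by more than one MAC.
# Reads ARP summary lines on stdin, in the "<ip> is at <mac>" form shown above.
find_duplicate_arp() {
    awk '
        / is at / {
            ip  = $(NF - 3)              # field just before "is at"
            mac = $NF                    # last field is the claimed MAC
            if (!seen[ip, mac]++) {      # count each IP/MAC pair only once
                sep = (count[ip]++ > 0) ? " " : ""
                macs[ip] = macs[ip] sep mac
            }
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip ": " macs[ip]
        }
    ' | sort
}
```

Run as `find_duplicate_arp < export.txt` (the file name is just an example), it prints one line per affected IP. For the excerpt above it should list 82.195.231.244, .245 and .246, each with 94:40:c9:4c:f0:3c plus a second MAC address.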
 
Dear Mira, we have a problem, or rather, a question.

Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 94:40:c9:4c:f0:3c brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 94:40:c9:4c:f0:3d brd ff:ff:ff:ff:ff:ff
4: enp1s0f4u4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0a:25:18:01:0f:f1 brd ff:ff:ff:ff:ff:ff
5: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 94:40:c9:4c:f0:3c brd ff:ff:ff:ff:ff:ff
    inet 82.195.231.243/29 brd 82.195.231.247 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet 82.195.231.244/29 brd 82.195.231.247 scope global secondary vmbr0
       valid_lft forever preferred_lft forever
    inet 82.195.231.245/29 brd 82.195.231.247 scope global secondary vmbr0
       valid_lft forever preferred_lft forever
    inet 82.195.231.246/29 brd 82.195.231.247 scope global secondary vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::9640:c9ff:fe4c:f03c/64 scope link
       valid_lft forever preferred_lft forever

The three addresses you mentioned are three of the seven addresses assigned to me by my hosting provider; they are automatically assigned to the machine on the eno1 port via the provider's DHCP. I then have to route the traffic arriving at each of these IP addresses to a different VM running on the Proxmox host. The most logical solution seemed to be assigning the same IP address to the VM, but that appears to be wrong.
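
As a side note on the addressing: the /29 prefix in the output above means a block of 8 addresses. A quick sketch of the arithmetic for the last octet (the function name subnet_range is made up, and it only handles prefixes from /24 to /30):

```shell
# Hypothetical helper: show the layout of a small IPv4 block by last octet.
# $1 = last octet of any address in the block, $2 = prefix length (24 to 30)
subnet_range() {
    size=$(( 1 << (32 - $2) ))       # number of addresses in the block
    net=$(( $1 / size * size ))      # last octet of the network address
    echo "network=$net hosts=$((net + 1))-$((net + size - 2)) broadcast=$((net + size - 1))"
}

subnet_range 243 29    # prints: network=240 hosts=241-246 broadcast=247
```

This agrees with the broadcast address 82.195.231.247 reported on vmbr0, and 82.195.231.241 (the address sending the ARP requests in the trace) falls inside the same block.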

Which solution do you suggest?
 
That depends on your provider. They should provide a network configuration guide if required.
Have you tried simply setting the IPs in the VMs instead of the bridge on the host?
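
To see which addresses are currently configured on the bridge itself, the entries marked `secondary` in the `ip addr` output are the interesting ones. A small sketch that filters them out; the function name list_secondary_ips is made up, and it assumes the iproute2 output format shown earlier in this thread:

```shell
# Hypothetical helper: print the addresses marked "secondary" in `ip addr` output.
# Reads the output of `ip -4 addr show dev <interface>` on stdin.
list_secondary_ips() {
    awk '$1 == "inet" && / secondary / { print $2 }'
}

# Example (on the host): ip -4 addr show dev vmbr0 | list_secondary_ips
```

An address found this way can be removed for a test with `ip addr del <address>/<prefix> dev vmbr0`, though whatever added it (a DHCP client or an entry in the network configuration) may put it back.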
 
Thanks for your answer. The only IP I set on the bridge is the host's main IP, which is different from the VM IPs; I think (but I'm not sure...) that the secondary IPs are automatically assigned by the provider via DHCP. Unfortunately their customer support is very slow, so I have to be sure that my config is correct before pinging them...
 
