VMs and LXC containers cannot be addressed by DNS after reboot / migration, IP still works

maxim.webster

Nov 12, 2024
Hi,

I am facing an issue that might be related to Proxmox, but also to my UniFi network equipment.

I have a 3-node cluster (pve-manager/9.0.11/3bf5476b8a4699e2 (running kernel: 6.14.11-4-pve)) running several VMs and LXC containers, using HA/replication for failover. The guests use DHCP to get an IPv4 address from my UniFi controller (Dream Machine Pro). The DNS server is also located on the controller (built-in). Whenever I reboot a guest or it gets migrated to another node, it cannot be reached from a client by its FQDN or host name for several minutes. However, since the IP address assigned via DHCP stays the same, it can be reached by IP at any time. Networking on the guest itself also works.

Has anybody faced the same issue and can give me a hint where to look? The behaviour is not only annoying, but also defeats the purpose of "high availability".

Additional info:
  • the cluster nodes have fixed IPs from a dedicated VLAN 10. The CIDR is 192.168.10.0/24
  • the clients use DHCP for dynamic IPs from another VLAN 20. The CIDR is 192.168.20.0/24

Sample network configuration of one cluster node

Code:
auto lo
iface lo inet loopback

iface enp2s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.3/24
        gateway 192.168.10.1
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094


Sample configuration of one VM

Code:
name: gary
net0: virtio=BC:24:11:0D:E3:BE,bridge=vmbr0,firewall=1,tag=20

UniFi port config for cluster node ports


1760719107846.png


Settings for VLAN 20 ("HOME")


1760719391650.png
 
Any input on this? The issue is keeping me from setting up a highly available MariaDB Galera cluster. Everything is working, using an LXC container on each of my 3 Proxmox nodes (non-HA) and an additional LXC container (replicated, HA) running HAProxy. Whenever the HAProxy container migrates to another node, it cannot be reached by its DNS name for a certain period of time.

I am not sure whether this is actually a Proxmox, guest OS or UniFi issue, though.
 
Hi,

Can you give an example of the IP addresses of your server and your client?

Have you set a fixed IP and DNS entry for your server (VLAN 10, as I understand) on the Ubiquiti side?

Best regards,
 
The containers are all attached to the VLAN-aware bridge vmbr0 and have VLAN tag 20 assigned.

They use DHCP and there are no address reservations for them in the UniFi controller. All of them receive an address from the VLAN 20 network 192.168.20.0/24.
 
Hi,

When this happens, have you tested whether DNS resolution still works?

Note: for servers I would definitely use a static IP instead of DHCP.

Best regards,
 
I dug into this, but did not find the cause. What I discovered:
I have two LXC containers acting as round-robin proxies for internal services:
  • "db" uses HAProxy to distribute traffic to 3 MariaDB Galera nodes (LXC containers, too)
  • "proxmox" uses HAProxy to distribute traffic to my 3 Proxmox nodes (dedicated machines)

When migrating "db" to another Proxmox node, everything continues to work (apart from a minor service interruption, because the migration uses restart mode).

When migrating "proxmox" to another Proxmox node, it takes more than 10 minutes for DNS resolution to work again. Accessing the container by IP works the whole time.
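To put a number on the outage, a small polling script could be run on a client while the container migrates. This is only a sketch: the function name `seconds_until_resolvable`, the polling interval and the timeout are assumptions of mine, and it goes through the operating system's resolver (the same path `ping` takes), not a direct query to the UniFi DNS.

```python
import socket
import time


def seconds_until_resolvable(name, resolve=socket.gethostbyname,
                             interval=1.0, timeout=900.0,
                             clock=time.monotonic, sleep=time.sleep):
    """Poll the OS resolver until `name` resolves.

    Returns (elapsed_seconds, ip) on success, or (None, None) if the
    name still cannot be resolved after `timeout` seconds.
    `resolve`, `clock` and `sleep` are injectable so the loop can be
    exercised without a real network (hypothetical helper, not part
    of any Proxmox or UniFi tooling).
    """
    start = clock()
    while True:
        try:
            ip = resolve(name)
            return clock() - start, ip
        except OSError:
            # socket.gaierror ("no such host") is a subclass of OSError
            pass
        if clock() - start >= timeout:
            return None, None
        sleep(interval)
```

Calling `seconds_until_resolvable("proxmox")` right after triggering the migration returns how long the name stayed unresolvable from that client, which makes it easier to compare "db" and "proxmox" under the same conditions.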

Things I figured out:

  • The container configs of "db" and "proxmox" are identical except for hostname, MAC address, disks and description
  • "db" and "proxmox" use an identical software stack: recent Alpine Linux, same set of software and versions, HAProxy as main service

When "proxmox" migrates, its entry disappears from the ARP table. After accessing it by IP, the entry reappears in the ARP table. When running "nslookup" and "ping" immediately afterwards, nslookup does report the guest's IP, but ping fails to resolve the DNS name (the client is running a German Windows 11):


Code:
Standardserver:  unifi.lan.internal
Address:  192.168.20.1

> proxmox
Server:  unifi.lan.internal
Address:  192.168.20.1

Name:    proxmox
Address:  192.168.20.82

Two seconds later:

Code:
> ping proxmox

Ping-Anforderung konnte Host "proxmox" nicht finden. Überprüfen Sie den Namen, und versuchen Sie es erneut.

(In English: Ping request could not find host "proxmox". Please check the name and try again.)

Yes, it could be a Windows-related DNS issue, but why do identical containers produce different results?
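One thing that might explain the nslookup/ping difference: nslookup sends its query straight to the DNS server, while ping goes through the operating system's resolver, which on Windows includes the DNS client cache. To compare the two paths from the same machine, something like the following could be used. It is a rough sketch using only the Python standard library; the function names are made up for illustration, the direct query sends the bare name without appending any DNS suffix, and it only handles plain A records over UDP.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035)."""
    # Header: id, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(packet, offset):
    """Return the offset just past a (possibly compressed) DNS name."""
    while True:
        length = packet[offset]
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        if length == 0:            # root label ends the name
            return offset + 1
        offset += 1 + length


def parse_a_records(packet):
    """Extract IPv4 addresses from the answer section of a DNS response."""
    _id, _flags, qdcount, ancount = struct.unpack("!HHHH", packet[:8])
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(packet, offset) + 4  # skip QTYPE + QCLASS
    addresses = []
    for _ in range(ancount):
        offset = _skip_name(packet, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return addresses


def query_dns_server(name, server, timeout=2.0):
    """Ask `server` directly over UDP, bypassing the OS resolver and cache."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        packet, _ = sock.recvfrom(512)
    return parse_a_records(packet)


def query_os_resolver(name):
    """Resolve through the OS resolver, i.e. the path ping uses."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

If `query_dns_server("proxmox", "192.168.20.1")` returns 192.168.20.82 while `query_os_resolver("proxmox")` returns an empty list, the UniFi DNS is answering correctly and the failure sits on the client side, for example a cached negative answer. In that case `ipconfig /displaydns` and `ipconfig /flushdns` on the Windows client would be the next things to check.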