Weird network issue after update to PVE9. Windows VMs can't reach their DC if not running on the same PVE host

acurus

New Member
Sep 13, 2025
Hi everyone!

I've run into quite a weird network issue after updating a PVE cluster from 8.latest to 9.0.9.

The cluster consists of two identical servers (16-core AMD CPU, 96 GB RAM, multiple network cards). The network cards are combined into two bridges, one used for management and the other for VM networking. Both bridges are simple active/passive setups connected to plain layer 2 switches. Nothing has changed on the switch side for several months.
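For illustration, the VM-facing side of each host looks roughly like this; this is a simplified sketch with placeholder interface/bridge names, assuming an active-backup bond underneath the bridge, not my literal config:

# /etc/network/interfaces (simplified sketch, names are placeholders)
auto bond1
iface bond1 inet manual
    bond-slaves enp1s0 enp2s0
    bond-mode active-backup
    bond-primary enp1s0
    bond-miimon 100

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    # VLAN-aware bridging, where used
    bridge-vlan-aware yes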

The cluster and its VMs were running perfectly fine on PVE 8.latest, but since the update to PVE 9.0.9 they are experiencing the following issue:

One VM is a Windows DC; the other VMs are Windows 11 clients of that DC.

When both DC and client are running on the same PVE host, everything is working just fine. The clients have network connectivity (tested by pinging external hosts like google.com by name), and can reach and use the DC. The DC is also the DNS server for the clients. It doesn't matter which PVE host runs all the VMs, as long as they run on the same host.

When I start a client on the other PVE host (and it doesn't matter whether the client runs on PVE A and the DC on PVE B, or the other way around), the client still has basic network connectivity, but fails to use the DC's logon services. The result is a boot time of 20+ minutes, and logon with domain credentials doesn't work. Logon with local accounts DOES work, and I can verify IP connectivity, e.g. ping google.com works just fine, even though the client has to resolve google.com via the DC. Ping to the DC also works, but the event log on the client is full of messages stating that the DC is not reachable.

To rephrase the issue:
Unless client and DC are running on the same physical host, the client still has basic network connectivity (it can ping the DC, can do nslookups with the DC as DNS server, and can ping the outside world, which proves that layer 3/4 connectivity to the DC and the WAN router across physical hosts is working), but it fails to use the actual Windows domain services on the DC.
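To make "fails to use domain services" a bit more concrete: the services in question sit on the usual AD ports (Kerberos 88, LDAP 389, SMB 445, RPC 135). These are the kind of checks I can still run from the client; dc01 and <DomainName> are placeholders:

# On the client (dc01 / <DomainName> stand in for the real DC / domain)
Test-NetConnection dc01 -Port 88     # Kerberos
Test-NetConnection dc01 -Port 389    # LDAP
Test-NetConnection dc01 -Port 445    # SMB
Test-NetConnection dc01 -Port 135    # RPC endpoint mapper
nltest /sc_verify:<DomainName>       # verify the secure channel to the DC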


Steps I've taken so far:
- disabled all backup network ports on the switches, to force all VM traffic to use only one NIC per host, connected to the same switch
- rebooted both PVE hosts
- verified the bridge/VLAN configuration on both PVE hosts (identical; see the commands after this list)
- verified VLAN configuration on the switch side (identical)
- verified that no PVE firewall is active at host, network, or VM level (see the check after this list)
- tested all combinations of VM to Host: DC and Client both on PVE A (everything works), DC and Client both on PVE B (everything works), DC on PVE A and Client on PVE B (issue!), Client on PVE A and DC on PVE B (issue!)
- tested ping from the client to the DC when running on different hosts (works)
- tested nslookup from the client with the DC as DNS server when running on different hosts (works)
- tested ping from the client to the DC, to other VMs and physical hosts in the same network, and to hosts on the internet, using both IP addresses and hostnames, when running on different hosts (works)
- tested if the client can lookup the DC via Nltest /dsgetdc:<DomainName> /force when running on different hosts (works like a charm)
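For the bridge/VLAN verification mentioned above, the comparison between the two hosts was based on roughly these commands (vmbr1 is a placeholder for the VM bridge):

# Run on both PVE hosts and compare the output; vmbr1 = VM bridge (placeholder)
cat /etc/network/interfaces    # bond/bridge definitions
bridge link show               # which ports belong to which bridge
bridge vlan show               # per-port VLAN membership
ip -d link show vmbr1          # bridge details, incl. MTU and vlan_filtering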
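And the firewall check referenced above was essentially:

# Run on both PVE hosts
pve-firewall status                      # reports the firewall status (expecting it not to be running)
cat /etc/pve/firewall/cluster.fw         # datacenter-level config, if present
cat /etc/pve/nodes/$(hostname)/host.fw   # host-level config, if present
ls /etc/pve/firewall/                    # per-VM configs (<vmid>.fw), if any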


Given that the issue only shows up when the VMs are not running on the same host, I tend to rule out anything within the VMs.
Given that there were no changes on the switch side, I'd rule the switches out too.

But I'm at a loss now and would appreciate any pointers on where to look next :)