Ansible/Terraform breaks on VLAN Interfaces

Unpeeled5565

New Member
Feb 13, 2026
I am looking into a move from VMware to Proxmox and am stumped by what I believe to be a bug that completely breaks my automation tools when VLANs are configured for the management interface.

When the Proxmox host has management IPs on VLAN interfaces (either VLAN-aware bridge or traditional VLAN interfaces), certain applications fail while others work normally:

FAILS:

* Ansible (Python module execution)
* Proxmox API (HTTPS requests timeout)
* Terraform (relies on Proxmox API)

WORKS:

* SFTP/SCP file transfers
* SSH (authentication, simple commands, even manual piping)
* Ping
* General network connectivity including web UI

The connection establishes successfully (SSH auth works, TLS handshake completes), but then specific operations time out:

Ansible:

Code:
fatal: [host]: UNREACHABLE! =>
msg: 'Data could not be sent to remote host'

But this works:

Code:
ansible host -m raw -a "echo test"     # SUCCESS
ansible host -m ping                   # FAILS (requires Python module transfer)


Proxmox API: Hangs/times out after TLS handshake completes

Code:
curl -k https://10.83.2.40:8006/api2/json/version
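
If it helps to pin down where the request stalls, a verbose curl with a timeout shows whether it hangs before or after the TLS handshake (same URL as above; the flags are just a suggestion):

Code:
# -v prints each TLS/HTTP step, --max-time aborts the hang after 10 seconds
curl -vk --max-time 10 https://10.83.2.40:8006/api2/json/version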


But manual SFTP works fine:

Code:
scp file.txt root@10.83.2.40:/tmp/

SSH Debug Output (from Ansible):

Code:
debug2: channel 2: read failed rfd 6 maxlen 32768: Broken pipe
Read from remote host 10.83.2.40: Operation timed out
client_loop: send disconnect: Broken pipe
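
For reference, the debug lines above come from the ssh process Ansible spawns; they can be reproduced with extra verbosity, roughly like this (the inventory host name is a placeholder):

Code:
# -vvvv makes Ansible enable SSH connection debugging as well
ansible host -m ping -vvvv

# or drive ssh directly with the multiplexing options Ansible uses by default
ssh -vvv -o ControlMaster=auto -o ControlPersist=60s root@10.83.2.40 'echo test'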


Configurations Tested (ALL FAIL with Ansible/API)

VLAN-aware bridge:


Code:
auto vmbr0
iface vmbr0 inet manual
bridge-ports nic1
bridge-vlan-aware yes
bridge-vids 2-4094
auto vmbr0.2
iface vmbr0.2 inet static
address 10.83.2.40/24



Traditional VLAN interfaces:

Code:
auto nic1.2
iface nic1.2 inet static
address 10.83.2.40/24
auto vmbr0
iface vmbr0 inet manual
bridge-ports nic1


Forum-recommended config with custom names:


Code:
auto mgmt
iface mgmt inet static
address 10.83.2.40/24
vlan-id 2
vlan-raw-device vmbr0

All produce the same failures.

Simple bridge (NO VLANs) - WORKS:



Code:
auto vmbr0
iface vmbr0 inet static
address 10.83.2.40/24
bridge-ports nic1


Note: the above is still on a VLAN at the switch side; it's just on an access port/native VLAN.

Testing:

**Kernel 6.17.9-1-pve:** Bug present
**Kernel 6.17.2-1-pve:** Bug present
**Hardware offloading:** Disabled TSO/GSO/GRO (see the ethtool sketch below) - no change
**Multiple hosts:** Reproduced on different Proxmox hosts with different NICs - same result
**Manual SSH piping:** Works fine (can pipe Python scripts and execute with sudo)
**SFTP transfers:** Work perfectly
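
For anyone wanting to repeat the offloading test, disabling TSO/GSO/GRO with ethtool looks roughly like this (NIC name as in the configs below; this is a sketch, not a transcript of my exact commands):

Code:
# turn off TCP segmentation, generic segmentation and generic receive offload on the uplink NIC
ethtool -K nic1 tso off gso off gro off

# confirm the new settings
ethtool -k nic1 | grep -E 'tcp-segmentation|generic-(segmentation|receive)'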

The only workaround so far is to avoid VLAN trunking for host management entirely, which really shouldn't be necessary.

I found the following threads, which seemed to describe the same issue, but unfortunately they did not resolve it for me:

* VLAN-aware configuration kills TCP handshake
* Connections to PVE in VLAN timeout

Packet Capture

Below is packet capture data comparing the working and non-working setups, using the same Ansible commands and the same IP. The captures were taken on the PVE host itself.
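
Something along these lines with tcpdump on the host produces equivalent output (interface name and client address are placeholders):

Code:
# capture the SSH session on the management VLAN interface
tcpdump -ni vmbr0.2 host {client ip} and port 22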

Working configuration (no VLANs):

Code:
10.83.2.40.22 > client: Flags [P.], seq [...], length 132  < Server sends data
client > 10.83.2.40.22: Flags [.], ack 132                  <  Client ACKs data


Failing configuration (with VLANs):

Code:
client > 10.83.2.40.22: Flags [.], ack 1, length 0  < Only ACKs, no data
client > 10.83.2.40.22: Flags [.], ack 1, length 0
client > 10.83.2.40.22: Flags [.], ack 1, length 0
client > 10.83.2.40.22: Flags [.], ack 1, length 0

With VLANs configured, the Proxmox host never seems to send any outbound data packets. The TCP connection establishes successfully (SYN/ACK works), and ACK packets flow bidirectionally, but actual data payloads from the server are being dropped/blocked by the kernel.

This probably explains why:

* Small packet operations work (SSH auth, ping)
* Large data transfers fail (Ansible, API)
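
For what it's worth, a capture filter that only shows segments with the PSH flag set (the data-carrying packets that show up as [P.] above) makes the difference obvious; a sketch, interface name assumed:

Code:
# only TCP segments that actually push data on port 22
tcpdump -ni vmbr0.2 'tcp port 22 and tcp[tcpflags] & tcp-push != 0'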

Environment:

Test Host 1:

Code:
Proxmox VE: 9.1.5
Kernel: 6.17.2-1-pve and 6.17.9-1-pve
NIC: Intel I226-V (2.5GbE) - igc driver


Test Host 2:

Code:
Proxmox VE: 9.1.5
Kernel: 6.17.2-1-pve and 6.17.9-1-pve
NIC: Intel x710 (10GbE)

I'm kind of lost now, hoping anyone here knows more than me and can point me in the right direction.

At the moment this seems to be some kind of kernel bug where outbound TCP data packets are silently dropped on VLAN interfaces (both VLAN-aware bridges and traditional VLAN interfaces), while connection management packets (SYN, ACK, FIN) pass through normally. Though I would be very happy to hear I am just being an idiot.
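
One more data point that might help narrow it down would be the per-interface statistics in both states while reproducing, along these lines (interface name assumed):

Code:
# per-interface RX/TX packet, error and drop counters
ip -s link show dev vmbr0.2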
 
This often occurs when there are MTU issues somewhere along the path - did you double-check your MTU config across the whole path? You can always check via the ping command:

Code:
# 9000 MTU
ping -M do -s 8972 {target host}

# 1500 MTU
ping -M do -s 1472 {target host}

-M do means don't fragment, and -s specifies the payload size; the 28 bytes subtracted account for the IP (20) and ICMP (8) headers. Could you maybe check pinging with different sizes and see if there's a "breaking point"?
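
A quick way to look for such a breaking point is to walk the payload size down and see where replies stop (sizes below are just examples):

Code:
# -M do sets don't-fragment, -c 1 sends a single probe per size
for size in 1472 1464 1400 1300 1200; do
    printf 'payload %s: ' "$size"
    ping -M do -c 1 -s "$size" {target host} > /dev/null 2>&1 && echo ok || echo blocked
done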
 
Okay, I seem to have more info now. Thanks @shanreich and @bbgeek17. I looked into the MTU and don't think there's an issue there; I've never changed the MTU or enabled jumbo frames or anything like that, but I still gave it a go and had the same result.

What I have found out, though, is that this issue only seems to happen when I have multiple VLAN interfaces with addresses.

During MTU testing all packet sizes work fine (1472, 1468, and 1400 bytes with the -D flag), so I don't think MTU is the issue.

The bug only appears when multiple VLAN interfaces are configured on the same bridge:

Reproduction:
1. Single VLAN interface (vmbr0.2): Ansible and the API work perfectly
2. Add a second VLAN (vmbr0.4): Ansible immediately fails with "Data could not be sent to remote host"
3. Remove the second VLAN: everything works again

This is 100% reproducible; the failure happens instantly when bringing up the second VLAN interface. I haven't tested on my second test host just yet but will in a bit. With multiple VLAN interfaces configured I can still access the GUI/SSH on any of the management addresses, but Ansible fails against all of them.
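
For quicker back-and-forth testing, the second VLAN interface can also be brought up and torn down on the fly with iproute2 instead of editing the interfaces file (a sketch; names and addresses match the configs below):

Code:
# add the second VLAN interface on top of the VLAN-aware bridge
ip link add link vmbr0 name vmbr0.4 type vlan id 4
bridge vlan add dev vmbr0 vid 4 self    # may already be covered by bridge-vids
ip addr add 10.83.4.40/24 dev vmbr0.4
ip link set vmbr0.4 up

# run the Ansible test, then remove it again
ip link del vmbr0.4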

Config that FAILS:
Code:
auto lo
iface lo inet loopback

iface nic1 inet manual

iface nic0 inet manual

auto vmbr0
iface vmbr0 inet manual
    bridge-ports nic1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr0.2
iface vmbr0.2 inet static
address 10.83.2.40/24

auto vmbr0.4
iface vmbr0.4 inet static
address 10.83.4.40/24


Config that WORKS:
Code:
auto lo
iface lo inet loopback

iface nic1 inet manual

iface nic0 inet manual

auto vmbr0
iface vmbr0 inet manual
    bridge-ports nic1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr0.2
iface vmbr0.2 inet static
address 10.83.2.40/24


Both interfaces show MTU 1500, no fragmentation issues.
 
Can you post the output of ip a / ip r in both states?
 
I tested Open vSwitch to see if the bug was specific to the Linux bridge or if I'd get different results; I did not.

FAILS - OVS with Multiple VLANs

Code:
auto lo
iface lo inet loopback

auto nic1
iface nic1 inet manual
ovs_bridge vmbr0
ovs_type OVSPort

auto vmbr0
iface vmbr0 inet manual
ovs_type OVSBridge
ovs_ports nic1 vlan2 vlan4

auto vlan2
iface vlan2 inet static
ovs_type OVSIntPort
ovs_bridge vmbr0
ovs_options tag=2
address 10.83.2.40/24
gateway 10.83.2.1

auto vlan4
iface vlan4 inet static
ovs_type OVSIntPort
ovs_bridge vmbr0
ovs_options tag=4
address 10.83.4.40/24



OVS Status:
Code:
ovs-vsctl show
103e7bf8-9838-484a-a0f2-ca127e58e468
    Bridge vmbr0
        Port vlan4
            tag: 4
            Interface vlan4
                type: internal
        Port vmbr0
            Interface vmbr0
                type: internal
        Port nic1
            Interface nic1
        Port vlan2
            tag: 2
            Interface vlan2
                type: internal
    ovs_version: "3.5.0"
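
For reference, roughly the same OVS layout can also be built ad hoc with ovs-vsctl rather than through the interfaces file (a sketch; the address assignment still has to be done separately):

Code:
# create the tagged internal port by hand and give it the VLAN 4 address
ovs-vsctl add-port vmbr0 vlan4 tag=4 -- set Interface vlan4 type=internal
ip addr add 10.83.4.40/24 dev vlan4
ip link set vlan4 up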



Ansible Test Result:
Code:
$ ansible pve01.lcy.muffn.io -m ping

PLAY [Ansible Ad-Hoc] **********************************************************

TASK [ping] ********************************************************************
fatal: [pve01.lcy.muffn.io]: UNREACHABLE! => changed=false
msg: 'Data could not be sent to remote host "10.83.2.40". Make sure this host can be reached over ssh: mux_client_request_session: read from master failed: Broken pipe'
unreachable: true

PLAY RECAP *********************************************************************
pve01.lcy.muffn.io         : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0



WORKS - OVS with Single VLAN

Configuration (vlan4 removed):

Code:
auto lo
iface lo inet loopback

auto nic1
iface nic1 inet manual
ovs_bridge vmbr0
ovs_type OVSPort

auto vmbr0
iface vmbr0 inet manual
ovs_type OVSBridge
ovs_ports nic1 vlan2

auto vlan2
iface vlan2 inet static
ovs_type OVSIntPort
ovs_bridge vmbr0
ovs_options tag=2
address 10.83.2.40/24
gateway 10.83.2.1



Ansible Test Result:
Code:
$ ansible pve01.lcy.muffn.io -m ping

PLAY [Ansible Ad-Hoc] **********************************************************

TASK [ping] ********************************************************************
ok: [pve01.lcy.muffn.io]

PLAY RECAP *********************************************************************
pve01.lcy.muffn.io         : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
 
Can you post the output of ip a / ip r in both states?

WORKING STATE - Single VLAN (vmbr0.2 only)

Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
3: nic1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 00:e0:4c:5b:38:b3 brd ff:ff:ff:ff:ff:ff
21: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:e0:4c:5b:38:b3 brd ff:ff:ff:ff:ff:ff
22: vmbr0.2@vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:e0:4c:5b:38:b3 brd ff:ff:ff:ff:ff:ff
    inet 10.83.2.40/24 scope global vmbr0.2

Code:
default via 10.83.2.1 dev vmbr0.2 proto kernel onlink
10.83.2.0/24 dev vmbr0.2 proto kernel scope link src 10.83.2.40


FAILING STATE - Multiple VLANs (vmbr0.2 + vmbr0.4)

Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
3: nic1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 00:e0:4c:5b:38:b3 brd ff:ff:ff:ff:ff:ff
21: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:e0:4c:5b:38:b3 brd ff:ff:ff:ff:ff:ff
22: vmbr0.2@vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:e0:4c:5b:38:b3 brd ff:ff:ff:ff:ff:ff
    inet 10.83.2.40/24 scope global vmbr0.2
23: vmbr0.4@vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:e0:4c:5b:38:b3 brd ff:ff:ff:ff:ff:ff
    inet 10.83.4.40/24 scope global vmbr0.4



Code:
default via 10.83.2.1 dev vmbr0.2 proto kernel onlink
10.83.2.0/24 dev vmbr0.2 proto kernel scope link src 10.83.2.40
10.83.4.0/24 dev vmbr0.4 proto kernel scope link src 10.83.4.40

Note: the machine I am connecting from is on VLAN 4, so this is pure L2 when testing against that address.
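
For completeness, the route and source address the kernel picks toward the client can be checked in both states with ip route get ({client ip} is a placeholder):

Code:
# shows the outgoing device and source address used for that destination
ip route get {client ip}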