Dear Proxmox admins,
I'm faced with a very, very curious issue that I've been debugging for at least 12 hours already without notable success. I'm hoping someone may point me in the right direction.
In order to explain the issue, I think it's necessary to first describe the setup. (All Proxmox machines in this setup are used for testing only.)
- LAN: 10.2.0.0/23
- Main router (Mikrotik): 10.2.0.1, with a route to 10.10.10.0/23 via 10.2.0.25
- Proxmox host: 10.2.0.93 (vmbr0, default bridge)
- Router-VM (Mikrotik) on Proxmox host with 4 vNICs: ether4 connected to vmbr0 (10.2.0.25), ether1-3 in a bridge (10.10.10.1)
- Proxmox virtual Nodes 1-3 in a cluster, connected to the Router-VM through separate bridges (brnode1-3, only layer 2) on the Proxmox host.
- No firewalls: For purposes of testing, all firewalls within the LAN and in all VMs were turned off.
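For reference, the relevant parts of the host's /etc/network/interfaces look roughly like this (simplified; the physical NIC name is just an example):
Code:
auto vmbr0
iface vmbr0 inet static
        address 10.2.0.93/23
        gateway 10.2.0.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

auto brnode1
iface brnode1 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0
# brnode2 and brnode3 are defined the same way; the Router-VM's ether1-3
# and the virtual nodes' vNICs attach to them as regular VM network devices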
When I originally installed the virtual cluster on Proxmox PVE 8.1, there was no Router-VM, and I don't remember having any issues with any of my tests. Then, in order to more closely replicate what I'm planning to build, I set up the Router-VM and at the same time upgraded all nodes to Proxmox PVE 8.3 (with the most recent updates as of today installed).
So, I'm accessing the web console from my Windows 11 PC. I can access the host (10.2.0.93:8006) just fine and without any lag. However, when I try to access one of the nodes of the virtual cluster, such as 10.10.10.2:8006, I keep experiencing lags. For example, it takes up to 15 seconds before the shell is shown.
Using the browser developer tools (F12), I was able to confirm that several web calls take a very long time.

Specifically, it's the TLS handshake that takes so long.
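For anyone who wants to see this outside the browser: a curl one-liner like the one below (run from the affected Windows PC; curl ships with Windows 11) splits the delay into TCP connect vs. TLS handshake time:
Code:
curl -k -s -o NUL -w "tcp connect: %{time_connect}s  tls done: %{time_appconnect}s  total: %{time_total}s\n" https://10.10.10.2:8006/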

Then I made some very weird findings that I simply can't begin to explain. But I do want to make clear that at no point, and in no combination, do I ever experience lags when accessing the physical Proxmox host on 10.2.0.93. My issues only begin when I try to access the virtual nodes in 10.10.10.0/23.
Not browser related
I tried private windows, I tried different browsers - the result is always the same.
Resetting NIC solves issue
First, I can rectify the situation simply by resetting the network interface on my Windows PC and rebooting. Afterwards, when I access the web console for 10.10.10.2, everything is smooth and fast.
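(By "resetting the network interface" I mean nothing exotic - roughly the equivalent of the standard Windows reset commands plus a reboot:)
Code:
netsh winsock reset
netsh int ip reset
ipconfig /flushdns
shutdown /r /t 0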
Establishing VPN brings it back
However, as soon as I establish ANY kind of VPN connection from this Windows box, I'm faced with the delays mentioned above again - and they remain even after I close the VPN connection.
Not just one PC - any Windows PC in the LAN
It gets weirder. If I access the web console from another Windows PC in my LAN on which a VPN connection has been established at any point in the past, there is the same lag. In my view, that means it's not my Windows box per se that has the problem.
Once one PC has an issue, all others have one too
By the way, the lagginess caused by one "affected" computer accessing the web console is, to some extent, universal. If I go back to my first Windows PC (the one whose NIC was reset and which had no lag when accessing the web console), it will now lag as well.
Journal errors
Apart from the general lagginess, here is something specific I can see in the web console and also in the node's journal:
- When everything works fine, switching between the shells of different nodes always ends the shell I leave with status OK.
- When I have the issue, switching between shells does not close the tasks for the shells that were opened; in the Status field there is a spinner that keeps turning.
- These shell tasks remain open and may, after a while, close with the following error:
Code:
TASK ERROR: command '/usr/bin/termproxy 5900 --path /nodes/proxtest2 --perm Sys.Console -- /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=proxtest2' -o 'UserKnownHostsFile=/etc/pve/nodes/proxtest2/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' -t root@10.10.10.2 -- /bin/login -f root' failed: received interrupt
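If more logs would help, I can pull them on the affected node with something like this (time window is just an example):
Code:
journalctl -u pveproxy -u pvedaemon --since "10 minutes ago" --no-pager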
The weirdness continues. If I use my Linux notebook (Ubuntu 24 desktop), attached to the same LAN (same switch, even the same cable) as my Windows PC, to access the web console, I never experience any lags!
Conclusion
I can offer a preliminary, purely anecdotal conclusion at best. Since I can always access the web console fine from Linux, I'd say it's rather unlikely that there's a general networking issue - all PCs, Linux or Windows, get their address/gateway/DNS from the same DHCP server. I'm not using custom routes or firewalls on any of the PCs (all turned off for testing). As far as the main router (10.2.0.1) is concerned, all LAN interfaces are part of the same bridge, so it's just layer-2 switching.
It wouldn't be accurate to claim that the combination of PVE and Windows is a general problem, either. After all, I can access the physical Proxmox host just fine (although it's a single host and, unlike the virtual nodes, not part of a cluster). But since I only have these issues when connecting to the web console from Windows, and never from Linux, the OS does seem to play some role in all of this.
It seems the issues first arose after introducing the Router-VM, so maybe it's responsible. But its configuration could not be simpler (just routing between ether4 on 10.2.0.0/23 and the bridge for 10.10.10.0/24 - no filtering, nothing special). Rebooting it does not have any effect either. Access always works from Linux, and it seems to work from Windows too, as long as no VPN connection has ever been established.
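For completeness, the Router-VM's configuration boils down to roughly this (RouterOS CLI, bridge name simplified):
Code:
/interface bridge add name=bridge-nodes
/interface bridge port add bridge=bridge-nodes interface=ether1
/interface bridge port add bridge=bridge-nodes interface=ether2
/interface bridge port add bridge=bridge-nodes interface=ether3
/ip address add address=10.2.0.25/23 interface=ether4
/ip address add address=10.10.10.1/24 interface=bridge-nodes
/ip route add dst-address=0.0.0.0/0 gateway=10.2.0.1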
The VPN part in particular has me confused, as I see no logic in it whatsoever - a road-warrior-type VPN should not affect the router or other devices in the LAN in any way, and certainly not after the connection has been closed. That's why I assume this is just a symptom of the issue and not its origin. But I guess this is where I've reached the limit of my abilities.
Vic