Webserver VMs with HTTPS loading incredibly slow

ajtatum

New Member
Apr 20, 2023
Fairfax, VA
I'm moving from Windows Hyper-V to Proxmox and was excited, but I'm noticing that the websites hosted by these VMs (running Ubuntu Jammy) are loading incredibly slowly. So much so that on the first page load I often get an error 522 from Cloudflare (the sites are proxied through them), and if I refresh the page it loads... most of the time. Sometimes it takes a couple of refreshes.

I'm new to Proxmox, so I've tried to figure this out and read about different performance tweaks and whatnot... but this issue seems to be occurring a lot when you search the forums and there's no clear answer. I do believe it's related to HTTPS as I'm running Home Assistant off it and that's HTTP but I'm using Nabu Casa to access it remotely, and that loads up REALLY quick.

My host machine is a new Dell OptiPlex 7000 with an Intel i7-12700 processor, 128 GB of RAM, and currently a 2 TB PCIe 5 NVMe drive, plus two 10G networking ports (I have a 10G switch and a Synology on 10G, but my outgoing internet is 1 Gb).

So, one VM, for example, is for Cloudron.
  • Memory: 12gb RAM
  • Processor: 4 (2 sockets, 2 cores) [host,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+pdpe1gb;+aes] [numa=1]
  • Bios: SeaBIOS
  • Machine: i440fx
  • SCSI Controller: VirtIO SCSI Single
  • Hard Disk: local-2tb:vm-106-disk-0,aio=io_uring,cache=writeback,discard=on,iothread=1,size=128G,ssd=1
  • Networking Device: virtio=00:00:8b:c3:95:3b,bridge=vmbr0,mtu=1,queues=4
  • CPU Usage: 1.57% (peaks at 25% here and there, perhaps due to something with HTTPS?)
  • Memory: 26.20%
The PVE host averages 1-3% utilization, but there are spikes up to 14% that correlate with the VM's spikes. Server load average is under 3.

This is my first post, so I apologize if I didn't post everything needed to help me debug this issue. If you can offer any assistance, or if you need any additional details, please let me know. Otherwise, I feel like I've made a huge mistake, as these VMs are all configured similarly (some are like Cloudron; others just run either Caddy or NGINX with Node or PHP apps).

(Oh, I forgot to mention that there are no issues internally. Meaning, if I try to access Cloudron via internal IP address, it loads up super quick. However, as soon as I use the external web address, I experience the performance hit.)
 
Oh, for the PVE:

Kernel Version: Linux 6.2.9-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.9-1 (2023-03-31T10:48Z)
PVE Manager Version: pve-manager/7.4-3/9002ab8a

Package Versions:
proxmox-ve: 7.4-1 (running kernel: 6.2.9-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-6.2: 7.4-1
pve-kernel-5.15: 7.4-1
pve-kernel-6.2.9-1-pve: 6.2.9-1
pve-kernel-6.2.6-1-pve: 6.2.6-1
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 
I found this article about AES-NI and OpenSSL and I'm not sure how to interpret the results, but it seems like something is off:

I ran grep -m1 -o aes /proc/cpuinfo and that produced "aes" on two lines.

I then ran openssl speed -elapsed aes-128-cbc

That produced:

Bash:
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128 cbc for 3s on 16 size blocks: 69972918 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 18208190 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 4577232 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 1151366 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 143585 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 16384 size blocks: 71892 aes-128 cbc's in 3.00s
OpenSSL 1.1.1n  15 Mar 2022
built on: Sun Feb  5 21:23:17 2023 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-8bYUb4/openssl-1.1.1n=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc     373188.90k   388441.39k   390590.46k   392999.59k   392082.77k   392626.18k

Running openssl speed -elapsed -evp aes-128-cbc

Produced:

Bash:
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 224708342 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 98159314 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 25199593 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 6354318 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 794734 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 397634 aes-128-cbc's in 3.00s
OpenSSL 1.1.1n  15 Mar 2022
built on: Sun Feb  5 21:23:17 2023 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-8bYUb4/openssl-1.1.1n=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    1198444.49k  2094065.37k  2150365.27k  2168940.54k  2170153.64k  2171611.82k
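
A rough way to read the two runs above: in OpenSSL 1.1.1, the plain `openssl speed aes-128-cbc` run is generally understood to exercise the low-level software AES code, while `-evp` goes through the EVP interface, which can use AES-NI when the CPU flag is visible to the guest. Under that assumption, a large gap between the two figures suggests hardware AES is working rather than broken. The sketch below only does the arithmetic on the 16384-byte figures already posted; the function name `speedup` is made up for illustration.

```shell
# Throughput figures (in 1000s of bytes/s, 16384-byte blocks) copied from the two runs above.
plain=392626.18   # openssl speed -elapsed aes-128-cbc
evp=2171611.82    # openssl speed -elapsed -evp aes-128-cbc

# Print how many times faster the second figure is than the first.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

speedup "$plain" "$evp"   # prints 5.5
```

A factor of roughly 5.5 would be in line with AES-NI being available inside the VM, which points away from crypto throughput as the cause of the slow page loads.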

Any thoughts?
 
So, I just thought of something from another thread on here. On my router, running Untangle, I have the MAC Address of that VM mapped to 192.168.195.114. However, I noticed that Proxmox gives it an IP address of 172.18.0.1. I'm wondering if this could be causing an issue somewhere?

On the PVE, vmbr2, which this VM is now on (updated from the original post), the CIDR is 192.168.195.113/28, so it's within range. However, I noticed that some other VMs, like Home Assistant, are outside of the subnet range and load just fine, so what does that really do? Also, on the PVE under hosts, should I add a line to the domain mapped to an IP address?
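
For the "within range" question above, the membership check can be done by hand: a /28 based on 192.168.195.113 spans 192.168.195.112 to 192.168.195.127. The sketch below is a small, hypothetical helper (the names `ip_to_int` and `in_cidr` are not from any tool) that tests the two addresses mentioned in this post against the vmbr2 CIDR.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "inside" if address $1 belongs to the network of CIDR $2, else "outside".
in_cidr() {
    addr=$(ip_to_int "$1")
    base=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
    if [ $(( addr & mask )) -eq $(( base & mask )) ]; then
        echo inside
    else
        echo outside
    fi
}

in_cidr 192.168.195.114 192.168.195.113/28   # the address mapped on the router
in_cidr 172.18.0.1 192.168.195.113/28        # the address Proxmox displays
```

The first call reports `inside` and the second `outside`, so the address shown by Proxmox is not one the router would hand out on that bridge's subnet.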

I apologize for all the questions and for just replying to my own thread... just thinking out loud I suppose. I guess I took a few things for granted with Windows Hyper-V "just working" (I was running it on Windows 11 not Windows Server and thought Proxmox would be a better bet since I run Linux machines and wanted to use a light-weight OS/Hypervisor as compared to Windows 11 and Hyper-V).
 
Could you post the output of qm config <VM ID> please? I have a hunch that it might be related to your configuration.
 
May very likely be a networking issue, as you're suspecting.

Please post the output of the following commands as well (and redact any information that shouldn't be public, of course):
  • cat /etc/network/interfaces
  • cat /proc/sys/net/ipv4/ip_forward
Also, it might be helpful to run traceroute <FQDN> via your internal and external hosts to see where things are really taking a long time. On Debian, you can install the traceroute package if you haven't already. Alternatively, if you have a Windows host, the tracert command should be equivalent (though I'm not a Windows expert).

The fact that the VM is experiencing such CPU spikes for a single request seems like quite the smoking gun, but since your CPU is set to host, AES / OpenSSL shouldn't be an issue, I assume. And even if they were, requests still shouldn't take that much time.

Just to double check: Are you also using HTTPS when accessing your Cloudron instance via the VM's internal IP (which is what I assume you meant above)? Or just when accessing it via its FQDN?
 
Please describe the network configuration of your VMs and how they can be reached from outside. Public IP? Router NAT? VPN to Cloudflare?
More information is needed.
 
Could you post the output of qm config <VM ID> please? I have a hunch that it might be related to your configuration.

Sure, here you go. I've made some changes to the VM since my original post:

Code:
agent: 1
boot: order=scsi0;ide2;net0
cores: 2
cpu: host,flags=+aes
ide2: CerebroTower:iso/ubuntu-22.04.2-live-server-amd64.iso,media=cdrom,size=1929660K
memory: 12288
meta: creation-qemu=7.2.0,ctime=1682033371
name: cloudron
net0: virtio=00:00:8b:c3:95:3b,bridge=vmbr2,mtu=1
numa: 1
onboot: 1
ostype: l26
parent: PostInstall
scsi0: local-2tb:vm-106-disk-0,aio=io_uring,cache=writeback,discard=on,iothread=1,size=128G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=c14af8b8-da02-4cad-b903-f2ec2b11db5d
sockets: 2
vmgenid: afb90517-6e10-466a-adb8-df6fdd6cae96
 
cat /etc/network/interfaces produces:

Code:
auto lo
iface lo inet loopback

auto enp0s31f6
iface enp0s31f6 inet manual

auto enp1s0f0
iface enp1s0f0 inet manual
        mtu 9000

auto enp1s0f1
iface enp1s0f1 inet manual
        mtu 9000

auto enxfc34971ebfec
iface enxfc34971ebfec inet manual
        mtu 9000

auto vmbr0
iface vmbr0 inet static
        address 192.168.195.96/32
        gateway 192.168.195.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet static
        address 192.168.195.97/28
        bridge-ports enp1s0f0
        bridge-stp off
        bridge-fd 0
        mtu 9000

auto vmbr2
iface vmbr2 inet static
        address 192.168.195.113/28
        bridge-ports enp1s0f1
        bridge-stp off
        bridge-fd 0
        mtu 9000

auto vmbr3
iface vmbr3 inet static
        address 192.168.195.95/32
        bridge-ports enxfc34971ebfec
        bridge-stp off
        bridge-fd 0
        mtu 9000

cat /proc/sys/net/ipv4/ip_forward returns 0.

From the PVE host, if I run traceroute my.domain.com, it shows the local IP address, but from step 1 onwards it's all asterisks.

From my Windows machine, running tracert my.domain.com it goes:

Code:
  1    <1 ms    <1 ms    <1 ms  192.168.195.1
  2     3 ms     5 ms     3 ms  lo0-100.WASHDC-VFTTP-303.verizon-gni.net [71.126.144.1]
  3     4 ms     3 ms     4 ms  100.41.25.154
  4     *        *        *     Request timed out.


The VMs are given an internal IP and on my router I have a NAT Rule that when that VM's IP makes an external/WAN call, it gets assigned to one of my dedicated IPs. Then, on the port forwarding, when something from that IP comes in and is on port 80, 443, etc it gets forwarded to the internal IP.

Please let me know if you need any further details. I truly appreciate your help!
 
So I don't know what might cause this or why, but when the site isn't proxied through Cloudflare it loads up almost instantly. When it is proxied through, when I go to the domain I start randomly getting 522 errors. I wasn't having this issue before, so I'm not sure what's causing it now. Could it be something where I have to mark Cloudflare as a trusted proxy in Proxmox or something? It's really bizarre.
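
One way to narrow down the proxied versus non-proxied difference described above is to time the phases of a single request both ways. Cloudflare documents error 522 as a connection timeout to the origin, so the connect and TLS figures are the interesting ones. This is only a sketch: `time_request` is a hypothetical helper, `my.domain.com` is the placeholder already used in this thread, and `203.0.113.10` stands in for the real public IP.

```shell
# Print connect, TLS-handshake, and first-byte timings for one request.
# Usage: time_request URL [extra curl options...]
time_request() {
    url=$1
    shift
    curl --silent --output /dev/null --max-time 30 \
        --write-out 'connect=%{time_connect}s tls=%{time_appconnect}s first_byte=%{time_starttransfer}s total=%{time_total}s status=%{http_code}\n' \
        "$@" "$url"
}

# Through Cloudflare (normal DNS resolution):
#   time_request https://my.domain.com/
# Straight to the origin, bypassing the proxy (hypothetical public IP):
#   time_request https://my.domain.com/ --resolve my.domain.com:443:203.0.113.10
```

If the direct request is consistently fast while the proxied one stalls, the delay is more likely on the path between Cloudflare and the router's port forward than inside the VM.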
 
cat /proc/sys/net/ipv4/ip_forward returns 0.
There's the culprit! (Most likely, at least.)

That means that IPv4 forwarding isn't enabled. You can enable it with echo 1 > /proc/sys/net/ipv4/ip_forward - check with traceroute again after you did that.

This topic can also be found in our documentation. See the Routed Configuration section on how to ensure that IP-forwarding is persistent.

Let me know if the issue persists!
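
The check and the change suggested above can be sketched as follows. The helper name `forwarding_state` and its optional path argument are made up for illustration. Whether forwarding is actually needed on a purely bridged host is a separate question that this snippet does not settle.

```shell
# Report whether IPv4 forwarding is on, reading the same file inspected in this thread.
# The optional argument (an alternate path) exists only to make the function easy to test.
forwarding_state() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    if [ "$(cat "$file" 2>/dev/null)" = "1" ]; then
        echo enabled
    else
        echo disabled
    fi
}

forwarding_state

# To switch it on until the next reboot (as root):
#   sysctl -w net.ipv4.ip_forward=1
# To keep it across reboots, one common approach is this line in /etc/sysctl.conf:
#   net.ipv4.ip_forward = 1
```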
 
I'm not sure a routed configuration applies to this setup.
There is an external router, and the host and VMs have internal addresses.
Also, I noticed mtu 9000: that means jumbo frames, which must be enabled along the whole network chain, including network adapters, bridges, switches, and routers. If they aren't, maybe set it to the standard 1500 (or comment it out) for testing?
 
Good point. That's also something that could be tested.
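
The jumbo-frame suggestion above can be tested without changing any configuration by sending pings that are not allowed to fragment. The payload has to be the MTU minus 28 bytes (20 for the IPv4 header without options, 8 for the ICMP header). The sketch below assumes iputils `ping` on Linux; `icmp_payload_for_mtu` is a hypothetical helper name.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 9000   # jumbo frames: prints 8972
icmp_payload_for_mtu 1500   # standard Ethernet: prints 1472

# "-M do" forbids fragmentation, so an oversized probe fails instead of being split.
# 192.168.195.1 is the gateway from the posted interfaces file.
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 9000)" 192.168.195.1
```

If the 8972-byte probe fails while the 1472-byte one succeeds, some device on the path is not passing jumbo frames.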
 
How would you do a Routed Configuration with multiple NICs? Or, how does the bridge know what its parent adapter is? I tried implementing the above, but with or without a bridge-ports value (either the port or "none"), the web GUI became unresponsive, so I had to go back to the original implementation.

I did do echo 1 > /proc/sys/net/ipv4/ip_forward and also set the MTU, throughout the network, back down to 1500. I'll see how things go.

As an aside, I increasingly think this is an issue with my router's OS, Untangle. I tried getting OPNsense running over the weekend and was barely able to do it, but had to pull the plug because the house was getting mad without internet. However, in the brief period that I had it set up, I learned how Split DNS can be a better solution than 1:1 NAT, and I implemented some recommendations I found on the OPNsense and pfSense forums, and the sites loaded much quicker. However, I had to go back and install Untangle and restore the config to make the house happy. I plan on purchasing another dual NIC for the server and having OPNsense run on Proxmox, as that makes a lot of sense, especially considering the ability to take snapshots and whatnot before making any changes.
 
I have pfSense running smoothly as a VM on Proxmox. It has several networks on several virtual NICs, with VLANs on the switches and on Proxmox.
I can describe this setup in more detail if you're interested.
 
