Hello,
I recently had an issue similar to the one in https://forum.proxmox.com/threads/proxmox-5-4-issues-with-vm-performance.53769/ but I thought it better to start a new thread so the details do not get mixed up. Please merge it into that thread if you think it belongs there.
I am running a pool of Proxmox physical hosts, each hosting KVM web servers that run Laravel with nginx/PHP 7.1 for a high-traffic web application.
On April 19th, we upgraded 2 hosts from Proxmox 5.0 to 5.4 to solve network issues (a VM would suddenly become unreachable by ping and had to be restarted).
The API1 host was upgraded during the daytime.
The API3 host was upgraded the same day in the evening.
You can see the throughput in Graph 1.
During daily traffic, we observed that the 2 newly upgraded hosts delivered lower throughput than usual, and lower throughput than the other host, which was not upgraded.
vmstat output showed:
- 30% user CPU and 30% less traffic served on the 2 upgraded hosts.
- 20% user CPU and normal traffic served on the 1 host that was not upgraded.
perf output showed the same system calls and CPU usage (although 2x more events on the upgraded hosts).
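To put the vmstat figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that "30% less traffic" means the upgraded hosts served about 0.70 of the normal load and that user CPU scales linearly with traffic. Both are simplifications for illustration, not measurements.

```python
def cpu_cost_per_unit_traffic(user_cpu_percent, relative_traffic):
    """User CPU percent spent per unit of traffic (normal traffic = 1.0)."""
    return user_cpu_percent / relative_traffic


# Figures taken from the vmstat summary in this post.
upgraded = cpu_cost_per_unit_traffic(30.0, 0.70)  # upgraded hosts
baseline = cpu_cost_per_unit_traffic(20.0, 1.00)  # host that was not upgraded

ratio = upgraded / baseline
print(f"CPU cost per unit of traffic on upgraded hosts: {ratio:.2f}x baseline")
```

Under those assumptions the upgraded hosts spend roughly 2.1x more user CPU per unit of traffic, which is in the same ballpark as the 2x event count seen in perf.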
All VMs are running with the TSC clocksource.
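For anyone who wants to compare on their own guests, a minimal sketch of how the active clocksource can be read from sysfs inside a Linux VM. The helper name `read_clocksource` is made up for this example; the sysfs paths are the standard Linux ones.

```python
from pathlib import Path

# Standard Linux sysfs location for the clocksource settings.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base_dir=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if sysfs is missing."""
    base_dir = Path(base_dir)
    current_file = base_dir / "current_clocksource"
    available_file = base_dir / "available_clocksource"
    if not current_file.exists():
        return None, []
    current = current_file.read_text().strip()
    available = available_file.read_text().split() if available_file.exists() else []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksource()
    print("current clocksource:", current)
    print("available clocksources:", ", ".join(available))
```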
Initially we suspected the newly introduced kernel security patches (Spectre/Meltdown etc.), and we disabled them on one of the hosts. Performance was still lower than before the upgrade.
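A small sketch for checking what the running kernel reports about these mitigations. It reads the sysfs directory `/sys/devices/system/cpu/vulnerabilities`, which recent kernels such as 4.15 expose; older kernels may not have it, in which case the function returns an empty result. The helper name `read_mitigation_status` is hypothetical, and this only reports status; it says nothing about how the mitigations were disabled.

```python
from pathlib import Path

# sysfs directory where the kernel reports CPU vulnerability/mitigation status.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Map each vulnerability name to the status string reported by the kernel."""
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        # Directory is absent on kernels that predate this interface.
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    for name, status in read_mitigation_status().items():
        print(f"{name}: {status}")
```

Running it on both an upgraded and a rolled-back host makes it easy to compare which mitigations are actually active on each.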
Still investigating, and to rule out the kernel as a possible root cause, we rebooted one host on April 30th with a previous kernel (without downgrading Proxmox). We did not change any kernel security features on this host.
After rolling the kernel back from 4.15.18-13-pve to 4.10.15-1-pve, we are seeing correct performance again.
You can see the throughput in Graph 2.
No change was made to the host or to the VMs, except the Proxmox upgrade via apt-get dist-upgrade.
We are planning to test other Proxmox kernels between 4.10 and 4.15 to pinpoint where the regression was introduced. It is not yet known whether the performance hit comes from a Proxmox patch or from the Linux kernel.
As specific Linux kernel versions are tagged as dependencies of specific Proxmox releases, we also don't know whether running an older kernel has side effects. We have not encountered any instability so far.
We still have 1 host running Proxmox 5.4 with the latest kernel, and we can provide any perf/vmstat or similar output before downgrading the kernel or changing settings.
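To make vmstat captures from different hosts easier to compare, here is a minimal sketch that averages the CPU columns of saved `vmstat` output. The function name `average_cpu_columns` is made up for this example, and it assumes the default vmstat column layout (a header line starting with `r` that contains `us`, `sy` and `id`). By default it skips the first sample, since vmstat's first line reports averages since boot.

```python
def average_cpu_columns(vmstat_text, columns=("us", "sy", "id"), skip_first=True):
    """Average the given columns over all data rows of captured vmstat output."""
    header = None
    sums = {name: 0 for name in columns}
    rows = 0
    first_skipped = not skip_first
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Column header line, which vmstat repeats periodically.
        if fields[0] == "r" and "us" in fields:
            header = fields
            continue
        # Ignore anything that is not a numeric data row matching the header.
        if header is None or len(fields) != len(header):
            continue
        if not all(field.isdigit() for field in fields):
            continue
        if not first_skipped:
            first_skipped = True  # drop the since-boot sample
            continue
        for name in columns:
            sums[name] += int(fields[header.index(name)])
        rows += 1
    if rows == 0:
        return {}
    return {name: sums[name] / rows for name in columns}
```

For example, after saving `vmstat 1 60` to a file on each host, calling `average_cpu_columns(open(path).read())` on each file gives per-host averages of user, system and idle CPU that can be posted side by side.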
Joffrey