Proxmox Cluster Storage Latency

zhoid

Member
Sep 4, 2019
Hi,

I have a Proxmox cluster with 15 hosts running the latest version of PVE, with a SAN as our central storage device connected via 10Gbps networking.

On my SAN I have been noticing intermittent write latency spikes. Sometimes the latency stays high for a while and other times it is just brief spikes, but either way it causes issues for VM workloads.

I have spent a substantial amount of time troubleshooting with our SAN vendor's support, and they advised that the issue is being caused by the Proxmox host(s).

I need to monitor or measure VM workload operations and throughput from the hosts to the SAN. Please advise how this could be done from the host CLI, or which software would be best for measuring this. For example, would Zabbix provide enough information to troubleshoot the issue?

I am monitoring the network ports between the hosts and the storage switch and I don't see anything unusual.

Thanks

Zaid
 
It sounds like you are using iSCSI. Measuring TCP latency at the ports is not useful, because it is unlikely that TCP/IP is at fault. Otherwise you would see retransmits and other network errors.
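To check the retransmit point from the host itself, the kernel's TCP counters are a quick place to look. Below is a minimal Python sketch, assuming a Linux host that exposes /proc/net/snmp. The function names are made up for illustration, and the counters are host-wide, so they include all TCP traffic and not only the storage network.

```python
import time
from pathlib import Path

SNMP_PATH = "/proc/net/snmp"  # host-wide counters, not per interface


def parse_tcp_counters(snmp_text: str) -> dict[str, int]:
    """Return the Tcp counters from /proc/net/snmp as a name -> value dict."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    # The first "Tcp:" line holds the field names, the second the values.
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def retransmit_ratio(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of segments sent between two samples that were retransmits."""
    sent = after["OutSegs"] - before["OutSegs"]
    retransmitted = after["RetransSegs"] - before["RetransSegs"]
    return retransmitted / sent if sent > 0 else 0.0


def measure(interval_s: float = 10.0) -> None:
    """Sample the counters twice and print the retransmit share in between."""
    before = parse_tcp_counters(Path(SNMP_PATH).read_text())
    time.sleep(interval_s)
    after = parse_tcp_counters(Path(SNMP_PATH).read_text())
    ratio = retransmit_ratio(before, after)
    print(f"{ratio:.4%} of sent TCP segments were retransmitted")
```

Calling `measure(10)` on a PVE host prints the share of retransmitted segments over a 10 second window. If that stays near zero while a latency spike is happening, it supports the view that the network is not the cause.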
If we assume that your SAN vendor's analysis is correct and not just a "go away" write-up, they most likely saw that the storage responds to requests promptly and that it's the hosts that are pausing between requests.
We operate in low-latency, high-bandwidth environments, and 99% of the time the hosts that connect to our storage are not optimized initially. They work fine when the storage is slow, but when the storage serves data faster than the clients can handle, that's when you start seeing issues.
Things like Ethernet interrupt coalescing, NUMA alignment for NICs, and CPU isolation all play a large role. We have integrated many of these tunings into our Proxmox storage plugin.

As for I/O monitoring - PVE runs on Linux, so any of the basic Linux tools will be helpful: https://www.baeldung.com/linux/monitor-disk-io
I can also suggest Netdata, which has very nice visualization.
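If you would rather script it than install a tool, here is a rough Python sketch of the per-device write latency that tools like iostat derive from /proc/diskstats. It assumes the storage shows up as a block device on the host (for example an iSCSI LUN). NFS mounts do not appear in /proc/diskstats, so this would not cover them. The function names are only illustrative.

```python
import time
from pathlib import Path

DISKSTATS_PATH = "/proc/diskstats"


def parse_diskstats(text: str) -> dict[str, tuple[int, int]]:
    """Map device name -> (writes completed, milliseconds spent writing)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue
        # Fields 0-2 are major, minor and device name; field 7 is writes
        # completed and field 10 is the time spent writing in milliseconds.
        stats[fields[2]] = (int(fields[7]), int(fields[10]))
    return stats


def avg_write_latency_ms(before: dict[str, tuple[int, int]],
                         after: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Average milliseconds per completed write between two samples."""
    result = {}
    for device, (writes_after, ms_after) in after.items():
        if device not in before:
            continue
        writes_before, ms_before = before[device]
        writes = writes_after - writes_before
        if writes > 0:
            result[device] = (ms_after - ms_before) / writes
    return result


def watch(interval_s: float = 5.0) -> None:
    """Print per-device write latency every interval until interrupted."""
    previous = parse_diskstats(Path(DISKSTATS_PATH).read_text())
    while True:
        time.sleep(interval_s)
        current = parse_diskstats(Path(DISKSTATS_PATH).read_text())
        for device, latency in sorted(avg_write_latency_ms(previous, current).items()):
            print(f"{device}: {latency:.2f} ms per write")
        previous = current
```

Running `watch()` on each host during a spike shows which devices, and therefore which hosts, see the high write latency.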

P.S. update for anyone who comes across this thread in search:
https://kb.blockbridge.com/technote/proxmox-tuning-low-latency-storage


Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks very much for your prompt input.

Sorry, I forgot to mention that we use NFS instead of iSCSI. The main reason is to benefit from the snapshot capability within our Proxmox environment.

For now I used the "iftop" application on each host in my cluster to identify any high or unusual traffic on my storage network, and found at least 3 hosts with VM workloads that were contributing to the problem.

I mitigated the issue by further limiting the IOPS and disk bandwidth of these VM workloads.
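For anyone wanting to confirm the effect from the host side on NFS, the per-mount counters in /proc/self/mountstats (the same source nfsiostat reads) can be sampled before and after a change. The Python sketch below is only an illustration with made-up function names. It assumes the usual per-op layout, where the WRITE line lists operations first and cumulative RTT and execute time in milliseconds as the 7th and 8th numbers, so please check that against your kernel.

```python
from pathlib import Path

MOUNTSTATS_PATH = "/proc/self/mountstats"


def parse_nfs_write_stats(text: str) -> dict[str, tuple[int, int, int]]:
    """Map NFS mount point -> (WRITE ops, cumulative RTT ms, cumulative execute ms)."""
    stats = {}
    mount_point = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":
            # "device <source> mounted on <path> with fstype <type> ..."
            is_nfs = ("fstype" in fields
                      and fields[fields.index("fstype") + 1].startswith("nfs"))
            mount_point = fields[4] if is_nfs else None
        elif fields[0] == "WRITE:" and mount_point is not None:
            # Assumed layout: ops, transmissions, timeouts, bytes sent,
            # bytes received, queue ms, RTT ms, execute ms.
            stats[mount_point] = (int(fields[1]), int(fields[7]), int(fields[8]))
    return stats


def avg_write_times_ms(before: dict[str, tuple[int, int, int]],
                       after: dict[str, tuple[int, int, int]]
                       ) -> dict[str, tuple[float, float]]:
    """Per mount: (average RTT ms, average execute ms) per WRITE between samples."""
    result = {}
    for mount, (ops_after, rtt_after, exe_after) in after.items():
        ops_before, rtt_before, exe_before = before.get(mount, (0, 0, 0))
        ops = ops_after - ops_before
        if ops > 0:
            result[mount] = ((rtt_after - rtt_before) / ops,
                             (exe_after - exe_before) / ops)
    return result


def read_sample() -> dict[str, tuple[int, int, int]]:
    """Take one sample of the WRITE counters from the running host."""
    return parse_nfs_write_stats(Path(MOUNTSTATS_PATH).read_text())
```

Taking `read_sample()` twice, some seconds apart, and passing both results to `avg_write_times_ms` gives the average write RTT and execute time for that window. RTT covers the network and the NFS server, while execute time also includes waiting on the client, so an execute time far above the RTT points at the host rather than the SAN.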

Thanks again for the input - much appreciated.
 
