Tuning

rogueangel2k

New Member
Dec 18, 2023
Good afternoon. I have a production HA setup with shared iSCSI. I fear I've misconfigured things and am stuck with the poor performance I'm experiencing.

I have 5 HA hosts. 4 active with guests and #5 for voting only.
- Each has a dual-port 802.3ad LACP bond at 10 GbE, MTU 9000.
- iSCSI configured for HA, with 2x LVMs over the top: one for the spinning disks and one for the SSDs below on the QNAP.
- Each also has dual gigabit ports to the 2x HP 5120s for redundant host communication traffic. Each host's gigabit ports are set to "balance-alb".
- 256 GB of RAM each, with a mix of EPYC procs on the 4 active hosts; the voting member has a Xeon and 16 GB of RAM.
- Because of the mix of EPYCs (7542, 7351, 7302, and a 7401), each guest is set to x86-64-v2-AES as the most migration-compatible setup (see the sketch below). I tried other CPU types, but when I would migrate a guest, the guest would reset, so I had to settle on that proc config.
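For reference, this is how that CPU type gets pinned on a guest from the CLI (VMID 100 is just a stand-in for each of mine):

    # Pin a migration-safe baseline CPU model on a guest (VMID 100 is a placeholder)
    qm set 100 --cpu x86-64-v2-AES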

qNAP TS-h1886XU-RP-R2-D1622-32G iSCSI
- Spinning LUN - SATA
- SSD LUN - SATA
- Dual-port 802.3ad LACP bond at 10 GbE, MTU 9000
- 128 GB of RAM with a 4-core, 8-thread Xeon

Netgear XS724EM 24-Port 10 Gig Switch
- Each machine's port group is configured for 802.3ad LACP, which the switch's documentation says it supports.
- Storage traffic and migration traffic only.
- Max MTU 9216 (a jumbo-frame sanity check is sketched below)
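A quick way to confirm jumbo frames actually pass end to end, host to NAS (the address is a placeholder for the QNAP's storage IP):

    # 8972 = 9000 MTU - 20 byte IP header - 8 byte ICMP header
    # -M do forbids fragmentation, so this fails if any hop drops jumbo frames
    ping -M do -s 8972 192.0.2.10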

2x HP 5120 48 port gigabit switches for guests and redundant host communication.
- Guest traffic
- Host HA communication

I get poor performance even on the SSD side, which only has 6 guests at the moment (4 domain controllers and 2 database apps). Even poorer performance on the spinners. Either I've configured something wrong, the tuning is off, or I've hit my limit and won't get anything more out of it. Throwing cores at the guests doesn't help much. It's the I/O that's eating me up, I think.

I guess my question is: can I get more out of this configuration, or do I need to do something different, which just isn't in the budget right now? This is what I had to work with, so I made lemonade to get HA and some redundancy.

I've read that multipath is ideal, but everything I've found only talks about it as a redundancy option, not a performance one. I know LACP isn't true 20 gig, but I thought I'd be seeing more out of this setup... again... unless I missed something.

Please be kind. I need advice. Thank you.
 
Hi @rogueangel2k ,

It is really hard to assess the situation without concrete numbers; "poor performance" is not a sufficient descriptor.

If you suspect that storage IO is your bottleneck, you need to measure it independently of any VM traffic. Ideally, you'd do it on an idle system; perhaps you can migrate VMs off one of your hosts for a test. Your first goal is to measure and understand performance between your hypervisor and the storage, before adding virtualization latency on top.

One tool you can use is "fio". You can find some examples here: https://kb.blockbridge.com/technote/proxmox-tuning-low-latency-storage/index.html#description.
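For example, something along these lines is a reasonable starting point (the path is a placeholder; use a scratch LV or file, and never point a write test at a device that holds data):

    # 4k random read at queue depth 1: measures per-IO latency to the LUN
    # /dev/ssdvg/fiotest is a hypothetical scratch logical volume
    fio --name=latency --filename=/dev/ssdvg/fiotest --rw=randread \
        --bs=4k --iodepth=1 --numjobs=1 --direct=1 --ioengine=libaio \
        --time_based --runtime=30

    # 4k random write at higher queue depth: measures the IOPS ceiling
    # DESTRUCTIVE to whatever is on the target device
    fio --name=iops --filename=/dev/ssdvg/fiotest --rw=randwrite \
        --bs=4k --iodepth=32 --numjobs=4 --direct=1 --ioengine=libaio \
        --time_based --runtime=30 --group_reporting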

I would not spend much time testing the spinning rust. You should concentrate on your SSDs, although their performance also depends on whether they are enterprise or consumer grade.

LACP will not give you 20Gig on a single host. The default balancing policy is a layer-2 (MAC) XOR, and you only have one IP/MAC on each side, so every flow between two endpoints lands on the same physical link (sketch below).
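If you want separate TCP connections to at least have a chance of spreading across both links, a layer3+4 hash policy on the bond is the usual tweak. A sketch of the Proxmox side in /etc/network/interfaces (NIC names are examples; the switch side must be configured to match):

    auto bond0
    iface bond0 inet manual
        bond-slaves enp65s0f0 enp65s0f1   # example NIC names
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4    # hash on IP + port, not just MAC

Note this only helps when there are multiple distinct flows; a single iSCSI session is still one TCP connection and will stay on one link.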

If you can get some data, perhaps there is more to do.

Do take a look at the tips in the above article.

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I appreciate that. I did get iperf3 installed on the QNAP and hosts. Host to host shows 10 gig... well, about 9.6. Host to QNAP is in the same range. I've never used fio, so I'll attempt something as soon as I can. Thank you, bbgeek17.
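For reference, this is roughly what I ran (the address stands in for the QNAP's storage IP); as I understand it, a single stream only exercises one LACP member, so the -P flag runs parallel streams:

    # on the QNAP
    iperf3 -s

    # on a host: 4 parallel TCP streams for 30 seconds
    iperf3 -c 192.0.2.10 -P 4 -t 30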
 
iperf only tests the throughput of the network. If you are having performance issues below your network capabilities, your disks are the cause. You have 6 VMs with enterprise loads on an SSD, so what are the performance characteristics of said SSD? Does it match? You are using a QNAP (SOHO-grade) NAS; what exactly are its specs? Does it even have a CPU that can move data to iSCSI at 10 Gbps? Typically Drobo/QNAP etc. cheap out on an embedded-grade 2- or 4-core CPU and will sell that same model for a good decade. How many disks are you spreading the load across? 6 VMs on a single (or mirrored) SSD is pretty much the maximum for your average datacenter-grade QLC.
 
SSDs are 6x WD Red SA500 2.5" 4TB. QNAP's documentation talks about creating a LUN for each core of the NAS's processor; it's a 4-core, 8-thread proc, and I only have 1 LUN for the SSDs. Given how my cluster is set up right now, I don't know whether I could even create more LUNs, and I wouldn't know how to integrate them into the cluster while it's running without bringing everything offline in the process.
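From skimming the docs, I think the non-disruptive path would look roughly like the below (portal IP and IQN are placeholders), but I'd appreciate a sanity check before touching production:

    # Discover and log in to the new target on every host
    iscsiadm -m discovery -t sendtargets -p 192.0.2.20
    iscsiadm -m node -T iqn.2004-04.com.qnap:example-target -p 192.0.2.20 --login

    # Put LVM on the new LUN (sdX = the device the LUN shows up as)
    pvcreate /dev/sdX
    vgcreate vg_ssd2 /dev/sdX

    # Register it cluster-wide as shared storage
    pvesm add lvm ssd2-lvm --vgname vg_ssd2 --shared 1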
 
