Hi there,
I have three Proxmox nodes (Supermicro SYS-120C-TN10R) connected in a full mesh via Mellanox ConnectX-6 Dx 100GbE cards, cross-connected with MCP1600-C00AE30N 0.5 m QSFP28 100GbE DAC cables:
Bash:
# lspci -vv -s 98:00.0
98:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Subsystem: Super Micro Computer Inc MT2892 Family [ConnectX-6 Dx]
Physical Slot: 0-2
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 18
NUMA node: 1
Region 0: Memory at 206ffc000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at dba00000 [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn+
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [48] Vital Product Data
Product Name: Supermicro Network Adapter
Read-only fields:
[PN] Part number: AOC-A100G-m2CM
[V0] Vendor specific: 22.31.1014
[V1] Vendor specific: 1.00
[SN] Serial number: OA221S052953
[VA] Vendor specific: 2
[V2] Vendor specific: 3CECEF5C7DB2
[V3] Vendor specific: 3CECEF5C7DB3
[V4] Vendor specific:
[V5] Vendor specific:
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: 101e
Supported Page Size: 000007ff, System Page Size: 00000001
Region 0: Memory at 0000206ffe800000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1c0 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [230 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [320 v1] Lane Margining at the Receiver <?>
Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [420 v1] Data Link Feature <?>
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
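As a sanity check on the slot itself: the LnkCap/LnkSta lines above show the card negotiated PCIe Gen4 x16 (16GT/s), roughly 252 Gbit/s usable, so the PCIe link is not a bottleneck for 100GbE. To re-check just those fields:
Bash:
# lspci -vv -s 98:00.0 | grep -E 'LnkCap:|LnkSta:'
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
LnkSta: Speed 16GT/s (ok), Width x16 (ok)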
I followed the guide Full Mesh Network for Ceph Server and in particular used Open vSwitch to configure the network. This is the configuration of node 2 (node 1 is 10.15.15.1, node 3 is 10.15.15.3):
Bash:
# cat /etc/network/interfaces
## ceph public network ##
##
auto enp152s0f0np0
iface enp152s0f0np0 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr1
    ovs_mtu 9000
    ovs_options other_config:rstp-enable=true other_config:rstp-path-cost=150 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true vlan_mode=native-untagged

auto enp152s0f1np1
iface enp152s0f1np1 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr1
    ovs_mtu 9000
    ovs_options other_config:rstp-enable=true other_config:rstp-path-cost=150 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true vlan_mode=native-untagged

auto vmbr1
iface vmbr1 inet static
    address 10.15.15.2/24
    ovs_type OVSBridge
    ovs_ports enp152s0f0np0 enp152s0f1np1
    ovs_mtu 9000
    up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
    post-up sleep 10
##
## ceph public network ##
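To verify that the bridge actually came up with RSTP enabled and the 9000-byte MTU applied, these checks should be enough (output varies per node; in a three-node ring, RSTP blocks exactly one link somewhere in the mesh to break the loop):
Bash:
# ovs-vsctl get bridge vmbr1 rstp_enable     ## should print: true
# ovs-appctl rstp/show vmbr1                 ## RSTP port roles and states
# ip link show vmbr1 | grep -o 'mtu [0-9]*'  ## should print: mtu 9000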
The card's speed is correctly recognized as 100000Mb/s on the Proxmox node:
Bash:
# ethtool enp152s0f0np0
Settings for enp152s0f0np0:
Supported ports: [ Backplane ]
Supported link modes: 1000baseT/Full
[omitted]
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: None RS BASER
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 100000Mb/s
Duplex: Full
Auto-negotiation: on
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
Supports Wake-on: g
Wake-on: g
Current message level: 0x00000004 (4)
link
Link detected: yes
If I run a speed test with iperf between two Proxmox nodes:
Server
Bash:
# iperf -s -p 9999
------------------------------------------------------------
Server listening on TCP port 9999
TCP window size: 128 KByte (default)
------------------------------------------------------------
Client
Bash:
# iperf -e -c 10.15.15.1 -P 4 -p 9999
[ 4] local 10.15.15.2%vmbr1 port 57286 connected with 10.15.15.1 port 9999 (MSS=8948) (ct=0.07 ms)
------------------------------------------------------------
Client connecting to 10.15.15.1, TCP port 9999 with pid 1931055 (4 flows)
Write buffer size: 128 KByte
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 5] local 10.15.15.2%vmbr1 port 57288 connected with 10.15.15.1 port 9999 (MSS=8948) (ct=0.08 ms)
[ 3] local 10.15.15.2%vmbr1 port 57284 connected with 10.15.15.1 port 9999 (MSS=8948) (ct=0.10 ms)
[ 6] local 10.15.15.2%vmbr1 port 57290 connected with 10.15.15.1 port 9999 (MSS=8948) (ct=0.06 ms)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
[ 6] 0.0000-10.0001 sec 27.1 GBytes 23.3 Gbits/sec 222352/0 0 3163K/1077 us 2706020.04
[ 5] 0.0000-10.0000 sec 27.1 GBytes 23.3 Gbits/sec 222267/0 0 3180K/1001 us 2910365.82
[ 3] 0.0000-10.0001 sec 26.9 GBytes 23.1 Gbits/sec 220141/0 0 3224K/1248 us 2312000.94
[ 4] 0.0000-10.0000 sec 26.9 GBytes 23.1 Gbits/sec 220147/0 0 3259K/1162 us 2483206.39
[ ID] Interval Transfer Bandwidth
[SUM] 0.0000-10.0000 sec 108 GBytes 92.8 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.058/0.078/0.102/0.050 ms (tot/err) = 4/0
The speed of 92.8 Gbits/sec is very good.
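Note the (MSS=8948) in the connect lines above: that confirms the 9000-byte MTU is in effect between the nodes. A simple way to prove jumbo frames end-to-end from any endpoint (including, later, from inside a VM) is a don't-fragment ping with an 8972-byte payload, since 8972 bytes of data + 8 bytes of ICMP header + 20 bytes of IP header = 9000:
Bash:
# ping -M do -s 8972 -c 3 10.15.15.1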
I created three VMs, one per Proxmox node, and attached a second NIC on vmbr1 so they can reach the Ceph public network and mount the shared storage via CephFS:
Bash:
# cat /etc/pve/nodes/server01px/qemu-server/101.conf
agent: 1
boot: order=virtio0;ide2;net0
sockets: 2
cores: 10
memory: 81920
meta: creation-qemu=6.1.1,ctime=1648914236
name: docker101
net0: virtio=6E:CB:93:71:07:09,bridge=vmbr0,firewall=1
net1: virtio=DE:D9:D4:A8:4F:3A,bridge=vmbr1
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=e54cff81-eb71-4321-adfe-219de8e5f258
ide2: none,media=cdrom
virtio0: CEPH-NVME-3:vm-101-disk-0,cache=writeback,size=20G
virtio1: CEPH-NVME-3:vm-101-disk-1,cache=writeback,size=20G
virtio2: CEPH-NVME-3:vm-101-disk-2,cache=writeback,size=40G
vmgenid: b4b6bb91-931a-41cd-bbb8-57bd6cd92f07
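For completeness: net1 here is a plain virtio NIC, i.e. a single queue and the default 1500-byte MTU, while the mesh underneath runs at MTU 9000. Proxmox can set both per NIC; a sketch only (queues=10 just mirrors the 10 cores, and the MAC is the one from the config above):
Bash:
# qm set 101 --net1 virtio=DE:D9:D4:A8:4F:3A,bridge=vmbr1,mtu=9000,queues=10
Depending on the guest OS, the MTU may still need to be raised inside the VM as well.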
If I run the same iperf test between the VMs I get this result:
Server
Bash:
# iperf -s -p 9999
------------------------------------------------------------
Server listening on TCP port 9999
TCP window size: 128 KByte (default)
------------------------------------------------------------
Client
Bash:
# iperf -e -c 10.15.15.101 -P 5 -p 9999
------------------------------------------------------------
Client connecting to 10.15.15.101, TCP port 9999 with pid 3261831
Write buffer size: 128 KByte
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[ 4] local 10.15.15.102 port 42560 connected with 10.15.15.101 port 9999 (ct=1.02 ms)
[ 5] local 10.15.15.102 port 42558 connected with 10.15.15.101 port 9999 (ct=1.09 ms)
[ 3] local 10.15.15.102 port 42556 connected with 10.15.15.101 port 9999 (ct=1.15 ms)
[ 7] local 10.15.15.102 port 42564 connected with 10.15.15.101 port 9999 (ct=1.50 ms)
[ 6] local 10.15.15.102 port 42562 connected with 10.15.15.101 port 9999 (ct=0.91 ms)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
[ 4] 0.0000-10.0008 sec 2.12 GBytes 1.82 Gbits/sec 17344/0 0 -1K/1160 us 195959.07
[ 5] 0.0000-10.0053 sec 2.21 GBytes 1.90 Gbits/sec 18121/0 0 -1K/823 us 288445.02
[ 7] 0.0000-10.0018 sec 2.21 GBytes 1.90 Gbits/sec 18078/0 0 -1K/382 us 620183.51
[ 6] 0.0000-10.0048 sec 2.22 GBytes 1.90 Gbits/sec 18148/0 0 -1K/340 us 699280.42
[ 3] 0.0000-10.0059 sec 2.15 GBytes 1.84 Gbits/sec 17572/0 0 -1K/452 us 509254.25
[SUM] 0.0000-10.0059 sec 10.9 GBytes 9.35 Gbits/sec 89263/0 0
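Compared with the host-to-host run, the connect times here are roughly 1 ms instead of ~0.08 ms, and no MSS is shown. Inside the guests I would double-check the virtio NIC like this (assuming the vmbr1 NIC appears as eth1 in the guest, which is a guess):
Bash:
# ip link show eth1 | grep -o 'mtu [0-9]*'   ## 1500 here would mean no jumbo frames in the guest
# ethtool -l eth1                            ## "Combined" shows the virtio queue count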
Why is the total resulting speed only 9.35 Gbits/sec?