Low throughput on Ceph interface

chrispage1

Member
Sep 1, 2021
Hi,

I've got a three-node Proxmox cluster with a dedicated Ceph OSD network on a 10 Gbit link. It's been working great on our provider's switching equipment, but they've recently had some rack changes and plugged into new switches.

During this change, I took the opportunity to update the Proxmox nodes to the latest packages. Since then, our throughput on the 10G link won't exceed 400-500 Mbit/s.

The interfaces are still reporting as running at 10G, but I can't push throughput anywhere near this. My testing has been done using iperf3.

My question is: can anyone think of a reason that software may be causing this, or would you deem this to be 100% a hardware problem?

Thanks,
Chris.
 
Please post:
  • cat /etc/kernel/cmdline
  • /etc/pve/ceph.conf
It's likely that you need iommu=pt as a kernel command-line parameter.
 
Thanks for your reply.

We don't have a /etc/kernel/cmdline file on the nodes. What should be in this file?

Ceph config is as below:

Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.10.50.111/24
     fsid = f47ac10b-58cc-4372-a567-0e02b2c3d479
     mon_allow_pool_delete = true
     mon_host = 10.0.0.111 10.0.0.112 10.0.0.113
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.0.0.111/24

     log_to_file = false
     log_to_syslog = true

     mon_osd_nearfull_ratio = .67
     mon_osd_full_ratio = .80

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring
     rbd_cache = true
     rbd_cache_max_dirty = 134217728
     rbd_cache_max_dirty_age = 60
     rbd_cache_max_dirty_object = 2
     rbd_cache_size = 268435456
     rbd_cache_target_dirty = 167772160
     rbd_cache_writethrough_until_flush = false

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mon]
     mon_cluster_log_file_level = info


[osd]
     osd_memory_target = 6Gi
     osd_recovery_max_active_ssd = 8
     osd_recovery_op_priority = 2

[osd.9]
#     bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=128,min_write_buffer_number_to_merge=16,compaction_style=kCompactionStyleLevel,write_buffer_size=8388608,max_background_jobs=4,level0_file_num_compaction_trigger=8,max_bytes_for_level_base=1073741824,max_bytes_for_level_multiplier=8,compaction_readahead_size=2MB,max_total_wal_size=1073741824,writable_file_max_buffer_size=0

[mds.pve01]
     host = pve01
     mds_standby_for_name = pve

[mds.pve02]
     host = pve02
     mds_standby_for_name = pve

[mds.pve03]
     host = pve03
     mds_standby_for_name = pve

[mon.pve01]
     public_addr = 10.0.0.111

[mon.pve02]
     public_addr = 10.0.0.112

[mon.pve03]
     public_addr = 10.0.0.113

It's likely that you need iommu=pt as a kernel command-line parameter.

Is this likely to have changed in recent versions then?

I think it's worth adding that this isn't throughput between VMs but throughput on our actual bond1 interface, which is 2 x 10G links configured in LACP.

Many thanks,
Chris.
 
Thanks for your reply.

We don't have a /etc/kernel/cmdline file on the nodes. What should be in this file?
Oh okay, then you'll need to check /etc/default/grub (https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_grub). Depending on whether you have AMD or Intel, you'll need intel_iommu=on iommu=pt or amd_iommu=on iommu=pt. After a reboot you can check whether the parameter took effect with cat /proc/cmdline (not sure if this also applies to your GRUB setup).

IOMMU Passthrough Mode

If your hardware supports IOMMU passthrough mode, enabling this mode might increase performance. This is because VMs then bypass the (default) DMA translation normally performed by the hypervisor and instead pass DMA requests directly to the hardware IOMMU. To enable these options, add:
iommu=pt
to the kernel commandline.
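A minimal sketch of the resulting GRUB change, assuming Intel hardware (the existing options shown here are an assumption and will differ per host):

```shell
# Sketch only: build the new GRUB_CMDLINE_LINUX_DEFAULT value by appending
# the IOMMU flags to whatever options the host already has.
CURRENT='quiet'                                  # existing options (assumed)
NEW="$CURRENT intel_iommu=on iommu=pt"
echo "GRUB_CMDLINE_LINUX_DEFAULT=\"$NEW\""
# After editing /etc/default/grub to match, run update-grub and reboot,
# then confirm with: cat /proc/cmdline
```

On AMD hosts the first flag would be amd_iommu=on instead.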

Is this likely to have changed in recent versions then?

Many thanks,
Chris.
I can't tell, but we are using this parameter for all of our deployments. Some cards only reached 500 Mbit without it (Mellanox 100 Gbit, for example).
 
We are running Intel, currently configured with only a max C-state limit:

Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=0"
GRUB_CMDLINE_LINUX=""

Thank you - I will give your suggestions a go now!

Many thanks,
Chris.
 
Such a shame, I really had high hopes for that working! After applying the configuration and rebooting all nodes:

Code:
chris@pve03:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.2.16-10-pve root=/dev/mapper/pve-root ro quiet intel_idle.max_cstate=0 intel_iommu=on iommu=pt

Yet I'm still really lacking on throughput...

Code:
chris@pve01:~$ iperf3 -c 10.10.50.113
Connecting to host 10.10.50.113, port 5201
[  5] local 10.10.50.111 port 55666 connected to 10.10.50.113 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  55.9 MBytes   469 Mbits/sec  415   1.02 MBytes
[  5]   1.00-2.00   sec  60.0 MBytes   503 Mbits/sec    0   1.26 MBytes
[  5]   2.00-3.00   sec  52.5 MBytes   440 Mbits/sec    1   1.06 MBytes
[  5]   3.00-4.00   sec  57.5 MBytes   482 Mbits/sec    0   1.28 MBytes
[  5]   4.00-5.00   sec  60.0 MBytes   503 Mbits/sec    0   1.48 MBytes
[  5]   5.00-6.00   sec  60.0 MBytes   503 Mbits/sec    0   1.65 MBytes
[  5]   6.00-7.00   sec  62.5 MBytes   524 Mbits/sec    0   1.82 MBytes
[  5]   7.00-8.00   sec  61.2 MBytes   514 Mbits/sec    0   1.96 MBytes
[  5]   8.00-9.00   sec  60.0 MBytes   503 Mbits/sec    0   2.10 MBytes
[  5]   9.00-10.00  sec  62.5 MBytes   524 Mbits/sec    0   2.23 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   592 MBytes   497 Mbits/sec  416             sender
[  5]   0.00-10.00  sec   589 MBytes   494 Mbits/sec                  receiver

iperf Done.
chris@pve01:~$ iperf3 -c 10.10.50.112
Connecting to host 10.10.50.112, port 5201
[  5] local 10.10.50.111 port 43916 connected to 10.10.50.112 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  54.6 MBytes   458 Mbits/sec  280   1.28 MBytes
[  5]   1.00-2.00   sec  50.0 MBytes   419 Mbits/sec   98   1.08 MBytes
[  5]   2.00-3.00   sec  58.8 MBytes   493 Mbits/sec    0   1.31 MBytes
[  5]   3.00-4.00   sec  57.5 MBytes   482 Mbits/sec    0   1.49 MBytes
[  5]   4.00-5.00   sec  60.0 MBytes   503 Mbits/sec    0   1.66 MBytes
[  5]   5.00-6.00   sec  55.0 MBytes   461 Mbits/sec   15    542 KBytes
[  5]   6.00-7.00   sec  48.8 MBytes   409 Mbits/sec    0    874 KBytes
[  5]   7.00-8.00   sec  58.8 MBytes   493 Mbits/sec    0   1.13 MBytes
[  5]   8.00-9.00   sec  58.8 MBytes   493 Mbits/sec    0   1.35 MBytes
[  5]   9.00-10.00  sec  58.8 MBytes   493 Mbits/sec    0   1.53 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   561 MBytes   470 Mbits/sec  393             sender
[  5]   0.00-10.00  sec   558 MBytes   468 Mbits/sec                  receiver

The strange thing is, on the same NIC (an Intel(R) Ethernet 10G 4P X710 SFP+ rNDC) we have our internal network, and that runs at 10 Gbit no problem...

Code:
chris@pve01:~$ iperf3 -c 10.0.0.112
Connecting to host 10.0.0.112, port 5201
[  5] local 10.0.0.111 port 43764 connected to 10.0.0.112 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.10 GBytes  9.41 Gbits/sec   30   1.25 MBytes
[  5]   1.00-2.00   sec  1.09 GBytes  9.41 Gbits/sec   27   1.25 MBytes
[  5]   2.00-3.00   sec  1.09 GBytes  9.41 Gbits/sec   20   1.25 MBytes
[  5]   3.00-4.00   sec  1.10 GBytes  9.41 Gbits/sec    3   1.33 MBytes
[  5]   4.00-5.00   sec  1.09 GBytes  9.41 Gbits/sec    4   1.33 MBytes
[  5]   5.00-6.00   sec  1.09 GBytes  9.41 Gbits/sec    2   1.33 MBytes
[  5]   6.00-7.00   sec  1.10 GBytes  9.42 Gbits/sec    3   1.25 MBytes
[  5]   7.00-8.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.25 MBytes
[  5]   8.00-9.00   sec  1.10 GBytes  9.42 Gbits/sec    1   1.31 MBytes
[  5]   9.00-10.00  sec  1.09 GBytes  9.41 Gbits/sec    0   1.31 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.0 GBytes  9.41 Gbits/sec   90             sender
[  5]   0.00-10.00  sec  11.0 GBytes  9.41 Gbits/sec                  receiver

iperf Done.
chris@pve01:~$ iperf3 -c 10.0.0.113
Connecting to host 10.0.0.113, port 5201
[  5] local 10.0.0.111 port 50272 connected to 10.0.0.113 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.10 GBytes  9.42 Gbits/sec   32   1.43 MBytes
[  5]   1.00-2.00   sec  1.10 GBytes  9.42 Gbits/sec   11   1.51 MBytes
[  5]   2.00-3.00   sec  1.09 GBytes  9.41 Gbits/sec    5   1.61 MBytes
[  5]   3.00-4.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.80 MBytes
[  5]   4.00-5.00   sec  1.10 GBytes  9.41 Gbits/sec    0   1.89 MBytes
[  5]   5.00-6.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.99 MBytes
[  5]   6.00-7.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.99 MBytes
[  5]   7.00-8.00   sec  1.09 GBytes  9.40 Gbits/sec   49   1020 KBytes
[  5]   8.00-9.00   sec  1.09 GBytes  9.35 Gbits/sec   21   1.14 MBytes
[  5]   9.00-10.00  sec  1.09 GBytes  9.40 Gbits/sec    0   1.74 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.9 GBytes  9.40 Gbits/sec  118             sender
[  5]   0.00-10.00  sec  10.9 GBytes  9.40 Gbits/sec                  receiver

There's no difference between the two networks other than the VLAN configuration and MTUs.

Code:
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno3
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    mtu 1500
#Public, Private, Management

auto bond1
iface bond1 inet manual
    bond-slaves eno2 eno4
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    mtu 9000
#Ceph Network

auto vmbr0
iface vmbr0 inet static
    address 10.0.0.111/24
    gateway 10.0.0.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    mtu 1500
#Public, Private, Management

auto vmbr1
iface vmbr1 inet static
    address 10.10.50.111/24
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    mtu 9000
#Ceph Network

At the moment I'm at a bit of a loss as to what it could be, so I appreciate your help on this!
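Given the MTU difference between the two bonds (9000 on the Ceph network, 1500 on the working one), one cheap check after a switch change is whether jumbo frames actually pass end-to-end. A sketch; the address is pve03's Ceph-network IP from the tests above:

```shell
# 9000-byte MTU minus 20 bytes IPv4 header and 8 bytes ICMP header = 8972
SIZE=$((9000 - 20 - 8))
# -M do sets the Don't Fragment bit, so the ping only succeeds if every hop
# (NICs and switches) forwards the full jumbo frame unfragmented
ping -M do -s "$SIZE" -c 1 -W 2 10.10.50.113 || echo "jumbo frames are being dropped somewhere on the path"
```

If this fails while a 1472-byte payload (-s 1472) succeeds, the new switches likely aren't configured for jumbo frames.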
 
Can you check the ports eno2 and eno4 with ethtool? Can you dump the modules with ethtool -m eno2 and ethtool -m eno4, but also check the normal status with ethtool eno2 and ethtool eno4? You have an X710! This card is known for having problems with LLDP (when combined with LACP). Try:


Code:
for i in $(lshw -c network -businfo | grep X710 | awk '{print $2}'); do
    ethtool --set-priv-flags $i disable-fw-lldp on
done
 
Thanks @jsterr

eno2:

Code:
    Identifier                                : 0x03 (SFP)
    Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
    Connector                                 : 0x07 (LC)
    Transceiver codes                         : 0x10 0x00 0x00 0x00 0x40 0x00 0x0c 0x00 0x00
    Transceiver type                          : 10G Ethernet: 10G Base-SR
    Transceiver type                          : FC: short distance (S)
    Transceiver type                          : FC: Multimode, 62.5um (M6)
    Transceiver type                          : FC: Multimode, 50um (M5)
    Encoding                                  : 0x06 (64B/66B)
    BR, Nominal                               : 10300MBd
    Rate identifier                           : 0x00 (unspecified)
    Length (SMF,km)                           : 0km
    Length (SMF)                              : 0m
    Length (50um)                             : 300m
    Length (62.5um)                           : 150m
    Length (Copper)                           : 0m
    Length (OM3)                              : 0m
    Laser wavelength                          : 850nm
    Vendor name                               : FS
    Vendor OUI                                : 00:1b:21
    Vendor PN                                 : SFP-10GSR-85
    Vendor rev                                : A
    Option values                             : 0x00 0x1a
    Option                                    : RX_LOS implemented
    Option                                    : TX_FAULT implemented
    Option                                    : TX_DISABLE implemented
    BR margin, max                            : 0%
    BR margin, min                            : 0%
    Vendor SN                                 : F2030040626
    Date code                                 : 200729
    Optical diagnostics support               : Yes
    Laser bias current                        : 6.132 mA
    Laser output power                        : 0.6412 mW / -1.93 dBm
    Receiver signal average optical power     : 0.2967 mW / -5.28 dBm
    Module temperature                        : 46.95 degrees C / 116.52 degrees F
    Module voltage                            : 3.3408 V
    Alarm/warning flags implemented           : Yes
    Laser bias current high alarm             : Off
    Laser bias current low alarm              : Off
    Laser bias current high warning           : Off
    Laser bias current low warning            : Off
    Laser output power high alarm             : Off
    Laser output power low alarm              : Off
    Laser output power high warning           : Off
    Laser output power low warning            : Off
    Module temperature high alarm             : Off
    Module temperature low alarm              : Off
    Module temperature high warning           : Off
    Module temperature low warning            : Off
    Module voltage high alarm                 : Off
    Module voltage low alarm                  : Off
    Module voltage high warning               : Off
    Module voltage low warning                : Off
    Laser rx power high alarm                 : Off
    Laser rx power low alarm                  : Off
    Laser rx power high warning               : Off
    Laser rx power low warning                : Off
    Laser bias current high alarm threshold   : 15.000 mA
    Laser bias current low alarm threshold    : 2.000 mA
    Laser bias current high warning threshold : 12.000 mA
    Laser bias current low warning threshold  : 3.000 mA
    Laser output power high alarm threshold   : 1.5849 mW / 2.00 dBm
    Laser output power low alarm threshold    : 0.1122 mW / -9.50 dBm
    Laser output power high warning threshold : 1.0000 mW / 0.00 dBm
    Laser output power low warning threshold  : 0.1778 mW / -7.50 dBm
    Module temperature high alarm threshold   : 80.00 degrees C / 176.00 degrees F
    Module temperature low alarm threshold    : -10.00 degrees C / 14.00 degrees F
    Module temperature high warning threshold : 70.00 degrees C / 158.00 degrees F
    Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
    Module voltage high alarm threshold       : 3.6300 V
    Module voltage low alarm threshold        : 2.9700 V
    Module voltage high warning threshold     : 3.4650 V
    Module voltage low warning threshold      : 3.1350 V
    Laser rx power high alarm threshold       : 1.5849 mW / 2.00 dBm
    Laser rx power low alarm threshold        : 0.0389 mW / -14.10 dBm
    Laser rx power high warning threshold     : 1.0000 mW / 0.00 dBm
    Laser rx power low warning threshold      : 0.0617 mW / -12.10 dBm

eno4:

Code:
    Identifier                                : 0x03 (SFP)
    Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
    Connector                                 : 0x07 (LC)
    Transceiver codes                         : 0x10 0x00 0x00 0x00 0x40 0x00 0x0c 0x00 0x00
    Transceiver type                          : 10G Ethernet: 10G Base-SR
    Transceiver type                          : FC: short distance (S)
    Transceiver type                          : FC: Multimode, 62.5um (M6)
    Transceiver type                          : FC: Multimode, 50um (M5)
    Encoding                                  : 0x06 (64B/66B)
    BR, Nominal                               : 10300MBd
    Rate identifier                           : 0x00 (unspecified)
    Length (SMF,km)                           : 0km
    Length (SMF)                              : 0m
    Length (50um)                             : 300m
    Length (62.5um)                           : 150m
    Length (Copper)                           : 0m
    Length (OM3)                              : 0m
    Laser wavelength                          : 850nm
    Vendor name                               : FS
    Vendor OUI                                : 00:1b:21
    Vendor PN                                 : SFP-10GSR-85
    Vendor rev                                : A
    Option values                             : 0x00 0x1a
    Option                                    : RX_LOS implemented
    Option                                    : TX_FAULT implemented
    Option                                    : TX_DISABLE implemented
    BR margin, max                            : 0%
    BR margin, min                            : 0%
    Vendor SN                                 : F2030040604
    Date code                                 : 200729
    Optical diagnostics support               : Yes
    Laser bias current                        : 5.880 mA
    Laser output power                        : 0.6254 mW / -2.04 dBm
    Receiver signal average optical power     : 0.6348 mW / -1.97 dBm
    Module temperature                        : 45.72 degrees C / 114.30 degrees F
    Module voltage                            : 3.3544 V
    Alarm/warning flags implemented           : Yes
    Laser bias current high alarm             : Off
    Laser bias current low alarm              : Off
    Laser bias current high warning           : Off
    Laser bias current low warning            : Off
    Laser output power high alarm             : Off
    Laser output power low alarm              : Off
    Laser output power high warning           : Off
    Laser output power low warning            : Off
    Module temperature high alarm             : Off
    Module temperature low alarm              : Off
    Module temperature high warning           : Off
    Module temperature low warning            : Off
    Module voltage high alarm                 : Off
    Module voltage low alarm                  : Off
    Module voltage high warning               : Off
    Module voltage low warning                : Off
    Laser rx power high alarm                 : Off
    Laser rx power low alarm                  : Off
    Laser rx power high warning               : Off
    Laser rx power low warning                : Off
    Laser bias current high alarm threshold   : 15.000 mA
    Laser bias current low alarm threshold    : 2.000 mA
    Laser bias current high warning threshold : 12.000 mA
    Laser bias current low warning threshold  : 3.000 mA
    Laser output power high alarm threshold   : 1.5849 mW / 2.00 dBm
    Laser output power low alarm threshold    : 0.1122 mW / -9.50 dBm
    Laser output power high warning threshold : 1.0000 mW / 0.00 dBm
    Laser output power low warning threshold  : 0.1778 mW / -7.50 dBm
    Module temperature high alarm threshold   : 80.00 degrees C / 176.00 degrees F
    Module temperature low alarm threshold    : -10.00 degrees C / 14.00 degrees F
    Module temperature high warning threshold : 70.00 degrees C / 158.00 degrees F
    Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
    Module voltage high alarm threshold       : 3.6300 V
    Module voltage low alarm threshold        : 2.9700 V
    Module voltage high warning threshold     : 3.4650 V
    Module voltage low warning threshold      : 3.1350 V
    Laser rx power high alarm threshold       : 1.5849 mW / 2.00 dBm
    Laser rx power low alarm threshold        : 0.0389 mW / -14.10 dBm
    Laser rx power high warning threshold     : 1.0000 mW / 0.00 dBm
    Laser rx power low warning threshold      : 0.0617 mW / -12.10 dBm

eno2:

Code:
    Supported ports: [ FIBRE ]
    Supported link modes:   10000baseSR/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10000baseSR/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Speed: 10000Mb/s
    Duplex: Full
    Auto-negotiation: off
    Port: FIBRE
    PHYAD: 0
    Transceiver: internal
    Supports Wake-on: g
    Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
    Link detected: yes

eno4:

Code:
    Supported ports: [ FIBRE ]
    Supported link modes:   10000baseSR/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10000baseSR/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Speed: 10000Mb/s
    Duplex: Full
    Auto-negotiation: off
    Port: FIBRE
    PHYAD: 0
    Transceiver: internal
    Supports Wake-on: g
    Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link

From what I can see everything looks OK.

You have an X710! This card is known for having problems with LLDP (when combined with LACP)

Oh, that's not ideal! However, eno1 & eno3 are working just fine on the X710 NIC.

I've run the suggested commands on pve02 & pve03, but there's no difference in the iperf tests from pve03 -> pve02 or the reverse. I presume this change wouldn't persist across a reboot, so it's not a case of needing one?

Perhaps this is something to consider disabling either way.

Thanks,
Chris.
 
Did you check if the command worked? For example (I don't have an X710 here right now):

Code:
root@PMX4:~# ethtool --show-priv-flags enp6s0f1
Private flags for enp6s0f1:
legacy-rx: off

Regarding the LACP and LLDP problem, you can also check https://www.thomas-krenn.com/en/wik...Series_LACP_Configuration#Disable_LLDP_Engine and stop the LLDP engine via the command line.

Edit: please check (but yeah, you're right, the other ports are working well ...)

Network cards of the Intel Ethernet 700 series (X710, XL710, XXV710) process these LLDP frames in the standard configuration in the integrated LLDP engine. In the following instructions we will show you how to disable them under Linux or VMware.[1] Please note that at least firmware version NVM 6.01 and a current driver version are required.[2][3][4]

But yeah, it's strange that the other ports are doing well. I'm out of good ideas, but I've also had problems with iperf3 a few times. What performance are you able to achieve in Ceph when using iperf instead of iperf3?


Code:
## IPERF Server
iperf -s -P 64

## IPERF CLIENT
iperf -c 192.168.99.31 -P 64 -t 3600
 
Just checked the driver versions, in case it's of any help...

Code:
modinfo i40e | grep ver
filename:       /lib/modules/6.2.16-10-pve/kernel/drivers/net/ethernet/intel/i40e/i40e.ko
description:    Intel(R) Ethernet Connection XL710 Network Driver
srcversion:     F4CBEC026738F03F2EDD1D1
vermagic:       6.2.16-10-pve SMP preempt mod_unload modversions

Verified disable-fw-lldp flags too:

Code:
ethtool --show-priv-flags eno3
Private flags for eno3:
MFP                   : off
total-port-shutdown   : off
LinkPolling           : off
flow-director-atr     : on
veb-stats             : off
hw-atr-eviction       : off
link-down-on-close    : off
legacy-rx             : off
disable-source-pruning: off
disable-fw-lldp       : on
rs-fec                : off
base-r-fec            : off
vf-vlan-pruning       : off

I'll still take a bit more of a look on the subject of disabling LLDP.
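On the persistence question raised above: assuming the ethtool private flag really doesn't survive a reboot, a hypothetical way to reapply it automatically with ifupdown would be a post-up hook on the bond members (interface name taken from the configs earlier; the hook line itself is an assumption about the setup):

```
auto eno2
iface eno2 inet manual
    post-up ethtool --set-priv-flags eno2 disable-fw-lldp on
```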

Just ran a throughput test with iperf. I had to cancel a little early as it took out our very sensitive Redis cluster, but ultimately I'm getting the same throughput values.

Code:
iperf -c 10.10.50.113 -P 64 -t 3600
------------------------------------------------------------
Client connecting to 10.10.50.113, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[ 10] local 10.10.50.112 port 34734 connected with 10.10.50.113 port 5001 (icwnd/mss/irtt=87/8948/69)
[ 20] local 10.10.50.112 port 34854 connected with 10.10.50.113 port 5001 (icwnd/mss/irtt=87/8948/55)
[ 38] local 10.10.50.112 port 34960 connected with 10.10.50.113 port 5001 (icwnd/mss/irtt=87/8948/72)
...
[ 21] 0.0000-365.7054 sec   311 MBytes  7.14 Mbits/sec
[ 34] 0.0000-365.7213 sec   317 MBytes  7.27 Mbits/sec
[  9] 0.0000-366.6893 sec   333 MBytes  7.62 Mbits/sec
[SUM] 0.0000-365.6178 sec  20.2 GBytes   474 Mbits/sec

But yeah, it's strange that the other ports are doing well. I'm out of good ideas, but I've also had problems with iperf3 a few times. What performance are you able to achieve in Ceph when using iperf instead of iperf3?

Well, thank you for your input and suggestions for diagnosing this issue. Having run through these steps and tests, I can't see this being anything other than an issue at the physical hardware level.

Thanks,
Chris.
 
So after spending the day testing, we're still no further on. We relocated eno2 & eno4 on nodes 2 & 3 to an entirely different, freshly configured switch with no success. Speeds are still seemingly throttled.

We've also tried changing the bond-lacp-rate to fast, and tested with the layer2+3 and layer3+4 hash policies, with no change.

The only other possibility I can think of is some kind of firmware bug, as I ran an apt upgrade right before migrating each node (it seemed sensible as there was no workload on the nodes).

Code:
modinfo i40e | grep ver
filename:       /lib/modules/6.2.16-10-pve/kernel/drivers/net/ethernet/intel/i40e/i40e.ko
description:    Intel(R) Ethernet Connection XL710 Network Driver
srcversion:     F4CBEC026738F03F2EDD1D1
vermagic:       6.2.16-10-pve SMP preempt mod_unload modversions

Having said that, I'd also have expected to see eno1 & eno3 suffer the same problems...
 
Please post your /var/apt/history.log. You should NEVER run apt upgrade on any Proxmox system; this can mess up your system to the point where only reinstalling helps.
 
Please post your /etc/apt/history.log. You should NEVER run apt upgrade on any Proxmox system; this can mess up your system to the point where only reinstalling helps.

Oh really? I thought it'd just install/update packages that are provided by the Proxmox repos.

Morning. That file doesn't seem to exist, but I do have history files in /var/log/apt. The most recent history file is below...

Code:
PVE01

Start-Date: 2023-09-06  15:21:25
Commandline: apt dist-upgrade
Requested-By: chris (1000)
Install: proxmox-kernel-6.2.16-10-pve:amd64 (6.2.16-10, automatic)
Upgrade: zabbix-sender:amd64 (1:7.0.0~alpha3-1+debian12, 1:7.0.0~alpha4-1+debian12), pve-qemu-kvm:amd64 (8.0.2-3, 8.0.2-5), libjs-extjs:amd64 (7.0.0-3, 7.0.0-4), proxmox-kernel-6.2:amd64 (6.2.16-8, 6.2.16-10), proxmox-backup-file-restore:amd64 (3.0.1-1, 3.0.2-1), ifupdown2:amd64 (3.2.0-1+pmx3, 3.2.0-1+pmx4), libpve-access-control:amd64 (8.0.3, 8.0.4), zabbix-agent2:amd64 (1:7.0.0~alpha3-1+debian12, 1:7.0.0~alpha4-1+debian12), proxmox-backup-client:amd64 (3.0.1-1, 3.0.2-1), libpve-common-perl:amd64 (8.0.6, 8.0.7), librados2-perl:amd64 (1.4.0, 1.4.1)
End-Date: 2023-09-06  15:22:13

Start-Date: 2023-09-06  15:23:25
Commandline: apt remove zabbix-agent2
Requested-By: chris (1000)
Remove: zabbix-agent2:amd64 (1:7.0.0~alpha4-1+debian12)
End-Date: 2023-09-06  15:23:25

Start-Date: 2023-09-06  15:23:37
Commandline: apt install zabbix-agent2
Requested-By: chris (1000)
Install: zabbix-agent2:amd64 (1:6.4.6-1+debian12)
End-Date: 2023-09-06  15:23:40

Start-Date: 2023-09-06  15:23:47
Commandline: apt autoremove
Requested-By: chris (1000)
Remove: telnet:amd64 (0.17+2.4-2), pve-kernel-5.15.30-2-pve:amd64 (5.15.30-3), g++-10:amd64 (10.2.1-6), libfmt7:amd64 (7.1.3+ds1-5), libpython3-dev:amd64 (3.11.2-1+b1), pve-kernel-5.15.35-1-pve:amd64 (5.15.35-3), libthrift-0.13.0:amd64 (0.13.0-6), python-pip-whl:amd64 (20.3.4-4+deb11u1), libtiff5:amd64 (4.2.0-1+deb11u4), zlib1g-dev:amd64 (1:1.2.13.dfsg-1), python3-wheel:amd64 (0.38.4-2), python-pastedeploy-tpl:amd64 (3.0.1-5), pve-kernel-5.13.19-3-pve:amd64 (5.13.19-7), python3-dev:amd64 (3.11.2-1+b1), libwebp6:amd64 (0.6.1-2.1+deb11u1), libpython3.11-dev:amd64 (3.11.2-6), python3.9-dev:amd64 (3.9.2-1), libpython3.9-dev:amd64 (3.9.2-1), python3.11-dev:amd64 (3.11.2-6), libjaeger:amd64 (16.2.9-pve1), pve-kernel-5.15.102-1-pve:amd64 (5.15.102-1), libexpat1-dev:amd64 (2.5.0-1), libstdc++-10-dev:amd64 (10.2.1-6), pve-kernel-5.15.39-4-pve:amd64 (5.15.39-4)
End-Date: 2023-09-06  15:24:21

PVE02

Requested-By: chris (1000)
Install: usb.ids:amd64 (2023.05.17-0+deb12u1, automatic), lshw:amd64 (02.19.git.2021.06.19.996aaad9c7-2+b1)
End-Date: 2023-09-12  09:55:35

Start-Date: 2023-09-12  14:14:39
Commandline: apt upgrade
Requested-By: chris (1000)
Install: proxmox-kernel-6.2.16-12-pve:amd64 (6.2.16-12, automatic)
Upgrade: pve-firmware:amd64 (3.7-1, 3.8-2), proxmox-kernel-6.2:amd64 (6.2.16-10, 6.2.16-12), linux-libc-dev:amd64 (6.1.38-4, 6.1.52-1)
End-Date: 2023-09-12  14:16:05

PVE03

Requested-By: chris (1000)
Install: proxmox-kernel-6.2.16-12-pve:amd64 (6.2.16-12, automatic)
Upgrade: pve-firmware:amd64 (3.7-1, 3.8-2), proxmox-kernel-6.2:amd64 (6.2.16-10, 6.2.16-12), linux-libc-dev:amd64 (6.1.38-4, 6.1.52-1)
End-Date: 2023-09-12  09:43:40

Start-Date: 2023-09-12  09:53:58
Commandline: apt install lshw
Requested-By: chris (1000)
Install: usb.ids:amd64 (2023.05.17-0+deb12u1, automatic), lshw:amd64 (02.19.git.2021.06.19.996aaad9c7-2+b1)
End-Date: 2023-09-12  09:53:59

So pve02 & pve03 are running 6.2.16-12-pve while pve01 is running 6.2.16-10-pve.
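Since the nodes have drifted apart, it may help to record on each node exactly what's running versus what's installed before the next round of testing. A small sketch (the dpkg stderr guard is just defensive):

```shell
# Running kernel on this node
uname -r
# Installed proxmox-kernel packages, name and version
dpkg -l 'proxmox-kernel-*' 2>/dev/null | awk '/^ii/ {print $2, $3}'
```

pve01, still on 6.2.16-10-pve, is a natural control: if it shows the same throughput ceiling, the kernel bump alone can't be the cause.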
 
