Hello proxmox-community,
we encounter a high error rate (errors, drops, overruns and frames) on the Ceph network interfaces of our newly set up four-machine Proxmox 8/Ceph cluster when writing data (e.g. from /dev/urandom to a file on a virtual machine for testing).
Code:
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 9000
inet6 fe80::4cfc:1cff:fe02:7835 prefixlen 64 scopeid 0x20<link>
ether 4e:dc:2c:a2:79:34 txqueuelen 1000 (Ethernet)
RX packets 72321527 bytes 477341582531 (444.5 GiB)
RX errors 79792 dropped 69666 overruns 69666 frame 79624
TX packets 31629557 bytes 76964599295 (71.6 GiB)
TX errors 0 dropped 574 overruns 0 carrier 0 collisions 0
bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 9000
ether be:d3:83:cf:f0:3d txqueuelen 1000 (Ethernet)
RX packets 126046100 bytes 891091401606 (829.8 GiB)
RX errors 53422 dropped 101505 overruns 96059 frame 52228
TX packets 124032313 bytes 946978782880 (881.9 GiB)
TX errors 0 dropped 384 overruns 0 carrier 0 collisions 0
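For reference, the write test is along these lines (file path and size are just examples), with the counters above snapshotted before and after:
Code:
# inside a VM: generate sustained write load (~10 GiB)
dd if=/dev/urandom of=/root/ddtest.bin bs=1M count=10240 oflag=direct status=progress
# on the hypervisor: read the bond counters
ip -s link show bond1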
All four machines have datacenter NVMe disks and three physical network cards (2x 40 Gbps Mellanox ConnectX-3 CX354A and 1x 10 Gbps, which is not used for Ceph). The errors appear as output errors on our switches (Juniper QFX5100-24Q) and as input (rx) errors on the hypervisors.
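The drops can also be narrowed down per bond slave; something like this shows the driver-level counters (interface names as on our hosts, counter names vary by NIC driver):
Code:
# per-slave counters on the bond members (mlx4 driver for the ConnectX-3)
ethtool -S ens1 | grep -iE 'err|drop|discard'
ethtool -S ens3d1 | grep -iE 'err|drop|discard'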
Network configuration:
Code:
iface ens1 inet manual
# Left port 1st mellanox

iface ens3d1 inet manual
# Left port 2nd mellanox

auto bond0
iface bond0 inet manual
    bond-slaves ens1 ens3d1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    mtu 9000

auto bond0.10
iface bond0.10 inet static
    address 172.16.10.11/24
    mtu 9000
# CEPH Public

auto bond0.40
iface bond0.40 inet static
    address 172.16.40.11/24
    mtu 9000
# Corosync 1

iface ens3 inet manual
# Right port 2nd mellanox

iface ens1d1 inet manual
# Right port 1st mellanox

auto bond1
iface bond1 inet manual
    bond-slaves ens3 ens1d1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    mtu 9000

iface bond1.20 inet manual
    mtu 9000
# CEPH Cluster

auto vmbr20
iface vmbr20 inet static
    address 172.16.20.11/24
    bridge-ports bond1.20
    bridge-stp off
    bridge-fd 0
    mtu 9000
# CephFS

auto vmbr100
iface vmbr100 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 100-4000
# VM Public
The corresponding IP addresses on the other three hypervisors follow the same scheme on all networks: hv01 has .11, hv02 has .12, and so on.
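For completeness, the bond negotiation and the effective MTU can be verified on each hypervisor with:
Code:
# LACP negotiation state and per-slave details
cat /proc/net/bonding/bond0
# effective MTU and bond parameters
ip -d link show bond0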
The Juniper switch configuration for the ports:
Code:
#show configuration interfaces ae11
description hv01-ceph-public;
mtu 9028;
aggregated-ether-options {
    minimum-links 1;
    lacp {
        active;
        periodic fast;
    }
}
unit 0 {
    family ethernet-switching {
        interface-mode trunk;
        vlan {
            members [ 20 100 ];
        }
    }
}

#show configuration interfaces ae10
description hv01-ceph-cluster;
mtu 9028;
aggregated-ether-options {
    minimum-links 1;
    lacp {
        active;
        periodic fast;
    }
}
unit 0 {
    family ethernet-switching {
        interface-mode trunk;
        vlan {
            members [ 10 40 ];
        }
    }
}
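On the switch side, the LACP partner state and the error counters can be compared with standard operational commands (ae10 shown as an example):
Code:
show lacp interfaces ae10
show interfaces ae10 extensive | match "error|drop"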
What we've tried so far:
- setting the MTU to 9000 on the relevant interfaces (see the network config above; the errors also occurred with the default of 1500; a jumbo-frame path check is sketched after this list)
- setting the MTU on the switches to 9216 (no difference compared to the current setting of 9028)
- setting the following sysctl values on all hypervisors:
Code:
# TCP window scaling (kernel default, listed for completeness)
net.ipv4.tcp_window_scaling = 1
# TCP receive/send buffers: min / default / max in bytes
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# raise the socket buffer ceilings to 16 MiB
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# larger accept queue and per-CPU ingress backlog
net.core.somaxconn = 32765
net.core.netdev_max_backlog = 32765
# avoid swapping unless strictly necessary
vm.swappiness = 1
- setting the rx ring buffer size on the Ethernet interfaces of the hypervisors (the default is 1024):
Code:
ethtool -G ens1 rx 512   # led to a non-working network
ethtool -G ens1 rx 2048  # slower Ceph throughput
ethtool -G ens1 rx 4096  # even slower
ethtool -G ens1 rx 8192  # hardware max / non-working network
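The supported ring limits can be read back, and the jumbo-frame path verified end-to-end, roughly like this (target address per our scheme above):
Code:
# supported vs. currently configured ring sizes
ethtool -g ens1
# jumbo path check: 8972 bytes payload = 9000 MTU minus 28 bytes IP/ICMP overhead, DF bit set
ping -M do -s 8972 -c 3 172.16.20.12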
Is there anything we're missing, or just don't see, that could be causing these high error rates?