My node's CPU is:
CPU(s) 96 x AMD EPYC 7642 48-Core Processor (1 Socket)
The load average has been at 150+ now for several hours but the CPU usage doesn't go above 70%.
ps faxl | grep " D "
Code:
1 0 3789 2 20 0 0 0 taskq_ D ? 1164:49 \_ [txg_sync]
0 0 47979 3975 20 0 6140 896 pipe_w S+ pts/0 0:00 | \_ grep D
1 0 38487 1 20 0 6708 2168 run_st D ? 15:57 /bin/bash /usr/sbin/ksmtuned
This is what is returned. Could these processes in D state (uninterruptible sleep) be the issue?
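For context: on Linux the load average counts tasks in uninterruptible sleep (D) as well as runnable ones (R), so a load of 150+ with CPU usage below 70% can come from tasks blocked on I/O rather than from CPU demand. Below is a minimal sketch for counting processes per state straight from /proc instead of grepping ps output. The function names `parse_state` and `count_states` are made up for illustration, and it counts processes only; individual threads would need /proc/<pid>/task/*/stat as well.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the LAST ')' before reading the state field.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Count processes by state letter (R = runnable, D = uninterruptible sleep)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except (OSError, IndexError):
            continue  # process exited, or the file was not readable
    return counts
```

Calling `count_states()` a few times in a row and comparing the `"D"` and `"R"` counts with the first value in /proc/loadavg should show whether the load is mostly blocked tasks or mostly runnable ones.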
After reading that it could also be a network issue caused by too many active connections, I installed https://github.com/netdata/netdata to monitor the node, but I can't see anything obvious. There are about 2,000 active sockets (ipv4.sockstat_sockets). Could that be a limiting factor? I have tried adding some network optimisations:
Code:
net.core.netdev_max_backlog=8192
net.core.optmem_max=8192
net.core.rmem_max=16777216
net.core.somaxconn=8151
net.core.wmem_max=16777216
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.log_martians = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.ip_local_port_range=1024 65535
net.ipv4.tcp_base_mss = 1024
net.ipv4.tcp_challenge_ack_limit = 999999999
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_keepalive_intvl=30
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_time=240
net.ipv4.tcp_limit_output_bytes=65536
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_rfc1337=1
net.ipv4.tcp_rmem=8192 87380 16777216
net.ipv4.tcp_sack=1
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.tcp_syn_retries=3
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_wmem=8192 65536 16777216
net.netfilter.nf_conntrack_generic_timeout = 60
net.netfilter.nf_conntrack_helper=0
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_tcp_timeout_established = 28800
net.unix.max_dgram_qlen = 4096
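To judge whether the roughly 2,000 sockets or the conntrack table are anywhere near a limit, the kernel counters can be read directly. This is only a rough sketch: `parse_sockstat`, `read_int` and `conntrack_usage` are names invented here, and the nf_conntrack files only exist when the conntrack module is loaded.

```python
def parse_sockstat(text):
    """Parse the text of /proc/net/sockstat into {protocol: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split()
        # Counters come as "key value" pairs, e.g. "inuse 5 orphan 0".
        stats[name.strip()] = {k: int(v) for k, v in zip(fields[0::2], fields[1::2])}
    return stats


def read_int(path):
    """Read a single integer from a /proc file."""
    with open(path) as f:
        return int(f.read().split()[0])


def conntrack_usage(base="/proc/sys/net/netfilter"):
    """Return tracked connections as a fraction of nf_conntrack_max.

    Assumes the nf_conntrack module is loaded; otherwise the files are missing.
    """
    count = read_int(base + "/nf_conntrack_count")
    limit = read_int(base + "/nf_conntrack_max")
    return count / limit
```

On the node this would be used as `parse_sockstat(open("/proc/net/sockstat").read())` together with `conntrack_usage()`. A conntrack fraction far below 1.0 would suggest the connection count is not what is driving the load.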
What else should I check?