Many ksoftirqd daemons using 99% of I/O

kristian.kirilov

Nov 17, 2016
Hello,

After the latest kernel upgrade (I think), there are a lot of ksoftirqd threads (ksoftirqd/0, 2, 3, 4) consuming 99% of disk I/O. I have this issue on two different servers with different hardware. Does anybody know more about this?
 

Attachments

  • ffffffff.jpg (109.2 KB)
Hi,

same problem here.

Code:
root@proxmox03:~# pveversion -v
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
pve-kernel-4.13.4-1-pve: 4.13.4-25
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
openvswitch-switch: 2.6.2~pre+git20161223-3
 
Hi,

Which component is producing the interrupts?

Code:
cat /proc/interrupts
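
If the raw output is hard to read, a small script can rank the lines by total count across all CPUs. This is only a sketch: `rank_interrupts` is a name made up for this post, and it assumes the usual /proc/interrupts layout (a header row of CPU columns, then one `label: counts... description` line per interrupt source).

```python
def rank_interrupts(text, top=5):
    """Return the `top` interrupt lines as (total, label, description),
    sorted by total count summed over all CPUs."""
    lines = text.strip("\n").splitlines()
    if not lines:
        return []
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = []
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # summary lines such as ERR have fewer count columns
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        rows.append((sum(counts), label.strip(), description))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc not readable
    for total, label, description in rank_interrupts(text):
        print(f"{total:>15}  {label:>5}  {description}")
```

The lines at the top of that list are the ones worth posting here.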
 
I don't know exactly, but this is the output of /proc/interrupts; look at the biggest counts.
 

Attachments

  • zzzzzzz.jpg (360.6 KB)
You have a very large number of interrupts on your ahci controller.
Which storage technology are you using, and is your storage under heavy load?
 
On the machine shown in the picture, I'm using local LVM storage on software RAID: four disks in RAID6, with LVM on top as an LVM thin pool.

But I have this issue on a different server too, which uses different technology: HBAs, a shared volume presented by the storage, gfs2, clvmd, pacemaker/corosync.

I can show the interrupts from the other servers, if you want.
 
Yes, that may give us more information.
 
Here it is.

In these logs, because this server is much more powerful, I'm seeing softirq load only on CPU 0 and 1, not on all CPUs. But it always shows 99%. This actually isn't disk I/O but interrupts, and I don't know why...
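
To see which softirq type is actually growing on CPU 0 and 1, two snapshots of /proc/softirqs can be compared. A rough sketch, assuming the standard layout of that file (a header row of CPU columns, then one `TYPE: counts...` line per softirq type); the function names are hypothetical and the one-second interval is arbitrary.

```python
import time


def parse_softirqs(text):
    """Map each softirq type (e.g. 'BLOCK') to its list of per-CPU counts."""
    table = {}
    for line in text.strip("\n").splitlines()[1:]:  # skip the CPU header row
        name, sep, rest = line.partition(":")
        if sep:
            table[name.strip()] = [int(x) for x in rest.split()]
    return table


def softirq_delta(before, after):
    """Per-CPU increase of every softirq type present in both snapshots."""
    return {
        name: [a - b for a, b in zip(after[name], before[name])]
        for name in after
        if name in before
    }


def read_softirqs(path="/proc/softirqs"):
    with open(path) as f:
        return parse_softirqs(f.read())


if __name__ == "__main__":
    try:
        first = read_softirqs()
        time.sleep(1)  # arbitrary sampling interval
        second = read_softirqs()
    except OSError:
        first = second = {}  # not on Linux, or /proc not readable
    delta = softirq_delta(first, second)
    # Types that grew the most during the interval are printed first.
    for name, per_cpu in sorted(delta.items(), key=lambda kv: sum(kv[1]), reverse=True):
        print(f"{name:>10} {per_cpu}")
```

If one type dominates the output, that narrows down which subsystem is keeping ksoftirqd busy.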
 

Attachments

  • sfsfdsdf.txt (28.2 KB)
  • gdfdfgdg.jpg (81.4 KB)
[48719.602582] perf: interrupt took too long (2515 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[70360.987751] perf: interrupt took too long (3148 > 3143), lowering kernel.perf_event_max_sample_rate to 63500
[105536.021584] perf: interrupt took too long (4027 > 3935), lowering kernel.perf_event_max_sample_rate to 49500
[208735.997048] perf: interrupt took too long (5036 > 5033), lowering kernel.perf_event_max_sample_rate to 39500

What about that? Isn't it related to interrupts? Sorry, I don't have deep kernel knowledge; I'm just guessing. I'm trying to be helpful.
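
Those messages show the kernel's perf subsystem lowering its own sampling rate each time one of its interrupts runs longer than the current limit. The log alone does not say whether that is connected to the ksoftirqd load. To see how often it happens and how far the rate has dropped, the lines can be pulled apart with a regular expression. A minimal sketch using only the four lines quoted above; `parse_perf_throttle` is a made-up helper name.

```python
import re

# Matches the "perf: interrupt took too long" lines exactly as quoted above.
PERF_LINE = re.compile(
    r"\[\s*(\d+\.\d+)\] perf: interrupt took too long \((\d+) > (\d+)\), "
    r"lowering kernel\.perf_event_max_sample_rate to (\d+)"
)

LOG = """\
[48719.602582] perf: interrupt took too long (2515 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[70360.987751] perf: interrupt took too long (3148 > 3143), lowering kernel.perf_event_max_sample_rate to 63500
[105536.021584] perf: interrupt took too long (4027 > 3935), lowering kernel.perf_event_max_sample_rate to 49500
[208735.997048] perf: interrupt took too long (5036 > 5033), lowering kernel.perf_event_max_sample_rate to 39500
"""


def parse_perf_throttle(log_text):
    """Return (uptime_seconds, took, limit, new_sample_rate) per matching line."""
    return [
        (float(m.group(1)), int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in PERF_LINE.finditer(log_text)
    ]


if __name__ == "__main__":
    for uptime, took, limit, rate in parse_perf_throttle(LOG):
        hours = uptime / 3600
        print(f"after {hours:6.1f} h uptime: took {took} (limit {limit}), sample rate now {rate}")
```

In this excerpt there are four such events spread over roughly 58 hours of uptime.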