Network IRQ problem

Nemesiz

Hi,

I have a problem with network IRQ handling during DDoS attacks (~350,000 pps TCP port flood from unique IP, ~150 Mbps). The network IRQ is handled by a single CPU core, which goes to 100% load.

# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 129 0 0 0 0 0 0 0 IR-IO-APIC-edge timer
1: 3 0 0 0 0 0 0 0 IR-IO-APIC-edge i8042
8: 1 0 0 0 0 0 0 0 IR-IO-APIC-edge rtc0
9: 0 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi acpi
12: 4 0 0 0 0 0 0 0 IR-IO-APIC-edge i8042
23: 2079 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb1, ehci_hcd:usb2
32: 383 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi nouveau
36: 163 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi snd_hda_intel
48: 0 0 0 0 0 0 0 0 DMAR_MSI-edge dmar0
49: 216 0 0 0 0 0 0 0 IR-HPET_MSI-edge hpet2
50: 0 0 0 0 0 0 0 0 IR-HPET_MSI-edge hpet3
51: 0 0 0 0 0 0 0 0 IR-HPET_MSI-edge hpet4
52: 0 0 0 0 0 0 0 0 IR-HPET_MSI-edge hpet5
53: 0 0 0 0 0 0 0 0 IR-HPET_MSI-edge hpet6
63: 1352305 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth1
64: 145205622 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0
65: 9376474 0 0 0 0 0 0 0 IR-PCI-MSI-edge ahci
66: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge ahci
67: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
68: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
69: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
70: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
71: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
72: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
73: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
74: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
75: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
76: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
77: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
78: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
79: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
80: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
81: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
82: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd
83: 305 0 0 0 0 0 0 0 IR-PCI-MSI-edge snd_hda_intel
NMI: 34685 34913 26056 21147 32204 28252 22638 18593 Non-maskable interrupts
LOC: 114928218 115170671 127188167 126058968 137437213 133420304 139290962 139491987 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 34685 34913 26056 21147 32204 28252 22638 18593 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts
RES: 33522351 28837002 26218313 18005750 38345373 31731822 18949350 12566571 Rescheduling interrupts
CAL: 15737209 13461854 11113263 8912593 9680282 8091983 6404095 5352164 Function call interrupts
TLB: 197506 171134 147854 137319 303430 308884 297507 298969 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 218 218 218 218 218 218 218 218 Machine check polls
ERR: 0
MIS: 0

As you can see, the eth0 IRQ (#64) is serviced by only one CPU core instead of being spread across all CPU cores.

# cat /proc/irq/64/smp_affinity
ff
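
For readers unfamiliar with the mask format: smp_affinity is a hexadecimal bitmask of the CPUs that are allowed to service the IRQ, so ff permits CPU0 to CPU7. It does not by itself guarantee that interrupts are spread over all of them. A minimal sketch for converting between masks and CPU numbers follows; the function names cpu_mask and mask_cpus are made up for illustration, and it assumes a plain mask without the comma-separated groups used on machines with more than 32 CPUs.

```shell
# Hypothetical helpers; these names are illustrative, not part of any tool.

# Print the hexadecimal affinity mask that selects one CPU (0-based).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Print the CPU numbers enabled in a hexadecimal affinity mask.
mask_cpus() {
    mask=$((0x$1))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out${out:+ }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

mask_cpus ff    # prints: 0 1 2 3 4 5 6 7
cpu_mask 2      # prints: 4

# To pin IRQ 64 to CPU2 (as root; deliberately not executed here):
#   cpu_mask 2 > /proc/irq/64/smp_affinity
```

Pinning moves the load to another core; it does not split one IRQ across several cores.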


# pveversion -v
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-72
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1

How can I get the IRQ handled on all CPUs?
 
What guest kernel do you use? Please test with 3.X kernels.

All of this is happening on the host.

P.S. If the firewall is off, all traffic is passed on to the KVM guest and the host has no IRQ overload. The same problem then starts inside the VM. A 3.2 kernel didn't help.

Maybe the irqbalance daemon could help?

# apt-get install irqbalance

Hm, maybe a reboot is needed before balancing starts to work. After installing it, nothing has changed.
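
One way to check whether balancing has any effect is to compare two snapshots of /proc/interrupts and see which CPUs actually serviced the IRQ in between. The following is a rough sketch under that assumption; irq_delta is a hypothetical helper name, and the snapshot paths in the usage comment are arbitrary.

```shell
# irq_delta BEFORE AFTER IRQ
# Print, per CPU, how many times IRQ fired between two saved copies
# of /proc/interrupts. Hypothetical helper, not part of irqbalance.
irq_delta() {
    awk -v irq="$3:" '
        FNR == 1 { ncpu = NF; next }     # header row: one field per CPU
        $1 == irq {
            for (i = 1; i <= ncpu; i++) {
                if (seen)
                    printf "CPU%d %d\n", i - 1, $(i + 1) - prev[i]
                else
                    prev[i] = $(i + 1)   # first file: remember the counters
            }
            seen = 1
        }
    ' "$1" "$2"
}

# Usage on the host (paths are arbitrary):
#   cat /proc/interrupts > /tmp/irq.before
#   sleep 5
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta /tmp/irq.before /tmp/irq.after 64
```

If only one CPU shows a non-zero delta for IRQ 64, the interrupt is still being delivered to a single core.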
 
It looks more like the ksoftirqd process.

DESCRIPTION
ksoftirqd is a per-cpu kernel thread that runs when the machine is
under heavy soft-interrupt load. Soft interrupts are normally serviced
on return from a hard interrupt, but it’s possible for soft interrupts
to be triggered more quickly than they can be serviced. If a soft
interrupt is triggered for a second time while soft interrupts are
being handled, the ksoftirq daemon is triggered to handle the soft
interrupts in process context. If ksoftirqd is taking more than a tiny
percentage of CPU time, this indicates the machine is under heavy soft
interrupt load.
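
To see where the soft-interrupt work described above lands, the per-CPU NET_RX counters can be read from /proc/softirqs, assuming the running kernel provides that file. A small sketch; softirq_row is a hypothetical helper name.

```shell
# softirq_row FILE NAME
# Print the per-CPU counters of one softirq type (e.g. NET_RX) from a
# file in /proc/softirqs format. Hypothetical helper for illustration.
softirq_row() {
    awk -v name="$2:" '
        FNR == 1 { ncpu = NF; next }     # header row: one field per CPU
        $1 == name {
            for (i = 1; i <= ncpu; i++)
                printf "CPU%d %s\n", i - 1, $(i + 1)
        }
    ' "$1"
}

# Only query the live file when it exists and is readable.
if [ -r /proc/softirqs ]; then
    softirq_row /proc/softirqs NET_RX
fi
```

A NET_RX counter that grows on one CPU only, together with a busy ksoftirqd thread on that CPU, would be consistent with the behaviour reported in this thread.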
 
I have the same problem. smp_affinity is set to ff, but only one CPU handles the network IRQ.
irqbalance doesn't help. It switches the IRQ between CPUs, but still only one of them handles the requests.
Maybe the solution is to put the eths on separate interrupts (right now all eths are on the same IRQ) and assign a separate CPU to each network interface (and IRQ)? But I don't know how to do that.
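
As a rough sketch of the per-interface idea: look up each interface's IRQ number in /proc/interrupts and write a single-CPU mask to that IRQ's smp_affinity. This only works when the interfaces have their own IRQ lines (for example via MSI); interfaces that share one IRQ line cannot be separated this way. irqs_for_dev is a hypothetical helper name, and the eth0/eth1 to CPU mapping in the comments is an arbitrary example.

```shell
# irqs_for_dev FILE DEV
# Print the IRQ numbers whose device list names DEV (or a queue of it,
# such as DEV-rx-0) in a file in /proc/interrupts format.
# Hypothetical helper for illustration.
irqs_for_dev() {
    awk -v dev="$2" '
        FNR == 1 { ncpu = NF; next }           # header row: one field per CPU
        $1 ~ /^[0-9]+:$/ {
            for (i = ncpu + 2; i <= NF; i++) { # fields after the counters
                name = $i
                sub(/,$/, "", name)            # shared lines are comma-separated
                if (name == dev || index(name, dev "-") == 1) {
                    sub(/:$/, "", $1)
                    print $1
                    break
                }
            }
        }
    ' "$1"
}

if [ -r /proc/interrupts ]; then
    irqs_for_dev /proc/interrupts eth0
fi

# Pinning (as root; deliberately not executed here), eth0 to CPU1 and
# eth1 to CPU2 as an arbitrary example:
#   for irq in $(irqs_for_dev /proc/interrupts eth0); do
#       echo 2 > /proc/irq/$irq/smp_affinity    # mask 2 = CPU1
#   done
#   for irq in $(irqs_for_dev /proc/interrupts eth1); do
#       echo 4 > /proc/irq/$irq/smp_affinity    # mask 4 = CPU2
#   done
```

A running irqbalance daemon may rewrite these masks, so manual pinning and irqbalance should probably not be combined for the same IRQs.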
 
