proxmox 1.9 and kernel 2.6.32-6-pve + iptables = kernel panic

RRJ · Sep 21, 2011

Hello,
In a previous post I made a lot of assumptions about what could be causing the random kernel panics. I decided to start a new thread because the previous one had filled up with unneeded information.
Now I'm sure it comes down to iptables. If I don't run it on 2 of my Proxmox machines, they run fine. If I load iptables with a simple rule set, the machine crashes with a kernel panic (a pic is also included).

Code:

kernel:Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff8149c55c

versions:

Code:

pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-43
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-43
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-1pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6

Code:

Linux services 2.6.32-6-pve #1 SMP Tue Sep 13 10:44:10 CEST 2011 x86_64 GNU/Linux

The NICs on both servers are integrated: the first has an Intel 2-port 1G card, the second an HP 2-port 1G card.
And I should definitely add that with 2.6.32-4 everything works fine!

And my iptables config:

Code:

services:~# cat /etc/fw
#!/bin/sh


iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
modprobe ipt_REJECT
modprobe ip_conntrack
modprobe ip_conntrack_ftp


my=178.21.xxx.xxx/28
my2=178.21.xxx.xxx/28
barix=178.21.xxx.xxx
tlulib=193.40.xxx.xxx


#flush all rules
iptables -F; iptables -F -t nat; iptables -F -t mangle


#allow everything on the loopback interface
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT




#allow from tower prefix
iptables -A INPUT -s $my -j ACCEPT
iptables -A INPUT -s $tlulib -j ACCEPT
iptables -A INPUT -s $my2 -j ACCEPT


iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p icmp -m icmp -j ACCEPT
iptables -A INPUT -j REJECT --reject-with icmp-host-prohibited




#forward (Firewall for VPSes)
ns1=178.21.xxx.xxx
sc1=178.21.xxx.xxx
netflow=178.21.xxx.xxx
noc=178.21.xxx.xxx
ns2=178.21.xxx.xxx


iptables -A FORWARD -p icmp -m icmp -j ACCEPT
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT


#allow anything from tv tower and tlulib
iptables -A FORWARD -s $my -j ACCEPT
iptables -A FORWARD -s $my2 -j ACCEPT
iptables -A FORWARD -s $tlulib -j ACCEPT


#ns1:
iptables -A FORWARD -d $ns1 -p udp -m udp --dport 53 -j ACCEPT
iptables -A FORWARD -s $ns2 -j ACCEPT


#sc1
#from barix
iptables -A FORWARD -s $barix -j ACCEPT
#for listeners
iptables -A FORWARD -d $sc1 -p tcp -m tcp --dport 8128 -j ACCEPT
iptables -A FORWARD -d $sc1 -p tcp -m tcp --dport 8064 -j ACCEPT


#noc
iptables -A FORWARD -d $noc -p tcp -m tcp --dport 80 -j ACCEPT






iptables -A FORWARD -j REJECT --reject-with icmp-host-prohibited
iptables -nL -v

RRJ · Sep 21, 2011

Any ideas, anyone? It's not really safe to leave those VPSes without any firewall, is it? (And I don't really want to configure my border Juniper firewall to protect a few VPSes.)

RRJ · Sep 21, 2011

Okay, I don't know if anyone is really interested in this bug, but the problem is in the iptables REJECT action (maybe only with the option --reject-with icmp-host-prohibited; I didn't test without it, because I can't really tell whether anyone is interested). For now I have replaced it with DROP and there are no more random panics.
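
A minimal sketch of that workaround, assuming the rules live in a shell script such as the /etc/fw shown in the first post. The helper name reject_to_drop is made up for illustration; it only rewrites text and does not touch the running firewall.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): rewrite REJECT targets to DROP.
# Reads rule lines on stdin and writes the rewritten lines to stdout.
reject_to_drop() {
    # Replace "-j REJECT", plus an optional "--reject-with <type>", with "-j DROP".
    sed -e 's/-j REJECT\( --reject-with [a-z-]*\)\{0,1\}/-j DROP/'
}

# Example: the final INPUT rule from the script in the first post.
echo 'iptables -A INPUT -j REJECT --reject-with icmp-host-prohibited' | reject_to_drop
```

Something like `reject_to_drop < /etc/fw > /etc/fw.drop` (the output path is an assumption) would produce a DROP-only copy of the script. Keep in mind that DROP discards packets silently, so blocked clients wait for a timeout instead of getting an immediate ICMP error.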

dietmar · Sep 22, 2011

RRJ said:
Okay, I don't know if anyone is really interested in this bug,

Sure, we are interested in fixing all bugs.

RRJ said:
but the problem is in the iptables REJECT action (maybe only with the option --reject-with icmp-host-prohibited; I didn't test without it, because I can't really tell whether anyone is interested).

Could you please test that? It would be great if you could track the problem down further.

RRJ · Sep 22, 2011

dietmar said:
Sure, we are interested in fixing all bugs.

Could you please test that? It would be great if you could track the problem down further.

Hello, dietmar

Glad that there is still some fighting spirit to get problems solved.

Okay. The kernel panics only when I use REJECT in the iptables config file (even without options). As soon as I use REJECT instead of DROP, it panics at a random time (this time it took about 17 minutes). According to what I found on Google, the panic can be triggered faster if there is only one rule with REJECT in the iptables settings.

Also, note that I use only bridged interfaces (veth) on my Proxmox setups (I have servers in both private and public networks, and playing with routes for venet is overkill, especially since those routes just stop working when you restart a container and have to be recreated).

dietmar · Sep 23, 2011

Can't reproduce. What kind of traffic is on your network? Maybe you can find a test case that does not depend on external traffic. And what iptables rules do you use exactly?

RRJ · Sep 23, 2011

Have you tried with a bridged interface? I found on Google that this panic is typical for bridged interfaces only.
Try to actively connect to the server from an IP address that is rejected by one of the iptables rules.
My iptables rules are in the first post of this topic.
This server does carry external traffic: I run shoutcast and netflow servers on it.
On the other server there are only typical OpenVZ clients with Apache services, and all of them are passed through a NAT router first.
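
A rough sketch of such a test case, under several assumptions: the containers are on bridged (veth) interfaces, their traffic passes the FORWARD chain as in the rules from the first post, and 192.0.2.10 is only a placeholder for the client address that should be rejected. The helper names are made up. By default the script only prints the commands; setting RUN=1 executes them and needs root.

```shell
#!/bin/sh
# Hypothetical reproduction helper; names and addresses are placeholders.
# Prints the commands by default; set RUN=1 to execute them (needs root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Load a rule set that contains a single REJECT rule for one source address.
load_single_reject() {
    src="$1"
    run iptables -F
    run iptables -P INPUT ACCEPT
    run iptables -P FORWARD ACCEPT
    run iptables -A FORWARD -s "$src" -j REJECT
}

load_single_reject "${REJECTED_SRC:-192.0.2.10}"
```

With the rule loaded, repeatedly opening connections from the rejected address to a bridged container should exercise the REJECT path without depending on outside traffic. Whether this actually triggers the panic is untested here.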

RRJ · Sep 26, 2011

Hello,
Have you managed to reproduce this panic?

dietmar · Sep 26, 2011

RRJ said:
Hello,
Have you managed to reproduce this panic?

No, sorry.

jleg · Sep 27, 2011

Hello,

Unfortunately, we also have an issue with the current 2.6.32-6 kernel. A little bit of history:
- we installed the first pvetest kernel 2.6.32-6 on a machine, result was a freeze (adaptec problem) because the kernel activated ASPM
- the kernel issued after this with deactivated ASPM (resp. with honoring the BIOS) was running fine and seemed stable
- at some point - after several kernel updates - we now have freezes again, on a regular basis

I can't say with which exact kernel version this problem started. It is also a bit tricky to map the "running kernel" (or the "kernel mentioned in the boot log") to the "pve kernel package". For sure, all kernels after 2.6.32-6-44 have this problem. Going back to 2.6.32-5 solves it.
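
One way to line the two up, sketched under the assumption that the package is named "pve-kernel-" plus the running release, as in the version list in the first post (running kernel 2.6.32-6-pve, package pve-kernel-2.6.32-6-pve at 2.6.32-43). The function name kernel_pkg_name is made up for illustration.

```shell
#!/bin/sh
# Assumption: the package name is "pve-kernel-" followed by the kernel release.
kernel_pkg_name() {
    echo "pve-kernel-$1"
}

release="$(uname -r)"
pkg="$(kernel_pkg_name "$release")"
echo "running kernel: $release"

# Look up the installed package version, if dpkg-query is available.
if command -v dpkg-query >/dev/null 2>&1; then
    # Empty output means the package is not known to dpkg on this machine.
    version="$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null || true)"
    echo "package: $pkg ${version:-(not installed)}"
fi
```

Recording both lines after each kernel update would make it easier to say later which package version introduced the freezes.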

What is interesting: the kernel panics seem to happen mostly between 03:00 and 04:00. I have not found any reason for this so far: no log message, no scheduled backup.

I don't think the panic message on the console is of much help. It cannot be scrolled up, probably because the system is frozen after the panic...

This machine also has some iptables rules active, but I found no evidence that they have anything to do with this...

gkovacs · Sep 27, 2011

jleg said:
- we installed the first pvetest kernel 2.6.32-6 on a machine, result was a freeze (adaptec problem) because the kernel activated ASPM
- the kernel issued after this with deactivated ASPM (resp. with honoring the BIOS) was running fine and seemed stable

We are also using Adaptec cards, and also experienced a kernel panic today, but only after setting OpenVZ containers to more than 1 CPU, so for us it does not seem to be an Adaptec issue:
http://forum.proxmox.com/threads/7118-Kernel-panic-with-2.6.32-6-and-multi-cpu-OpenVZ

Couple of questions:
- how do you know the Adaptec driver or ASPM caused the panic? did something on the panic screen show up referring to that?
- please elaborate what ASPM is, and in which BIOS (system or Adaptec) can you turn it off?

jleg · Sep 27, 2011

gkovacs said:
We are also using Adaptec cards, and also experienced a kernel panic today, but only after setting OpenVZ containers to more than 1 CPU, so for us it does not seem to be an Adaptec issue:
http://forum.proxmox.com/threads/7118-Kernel-panic-with-2.6.32-6-and-multi-cpu-OpenVZ

Couple of questions:
- how do you know the Adaptec driver or ASPM caused the panic? did something on the panic screen show up referring to that?
- please elaborate what ASPM is, and in which BIOS (system or Adaptec) can you turn it off?

you'll find the complete story here: http://forum.proxmox.com/threads/6980-New-2.6.32-Kernel-with-stable-OpenVZ-(pvetest)?p=39679
In short: ASPM means "Active State Power Management", a kind of power throttling for PCIe, and this is something our Adaptecs have proved to "strongly dislike" in the past.
The first 2.6.32-6 kernel erroneously turned this on, even when it was disabled in the BIOS...
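
To see what a running kernel is doing with ASPM, a small check like the following may help. It is only a sketch: it assumes the kernel exposes the ASPM policy under /sys/module/pcie_aspm and looks for the pcie_aspm=off boot parameter; the function name aspm_policy is made up.

```shell
#!/bin/sh
# Print the contents of the given ASPM policy file, or a note if it is unreadable.
aspm_policy() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unknown (cannot read $1)"
    fi
}

# The active policy is shown in square brackets in this file.
echo "ASPM policy: $(aspm_policy /sys/module/pcie_aspm/parameters/policy)"

# Check whether ASPM was disabled via the kernel command line.
if grep -q 'pcie_aspm=off' /proc/cmdline 2>/dev/null; then
    echo "pcie_aspm=off is set on the kernel command line"
else
    echo "pcie_aspm=off is not set on the kernel command line"
fi
```

Booting with pcie_aspm=off is the usual way to keep the kernel from managing ASPM at all, independent of the BIOS setting.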

tom · Sep 27, 2011

jleg said:
you'll find the complete story here: http://forum.proxmox.com/threads/6980-New-2.6.32-Kernel-with-stable-OpenVZ-(pvetest)?p=39679
In short: ASPM means "Active State Power Management", a kind of power throttling for PCIe, and this is something our Adaptecs have proved to "strongly dislike" in the past.
The first 2.6.32-6 kernel erroneously did turn this on, even if disabled in BIOS...

It's not that clear that this is an error; it is more of an intended behavior. I think the bug is in the RAID cards. But anyway, we have now changed the behavior to respect the BIOS.

gkovacs · Sep 27, 2011

jleg said:
In short: ASPM means "Active State Power Management", a kind of power throttling for PCIe

We are using PCI based (PCI-X) Adaptec cards, so in that case our kernel panic is most likely not related to this issue.

jleg · Sep 27, 2011

tom said:
It's not that clear that this is an error; it is more of an intended behavior. I think the bug is in the RAID cards. But anyway, we have now changed the behavior to respect the BIOS.

I agree - the *reason* for the failure is the Adaptec itself failing with a kernel panic (yes, they run their own "distro" on these cards), resulting in the Linux kernel panicking.
But exactly because this *might* happen (there is no "law" that lets mainboard manufacturers force Adaptec & Co to support ASPM, and there might be other such features as well), there are lots of "switches" in the BIOS. If I couldn't switch this off in the BIOS, I'd already get the Adaptec's kernel panic after POST and be unable to boot anything.
So a Linux kernel re-enabling it is something I'd regard as, well, "unexpected" at the least. No other distro we deal with here is doing that...

tom · Sep 27, 2011

jleg said:
...No other distro we deal with here is doing that...

See https://bugzilla.redhat.com/show_bug.cgi?id=736916

jleg · Sep 27, 2011

tom said:
See https://bugzilla.redhat.com/show_bug.cgi?id=736916

Ok, then - "we don't deal with RHEL6 yet"... oO Thanks for pointing me to this one! That's something we have to check here RSN...

RRJ · Oct 4, 2011

Today I saw that a new kernel update has arrived. Does it include anything related to this issue?
