Proxmox 1.9 and kernel 2.6.32-6-pve + iptables = kernel panic

RRJ

Hello,
In my previous post I made lots of assumptions about what could be causing the random kernel panics. I decided to start a new thread, as the previous one had grown full of unneeded information.
Now I'm sure it comes down to iptables. If I don't run it on 2 of my Proxmox machines, they run fine. If I load iptables with a simple rule set, the machine crashes with a kernel panic (a picture is also included):
Code:
kernel:Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff8149c55c

[Attached screenshot: panic.PNG]

Versions:
Code:
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-43
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-43
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-1pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6

Code:
Linux services 2.6.32-6-pve #1 SMP Tue Sep 13 10:44:10 CEST 2011 x86_64 GNU/Linux

The NICs on both servers are integrated; on the first there is an Intel 2-port 1G card, on the second an HP 2-port 1G card.
And I should definitely add that with 2.6.32-4 everything works fine!

And my iptables config:

Code:
services:~# cat /etc/fw
#!/bin/sh


iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
modprobe ipt_REJECT
modprobe ip_conntrack
modprobe ip_conntrack_ftp


my=178.21.xxx.xxx/28
my2=178.21.xxx.xxx/28
barix=178.21.xxx.xxx
tlulib=193.40.xxx.xxx


#flush all rules
iptables -F; iptables -F -t nat; iptables -F -t mangle


#allow everything on the loopback interface
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

#allow from tower prefix
iptables -A INPUT -s $my -j ACCEPT
iptables -A INPUT -s $tlulib -j ACCEPT
iptables -A INPUT -s $my2 -j ACCEPT


iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p icmp -m icmp -j ACCEPT
iptables -A INPUT -j REJECT --reject-with icmp-host-prohibited

#forward (Firewall for VPSes)
ns1=178.21.xxx.xxx
sc1=178.21.xxx.xxx
netflow=178.21.xxx.xxx
noc=178.21.xxx.xxx
ns2=178.21.xxx.xxx


iptables -A FORWARD -p icmp -m icmp -j ACCEPT
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT


#allow anything from tv tower and tlulib
iptables -A FORWARD -s $my -j ACCEPT
iptables -A FORWARD -s $my2 -j ACCEPT
iptables -A FORWARD -s $tlulib -j ACCEPT


#ns1:
iptables -A FORWARD -d $ns1 -p udp -m udp --dport 53 -j ACCEPT
iptables -A FORWARD -s $ns2 -j ACCEPT


#sc1
#from barix
iptables -A FORWARD -s $barix -j ACCEPT
#for listeners
iptables -A FORWARD -d $sc1 -p tcp -m tcp --dport 8128 -j ACCEPT
iptables -A FORWARD -d $sc1 -p tcp -m tcp --dport 8064 -j ACCEPT


#noc
iptables -A FORWARD -d $noc -p tcp -m tcp --dport 80 -j ACCEPT

iptables -A FORWARD -j REJECT --reject-with icmp-host-prohibited
iptables -nL -v
 
Any ideas, anyone? It is not really safe to leave those VPSes without any firewall (and I don't really want to configure my border Juniper firewall just to protect a few VPSes).
 
OK, I don't know if anyone is really interested in this bug, but the problem is in the iptables REJECT action (maybe only with the --reject-with icmp-host-prohibited option; I didn't test without it, because I couldn't tell whether anyone was interested). For now I have replaced it with DROP, and there are no more random panics.
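The DROP workaround can be applied mechanically. A minimal sketch, assuming the rules live in a plain shell script like /etc/fw from the first post; the function name reject_to_drop is hypothetical. It keeps a .bak copy and rewrites every "-j REJECT" target, with or without a --reject-with option, to "-j DROP".

```shell
#!/bin/sh
# Hypothetical helper: rewrite REJECT targets in a firewall script to DROP.
# Assumes one iptables rule per line, as in /etc/fw. Writes FILE.bak first.
reject_to_drop() {
    file=$1
    cp "$file" "$file.bak" || return 1
    # Replace "-j REJECT" plus an optional "--reject-with <type>" option.
    # Lines such as "modprobe ipt_REJECT" contain no "-j REJECT" and are kept.
    sed -e 's/-j REJECT\( --reject-with [^ ]*\)\{0,1\}/-j DROP/' "$file.bak" > "$file"
}
```

After `reject_to_drop /etc/fw`, running `sh /etc/fw` reloads the rules. The trade-off: with DROP, blocked clients get no ICMP error back and have to wait for their connection attempts to time out.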
 
Sure, we are interested in fixing all bugs.

Could you please test that? It would be great if you could track down the problem further.
 
Hello, Dietmar,

Glad there is still the fighting spirit to get problems solved :)
OK. The kernel panics only when I use REJECT in the iptables config file (even without options). As soon as I use REJECT instead of DROP, it panics at a random time (this time it took about 17 minutes). According to what I found on Google, it can be triggered faster if there is only one rule with REJECT in the iptables settings.
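A minimal sketch of the "single REJECT rule" test case, assuming the FORWARD chain (where the bridged VPS traffic from the first post is filtered) is the relevant one. TEST_SRC is a hypothetical address from the documentation range. The run() wrapper only prints the commands unless DRY_RUN=0 is set; the rules should only ever be applied on a test machine, since on an affected kernel they are expected to end in a panic.

```shell
#!/bin/sh
# Hypothetical reproduction ruleset: flush everything, keep one REJECT rule.
# TEST ONLY: flushing removes the host firewall, and on an affected
# 2.6.32-6-pve kernel the REJECT rule is expected to trigger the panic.
TEST_SRC=${TEST_SRC:-192.0.2.50}   # hypothetical client address (TEST-NET-1)
DRY_RUN=${DRY_RUN:-1}              # 1 = print commands only, 0 = execute

run() {
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

minimal_reject_ruleset() {
    run iptables -F
    run iptables -P FORWARD ACCEPT
    # The only non-default rule: REJECT forwarded traffic from TEST_SRC.
    run iptables -A FORWARD -s "$TEST_SRC" -j REJECT
}

minimal_reject_ruleset
```

For example, `DRY_RUN=0 TEST_SRC=<address of a client you control> sh repro.sh`, then send traffic from that client.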

Also, note that I use only bridged interfaces (veth) on my Proxmox setups, as I have servers in both private and public networks, and playing with routes for venet is overkill: especially when you restart a container, those routes just stop working and have to be recreated.
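Whether bridged (veth on a vmbr bridge) traffic passes through iptables at all is controlled by the bridge-netfilter sysctl net.bridge.bridge-nf-call-iptables. A minimal sketch to check it, assuming bridge netfilter support is built into the kernel; the function name bridge_nf_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: do bridged IPv4 packets traverse the iptables chains?
# The proc file only exists while bridge netfilter support is available.
bridge_nf_status() {
    f=${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}
    if [ ! -r "$f" ]; then
        echo "unknown (no $f)"
    elif [ "$(cat "$f")" = 1 ]; then
        echo "bridged traffic goes through iptables"
    else
        echo "bridged traffic bypasses iptables"
    fi
}

bridge_nf_status
```

If it reports that bridged traffic goes through iptables, the FORWARD rules (including the REJECT ones) act on the containers' bridged packets. Setting `sysctl -w net.bridge.bridge-nf-call-iptables=0` would take bridged traffic out of iptables entirely, which could help confirm whether the bridged path is involved, but it also disables the FORWARD firewall for those containers.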
 
I can't reproduce it. What kind of traffic is on your network? Maybe you can find a test case that does not depend on external traffic. And which iptables rules exactly do you use?
 
Have you tried with a bridged interface? I found on Google that this panic is typical for bridged interfaces only.
Try to actively connect to the server from an IP address that is rejected by some iptables rule.
My iptables rules are in the first post of this topic.
On this server there is external traffic: I run Shoutcast and netflow servers there.
On the other server there are only typical OpenVZ clients with Apache services, and all of them previously passed through a NAT router.
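A test case that does not depend on external traffic could look like this sketch: run it on a machine or container whose source address is REJECTed on the Proxmox host, so every attempt hits the REJECT target. The function name and the default target 192.0.2.10 are hypothetical, and it assumes a netcat that supports -z (connect without sending data) and -w (timeout). Commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical traffic generator: repeated connection attempts that the
# host's REJECT rule should answer. Arguments: TARGET PORT COUNT.
hammer_rejected_port() {
    target=${1:-192.0.2.10}   # hypothetical address behind the firewall
    port=${2:-80}
    count=${3:-1000}
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ "${DRY_RUN:-1}" = 0 ]; then
            # Failure is expected (the attempt is rejected), so ignore it.
            nc -z -w 1 "$target" "$port" >/dev/null 2>&1 || true
        else
            echo "nc -z -w 1 $target $port"
        fi
        i=$((i + 1))
    done
}
```

After sourcing the file, something like `DRY_RUN=0 hammer_rejected_port 192.0.2.10 80 5000` would generate the load.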
 
Hello,

Unfortunately, we also have an issue with the current 2.6.32-6 kernel. A little history:
- we installed the first pvetest 2.6.32-6 kernel on a machine; the result was a freeze (Adaptec problem) because the kernel activated ASPM
- the next kernel, with ASPM deactivated (i.e. honoring the BIOS setting), ran fine and seemed stable
- at some point, after several kernel updates, the freezes came back, and now they happen regularly

I can't say exactly which kernel version this problem started with; it also seems a bit tricky to map the "running kernel" or the "kernel mentioned in the boot log" to a "pve kernel package". What is certain is that all kernels after 2.6.32-6-44 have this problem. Going back to 2.6.32-5 solves it.
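One way to tie the running kernel to a package: the package name follows the release string from `uname -r` (pve-kernel-2.6.32-6-pve in the version list of the first post), while the package version (2.6.32-43 and so on) changes with each update under the same name. A minimal sketch, assuming a Debian-based Proxmox VE host with dpkg-query; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: relate the running kernel to its pve-kernel package.
kernel_pkg_name() {
    # Package name pattern as seen in pveversion output: pve-kernel-<release>
    echo "pve-kernel-${1:-$(uname -r)}"
}

show_running_kernel() {
    pkg=$(kernel_pkg_name)
    echo "running release: $(uname -r)"
    echo "running build:   $(uname -v)"   # build number and build date
    if command -v dpkg-query >/dev/null 2>&1; then
        # Installed package version for that release name.
        dpkg-query -W -f='${Package} ${Version}\n' "$pkg" 2>/dev/null ||
            echo "$pkg: not installed as a package"
    fi
}

show_running_kernel
```

Upgrading the package without rebooting leaves the previous build of the same release running, which is one reason the running kernel and the installed package version can disagree.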

Interestingly, the kernel panics seem to happen mostly between 03:00 and 04:00. I have not found any reason for this so far: no log message, no scheduled backup.
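To double-check for scheduled jobs in that window, here is a minimal sketch that lists crontab entries able to fire during the 03:00 to 03:59 hour. The function name is hypothetical. It only understands a plain hour of 3, "*", and comma lists; ranges (2-4), steps (*/3) and @daily-style shortcuts are not interpreted.

```shell
#!/bin/sh
# Hypothetical helper: print crontab lines whose hour field covers 3.
# Works for user crontabs and for /etc/crontab or /etc/cron.d files,
# because the hour is field 2 in both formats.
cron_jobs_in_hour_3() {
    awk '
        /^[ \t]*#/ || NF < 6 { next }   # comments, blank lines, VAR=value lines
        {
            n = split($2, hours, ",")
            for (i = 1; i <= n; i++) {
                h = hours[i]
                if (h == "*" || (h ~ /^[0-9]+$/ && h + 0 == 3)) {
                    print FILENAME ": " $0
                    next
                }
            }
        }' "$@"
}
```

For example, `cron_jobs_in_hour_3 /etc/crontab /etc/cron.d/*`. Jobs inside OpenVZ containers have their own crontabs, which are worth checking as well.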

I don't think the panic message on the console is of much help: it cannot be scrolled up, probably because the system is frozen after the panic...
[Attached screenshot: proxmox_freeze2.png]

This machine also has some iptables rules active, but I found no evidence that they have anything to do with this...
 
We are also using Adaptec cards, and also experienced a kernel panic today, but only after setting OpenVZ containers to more than 1 CPU, so for us it does not seem to be an Adaptec issue:
http://forum.proxmox.com/threads/7118-Kernel-panic-with-2.6.32-6-and-multi-cpu-OpenVZ

A couple of questions:
- how do you know the Adaptec driver or ASPM caused the panic? Did something on the panic screen refer to that?
- please explain what ASPM is, and in which BIOS (system or Adaptec) you can turn it off.
 
You'll find the complete story here: http://forum.proxmox.com/threads/6980-New-2.6.32-Kernel-with-stable-OpenVZ-(pvetest)?p=39679
In short: ASPM means "Active State Power Management", a kind of power throttling for PCIe, and this is something our Adaptecs proved to "strongly dislike" in the past.
The first 2.6.32-6 kernel erroneously turned it on, even when it was disabled in the BIOS...
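To see from the running system whether ASPM is actually active on a link, regardless of what the BIOS says, `lspci -vv` (run as root) prints an "ASPM ..." state on each device's LnkCtl line. A minimal sketch that filters that output; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read "lspci -vv" output on stdin and print every
# device whose LnkCtl line reports ASPM as enabled (L0s and/or L1).
aspm_enabled_devices() {
    awk '
        /^[0-9a-f]/ { dev = $0 }   # device header lines start with the bus address
        /LnkCtl:/ && /ASPM/ && !/ASPM Disabled/ { print dev; print "    " $0 }
    '
}
```

Usage: `lspci -vv | aspm_enabled_devices`. The kernel boot parameter `pcie_aspm=off` is documented as disabling ASPM, and on kernels built with ASPM support the active policy can be read from /sys/module/pcie_aspm/parameters/policy.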
 
It's not that clear that this is an error; it is more of an intended behavior. I think the bug is in the RAID cards. But anyway, we have now changed the behavior to respect the BIOS.
 
I agree: the *reason* for the failure is the Adaptec card itself failing with a kernel panic (yes, they run their own "distro" on these cards :), which results in the Linux kernel panicking.
But precisely because this *might* happen (there is no "law" that lets mainboard manufacturers force Adaptec & Co. to support ASPM, and the same may apply to other features), there are lots of "switches" in the BIOS. If I couldn't switch this off in the BIOS, I'd get the Adaptec's kernel panic right after POST and be unable to boot anything.
So I'd regard a Linux kernel that re-enables such a setting as, at the very least, well, "unexpected". No other distro we deal with here does that...
 
Today I saw that a new kernel update has arrived. Does it contain anything related to this issue?