skb_under_panic

Mr.Embedded · Dec 16, 2008

Hi all,

I am having some issues with a test build up with 2 debian lenny containers that are running simple apache webservers but push quite a bit of data.

Everything is fine and then without warning there is a panic. I am currently using the latest and greatest proxmox.

There were no messges in the logs that I could see but the last time it happend, I was logged into the console and this spit out before the crash:

Message from syslogd@[myhostname] at Tue Dec 16 06:52:15 2008 ...
[myhostname] kernel: skb_under_panic: text:ffffffff88388a1e len:1500 put:0 head:ffff8101094d9800 data:5aff8101094d9820 tail:0x5fc end:0x680 dev:eth0

Message from syslogd@[myhostname] at Tue Dec 16 06:52:15 2008 ...
[myhostname] kernel: ------------[ cut here ]------------

I am not sure what this means. I have 2 intel gigabit NICs in this box with only one in use at the moment. My partner suggested to raise the MTU to 9000 as the switch it connects to supports that and see what happens but I don't see the relation with the above messge.

Any assistance would be appreciated.

Amazing product BTW!

dietmar · Dec 17, 2008

Do you use vlans or bonding?

Mr.Embedded · Dec 17, 2008

Neither. Its a straight proxmox setup on on nic/bridge with 2 vms on it. The 2nd nic is not enabled at the moment.

dietmar · Dec 17, 2008

Whats the output of:

# pveversion -v

Mr.Embedded · Dec 17, 2008

[myhostname]:~# pveversion -v
pve-manager: 1.0-10 (pve-manager/1.0/3463)
qemu-server: 1.0-5
pve-kernel: 2.6.24-4
pve-kvm: 75-1
pve-firmware: 1
vncterm: 0.9-1
vzctl: 3.0.22-3pve3
vzdump: 1.1-1
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1dso1

Mr.Embedded · Dec 17, 2008

Also I couldn't move the MTU to 9000 because it wasn't supported by the NIC that is in use.

And I know that after it crashed the last time and manually I restarted it this is what pveversion -v showed:

[myhostname]:~# pveversion -v
pve-manager: 1.0-10 (pve-manager/1.0/3463)
qemu-server: not correctly installed
pve-kernel: not correctly installed
pve-kvm: not correctly installed
pve-firmware: not correctly installed
vncterm: not correctly installed
vzctl: not correctly installed
vzdump: not correctly installed
vzprocps: not correctly installed
vzquota: not correctly installed

So thats why I restarted the host the last time and then the pveversion -v showed correclty what was in the above post. This box has been up for at least 2 weeks before these issues started happening.

dietmar · Dec 17, 2008

Is the error reproducibel somehow? Do you use veth or venet?

Mr.Embedded · Dec 17, 2008

Nope. I haven't been able to reproduce it yet. It just happens. We just started really pumping traffic through these machines. The host port (switch side) shows a constant rate of between 35-72Mbps depending on time of day.

We are using a bridged veth connection with standard debian etch containers that we upgraded to lenny.

Mr.Embedded · Dec 19, 2008

More info. This just showed up in the logs and I have access to the vms I cannot restart or stop them. I haven't retried to reboot the host yet but I will soon.

Dec 19 14:33:34 [myhostname] -- MARK --
Dec 19 14:53:34 [myhostname] -- MARK --
Dec 19 14:58:03 [myhostname] kernel: CPU: 1
Dec 19 14:58:03 [myhostname] kernel: Modules linked in: nfs lockd nfs_acl sunrpc vzethdev vznetdev simfs vzrst vzcpt tun vzmon xt_tcpudp xt_length ipt_ttl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables kvm_intel kvm vzdquota vzdev ipv6 bridge dm_snapshot dm_mirror snd_hda_intel serio_raw snd_pcm e1000 snd_timer parport_pc parport psmouse snd_page_alloc snd_hwdep snd evdev thermal button processor pcspkr intel_agp heci soundcore e1000e sg dm_mod usbhid hid usb_storage libusual sd_mod sr_mod ide_disk ide_generic ide_cd cdrom ide_core shpchp pci_hotplug uhci_hcd ehci_hcd usbcore iTCO_wdt iTCO_vendor_support ahci i2c_i801 i2c_core pata_marvell pata_acpi ata_generic libata scsi_mod ohci1394 ieee1394 isofs msdos fat
Dec 19 14:58:03 [myhostname] kernel: Pid: 181, comm: kswapd0 Tainted: G D 2.6.24-1-pve #1 ovz005
Dec 19 14:58:03 [myhostname] kernel: RIP: 0010:[<ffffffff88552a3f>] [<ffffffff88552a3f>] :nfs:nfs_clear_inode+0x2f/0x40
Dec 19 14:58:03 [myhostname] kernel: RSP: 0000:ffff81011fdd7d30 EFLAGS: 00010286
Dec 19 14:58:03 [myhostname] kernel: RAX: ffff810082ef5540 RBX: ffff810082ef56f0 RCX: ffff810129a04828
Dec 19 14:58:03 [myhostname] kernel: RDX: ffffffff80306fc0 RSI: ffff8101298a5780 RDI: ffff810082ef56f0
Dec 19 14:58:03 [myhostname] kernel: RBP: ffff810082ef56f0 R08: 0000000000000087 R09: 0000000000000001
Dec 19 14:58:03 [myhostname] kernel: R10: 0000000000000000 R11: ffffffff885514c0 R12: ffff81011fdd7d90
Dec 19 14:58:03 [myhostname] kernel: R13: 0000000000000004 R14: 0000000000000080 R15: 00019b69653d4bca
Dec 19 14:58:03 [myhostname] kernel: FS: 0000000000000000(0000) GS:ffff810127002880(0000) knlGS:0000000000000000
Dec 19 14:58:03 [myhostname] kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Dec 19 14:58:03 [myhostname] kernel: CR2: 00000000bf944c3c CR3: 00000000331c0000 CR4: 00000000000026e0
Dec 19 14:58:03 [myhostname] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 19 14:58:03 [myhostname] kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 19 14:58:03 [myhostname] kernel: Process kswapd0 (pid: 181, veid=0, threadinfo ffff81011fdd6000, task ffff81012253e8e0)
Dec 19 14:58:03 [myhostname] kernel: Stack: ffff810082ef56f0 ffffffff802de099 ffff810082ef5700 ffffffff802de429
Dec 19 14:58:03 [myhostname] kernel: ffff8100ce40b070 ffff8100ba68e5f0 0000000000000080 00000000a83da913
Dec 19 14:58:03 [myhostname] kernel: 0000000000019ba6 ffffffff802dec64 0000000000000000 00000080000000d0
Dec 19 14:58:03 [myhostname] kernel: Call Trace:
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff802de099>] clear_inode+0x99/0x150
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff802de429>] dispose_list+0x29/0x110
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff802dec64>] shrink_icache_memory+0x294/0x380
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff8029f7dd>] shrink_slab+0x17d/0x1f0
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff8029fcaa>] kswapd+0x3ca/0x5a0
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff804a5822>] thread_return+0x3d/0x5bb
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff80257f50>] autoremove_wake_function+0x0/0x30
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff8029f8e0>] kswapd+0x0/0x5a0
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff80257b7b>] kthread+0x4b/0x80
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff8020d338>] child_rip+0xa/0x12
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff80257b30>] kthread+0x0/0x80
Dec 19 14:58:03 [myhostname] kernel: [<ffffffff8020d32e>] child_rip+0x0/0x12
Dec 19 14:58:03 [myhostname] kernel:
Dec 19 14:58:03 [myhostname] kernel:
Dec 19 14:58:03 [myhostname] kernel: Code: 0f 0b eb fe 0f 0b eb fe 66 66 90 66 66 90 66 66 90 53 48 89
Dec 19 14:58:03 [myhostname] kernel: RSP <ffff81011fdd7d30>
Dec 19 14:58:03 [myhostname] kernel: ---[ end trace 69d6754c4911ca32 ]---
Dec 19 15:02:51 [myhostname] kernel: Fatal resource shortage: privvmpages, UB 101.
Dec 19 15:02:51 [myhostname] last message repeated 3 times
Dec 19 15:06:29 [myhostname] kernel: TCPv6: dropping request, synflood is possible
Dec 19 15:33:34 [myhostname] -- MARK --
Dec 19 15:53:34 [myhostname] -- MARK --

I have no idea what is happening with this box.

dietmar · Dec 20, 2008

Are you running NFS? This look like a NFS error.

Mr.Embedded · Dec 20, 2008

Yes we just enabled NFS on the containers not to long ago.

Mr.Embedded · Dec 23, 2008

Just an update. The box is running NFS as stated but as a client, meaning there is a NFS share mounted to it; it is not serving NFS.

Now the bad bit. I don't believe the Proxmox/OpenVZ containers and host can handle consistent high network traffic. In our setup we have one box with a dual core cpu, 4GB ram, 1GB Intel ethernet (no jumbo frames) network and a single 160GB drive for the host. There are 2 vms running the standard debian etch container that has been upgraded to lenny. Those vms have apache2 and php5 installed. They are part of a real loadbalanced server pool that just serves web requests, basically file retrieval from the NFS drives. Those real servers have the same hardware as the proxmox host.

When the box runs, it runs like a charm. Basically it provides double the performance of the other real servers as far as bandwidth pushing is concerned; each VM performs as if it were a real server. I believe this is because the application is light enough to be virtualized this way.

Now this box jams again and again where it needs a physical reboot (like once every 48 hours) and the only thing I see is that it chokes due to the high amount of network traffic (a constant 40Mbit with bursts up to 80Mbit divided between the vms). When it jams, there is no way you can stop the containers with vzctl or /etc/init.d/vz. You can try to reboot the host and that may or may not work. So its a hard reboot fix. There may or may not be interesting log information at the time of the jam.

Also it should be noted that sometimes I can generally stop/start/restart the containers on the host but when it jams there is nothing I can do but power cycle.

Initially I thought that there may have been some syn floods or other external traffic jamming up the box but I put a pfsense firewall in front of the box to deflect any hammering and to ensure the traffic is contained to well defined limits that the vms should be able to easily handle.

The box still jams up. I am not an expert with Proxmox/OpenVZ by any means but from what I have seen I feel the box cannot handle the traffic. I would have expected the vms to lock up due to an application issue and just restarting them would fix the issue. But in this case the host is affected as well.

If anyone here has experience with pushing the kind of bandwith I am talking about with a proxmox host, I would be glad if you could share your experience with me.

dietmar · Dec 23, 2008

Please can you do the tests without NFS? Just to make sure that NFS is not the problem.

- Dietmar

Mr.Embedded · Dec 23, 2008

What protocol do you suggest that I use for file transfer. The application transfers files on the NFS mount. I guess I could try and create another type of file system rather than NFS and try that. What do you suggest.

dietmar · Dec 23, 2008

I just though you can test using local file system only?

- Dietmar

Mr.Embedded · Jan 3, 2009

Ok, I have removed the NFS on the container setup and have set the application to do some http transfers rather than NFS (for downloads. So far so good, its been almost 24 hours with no lockups. I will keep you posted on the results.

Mr.Embedded · Jan 6, 2009

Great news. It is very stable now. The NFS was probably not configured right/had a bug in my config so removing it worked wonders for the box stability.

What I have done to workaround this is:

1. Mounted the NFS share via /etc/fstab onto the hardware node instead of the container
2. Did a bind mount in /etc/fstab on the hardware node mounting the NFS share to a directory in the container file system.

So far it works like a charm.

The mount command in /etc/fstab was

/my/real/dir /to/mount/dir none rw,bind 0 0

There still a bit of a hiccup on boot where the mounts are not automatically setting up from /etc/fstab but a umount/mount will make it work.

Thanks for the help

kobuki · Jan 6, 2009

This is interesting. What you've done it seems is effectively proxying the NFS operations outside of the VE. But this seems to me that there ineed are some problems with the VE's robustness. Or with one/some of the virtualized device drivers.

On another note, we're using a few similar configurations on real hardware with heavy HTTP traffic, but because of the performance penalty of NFS (and other remote filesystems), we just usually use reverse HTTP proxies to provide content on the frontend nodes.

Mr.Embedded · Jan 6, 2009

Yeah what we do is set all the mtu of the fileservers and the webnodes accessing them to 9000. This makes things fly. On some of our fileservers we can hit around 380MBits/sec on the port. They are being accessed by around 35 webnodes simultaneously.

I am excited that the bind mounts work well. I actually am prefering that method as if you have 10 vms on one host, the single NFS mount on the host will definitely improve performance rather than having 10 separate mounts on the vms themselves. We are testing with 2 vms per hardware node as we are using existing hardware. Each of the vms are outperforming the hardware nodes at the moment, probably because the existing hardware nodes have a much older codeset in place. This will allow us to safely double our capacity with the same hardware as the application is only using about 10-15% of the network capacity on the hardware nodes at the moment.

skb_under_panic

Well-Known Member

Proxmox Staff Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Well-Known Member

Renowned Member

Well-Known Member