Reproducible kernel panic using 2.6.18 and NFSv4 on OpenVZ

iti-asi

Member
Jul 14, 2009
València
www.iti.upv.es
I'm starting a new thread for the discussion started in http://proxmox.com/forum/showthread.php?p=16824#post16824.

Yesterday, an update of the 2.6.18 PVE kernel from pvetest brought down our entire cluster within minutes; all the hosts were locked with a kernel panic apparently related to NFSv4. A screenshot of the panic can be seen here: http://proxmox.com/forum/attachment.php?attachmentid=158&d=1263407893.

The panic is very easy to reproduce: it happens a few seconds after starting a container that uses our NFSv4 setup.

This is our mount script for the container:
Code:
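#!/bin/sh
# Container mount script: make sure the NFSv4 export is mounted on the
# host, then expose it inside the container with vzmount.
. /etc/vz/vz.conf
. ${VE_CONFFILE}
# Up to 10 attempts; wait 60 seconds after each failed host mount.
for foo in 0 1 2 3 4 5 6 7 8 9; do
  if mountpoint /export/home2 > /dev/null; then
    vzmount /export/home2 ${VE_ROOT}/export/home2
    exit 0
  else
    mount /export/home2 || sleep 60
  fi
done
# All attempts failed: notify the administrator by mail.
mail -s "vzmount: mount failed @ `hostname`" email@address << EOF
Mounting ${VE_ROOT}/export/home2 on `hostname` failed.
EOF
exit 1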
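The retry pattern in the mount script above can be isolated into a small helper, which makes it easier to reuse on a test box. This is only a minimal sketch: `retry_until_ok` is a hypothetical name, not part of vzctl or vzmount, and unlike the original loop it simply re-runs one command until it succeeds.

```shell
#!/bin/sh
# Sketch only: retry_until_ok is a hypothetical helper, not part of vzctl.
# Usage: retry_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs are used up.
retry_until_ok() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last failure.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

In the mount script this could be used as `retry_until_ok 10 60 mount /export/home2`, followed by the `vzmount` call only if it returns 0.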
This is the umount script:
Code:
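#!/bin/sh
# Container umount script: undo the vzmount inside the container, then
# unmount the export on the host. Failures are ignored on purpose.
. /etc/vz/vz.conf
. ${VE_CONFFILE}
vzumount ${VE_ROOT}/export/home2 || true
umount /export/home2 2>/dev/null || true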
vzmount is a very useful mount wrapper that makes it possible to use NFSv4 in the containers and also allows suspension/restoration. It can be found in the OpenVZ forum: http://forum.openvz.org/index.php?t=msg&goto=8357&&srch=vzmount#msg_8357.
 
Does the error also occur when you do not use that wrapper? Which NFS server do you use?

The NFS server is the standard lenny kernel server:
Code:
ii  nfs-kernel-ser 1:1.1.2-6lenny support for NFS kernel server

I haven't been able to test without the wrapper. We'll try to set up a test box ASAP that we can crash at will, as I've restarted the whole cluster using 2.6.24-10-pve and long-running processes are already running.
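For a test without the wrapper, one option is to replace the `vzmount` call in the mount script with a plain bind mount from the host into the container root. The following is a rough sketch under that assumption: `bind_into_ve` is a hypothetical helper, it has not been checked against the panic, and suspension/restoration may not work with a plain bind mount.

```shell
#!/bin/sh
# Sketch only: bind_into_ve is a hypothetical helper for testing without
# the vzmount wrapper. It has not been verified against the panic.
# Usage: bind_into_ve SOURCE_DIR VE_ROOT_DIR
bind_into_ve() {
  src=$1
  ve_root=$2
  # -n keeps the bind mount out of /etc/mtab on the host.
  mount -n --bind "$src" "${ve_root}${src}"
}
```

In the mount script, the call would be `bind_into_ve /export/home2 "${VE_ROOT}"` in place of the `vzmount` line.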
 
Can you please test with 2.6.32?

It's not very easy in this setup, as the NFS server also runs the head node of a computing cluster (Torque server) and we won't be able to stop it for a while. Hopefully we can set up a mini cluster for testing soon.

By the way, thanks for reintroducing initrd autogeneration in 2.6.24-10-pve!
 
Well, maybe you can test on other servers (also try 2.6.18). Maybe it is just an incompatibility between 2.6.18 and 2.6.24.

- Dietmar
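When testing across several servers, it helps to record which kernel series each host is actually running, so that a mixed 2.6.18/2.6.24 setup is easy to spot. A minimal sketch, assuming the release string has the form `series-suffix` as in `2.6.24-10-pve`; `kernel_series` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: kernel_series is a hypothetical helper.
# Prints the kernel series of a release string by stripping everything
# from the first "-" on, e.g. 2.6.24-10-pve -> 2.6.24.
kernel_series() {
  printf '%s\n' "${1%%-*}"
}

# Report the series of the kernel running on this host.
kernel_series "$(uname -r)"
```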
 
