Reproducible kernel panic using 2.6.18 and NFSv4 on OpenVZ

iti-asi

I'm starting a new thread for the discussion started in http://proxmox.com/forum/showthread.php?p=16824#post16824.

Yesterday, an update of the 2.6.18 PVE kernel from pvetest brought down our entire cluster within minutes; all the hosts were locked with a kernel panic apparently related to NFSv4. A screenshot of the panic can be seen here: http://proxmox.com/forum/attachment.php?attachmentid=158&d=1263407893.

The panic can be reproduced very easily: it happens a few seconds after starting a container that uses our NFSv4 setup.

This is our mount script for the container:
Code:
#!/bin/sh
. /etc/vz/vz.conf
. ${VE_CONFFILE}
for foo in 0 1 2 3 4 5 6 7 8 9; do
  if mountpoint /export/home2 > /dev/null; then
    vzmount /export/home2 ${VE_ROOT}/export/home2
    exit 0
  else
    mount /export/home2 || sleep 60
  fi
done
mail -s "vzmount: mount failed @ `hostname`" email@address << EOF
Mounting ${VE_ROOT}/export/home2 on `hostname` failed.
EOF
exit 1
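The retry loop in the mount script above could also be factored into a small helper. This is only a sketch of the same pattern, not part of the original setup: the function name `retry` and its argument order are made up for illustration.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS runs
# have been used up, sleeping DELAY seconds between failed runs.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Do not sleep after the final failed attempt.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

With such a helper, the host-side part of the script could be reduced to something like `mountpoint -q /export/home2 || retry 10 60 mount /export/home2`, followed by the vzmount call on success and the mail notification on failure.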
This is the umount script:
Code:
#!/bin/sh
. /etc/vz/vz.conf
. ${VE_CONFFILE}
vzumount ${VE_ROOT}/export/home2 || true
umount /export/home2 2>/dev/null || true
vzmount is a very useful mount wrapper that makes it possible to use NFSv4 in the containers and also allows suspension/restoration. It can be found in the OpenVZ forum: http://forum.openvz.org/index.php?t=msg&goto=8357&&srch=vzmount#msg_8357.
 
Does the error also occur when you do not use that wrapper? Which NFS server do you use?

The NFS server is the standard lenny kernel server:
Code:
ii  nfs-kernel-ser 1:1.1.2-6lenny support for NFS kernel server

I haven't been able to test without the wrapper. We'll try to set up a test box ASAP that we can crash at will, as I've restarted the whole cluster using 2.6.24-10-pve and long-running processes are already started.
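For a test without the wrapper, one option would be a mount script that uses a plain bind mount instead of vzmount. The following is an untested sketch under that assumption: the function name and the `RUN` dry-run switch are hypothetical, and whether a bind mount behaves like vzmount with NFSv4 or with suspension/restoration is not established here.

```shell
#!/bin/sh
# Sketch only: bind-mount the host's NFS mount into the container root
# instead of calling vzmount.
# RUN is a hypothetical dry-run switch: set RUN=echo to print the
# commands instead of executing them.
RUN=${RUN:-}
SRC=/export/home2

mount_without_wrapper() {
    # $1: container root (VE_ROOT in a real OpenVZ mount script)
    ve_root=$1
    # Mount the NFS export on the host first if it is not mounted yet.
    if ! $RUN mountpoint -q "$SRC"; then
        $RUN mount "$SRC" || return 1
    fi
    # Make the host mount visible inside the container root.
    $RUN mount -n --bind "$SRC" "$ve_root$SRC"
}
```

In a real mount script the function would be called as `mount_without_wrapper "${VE_ROOT}"` after sourcing /etc/vz/vz.conf and ${VE_CONFFILE}, as in the script above.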
 
Can you please test with 2.6.32?

It's not very easy in this setup, as the NFS server is also running the head node of a computing cluster (Torque server) and we won't be able to stop it for a while. Hopefully we can set up a mini cluster for testing soon.

By the way, thanks for reintroducing initrd autogeneration in 2.6.24-10-pve!
 
Well, maybe you can test on other servers (also try 2.6.18). Maybe it is just an incompatibility between 2.6.18 and 2.6.24.

- Dietmar