Nothing special in the INIT log, but I made more tests. See below.
I have 2 Proxmox VE servers: one master, one slave. The VE with NFS is on the slave server.
The Proxmox servers are on the latest 1.6.
pve-manager: 1.6-5 (pve-manager/1.6/5261)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.6-25
pve-kernel-2.6.32-3-pve: 2.6.32-14
pve-kernel-2.6.32-4-pve: 2.6.32-25
qemu-server: 1.1-22
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-14
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-8
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.5-2
ksm-control-daemon: 1.0-4
The VE is a Debian Lenny 32-bit with only Apache and Darwin Streaming Server. The VE has FEATURES="nfs:on" set. I mount one remote volume using NFS.
It seems I have 2 different problems:
- if I stop the VE from the GUI on the master, the master loses the connection. I found "proxwww[13960]: 500 read timeout" in the syslog on the master. I have this problem with and without NFS activated. I recreated the cluster after clearing all configs, destroyed the VE and restored a copy with vzrestore, but I still have the problem. The view is not refreshed in the GUI and I get "Unable to load cluster table".
- on the slave, in a shell, I can stop/start the VE if NFS is not enabled.
- If I activate NFS on the VE with nfs:on, I have 2 cases:
-- NFS remote volume not mounted: I can stop/start. The VE seems to work fine.
-- NFS remote volume mounted: I can't stop it. The vzctl stop command times out and I found this in the syslog on the slave (the commands that trigger it are shown after the trace):
Oct 31 10:16:58 cerimes-vs002 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 31 10:16:58 cerimes-vs002 kernel: vzctl D ffff8802abc1a000 0 6302 1 0x00000000
Oct 31 10:16:58 cerimes-vs002 kernel: ffff8802ad8d8000 0000000000000082 0000000000000000 0000000000012ca0
Oct 31 10:16:58 cerimes-vs002 kernel: 0000000000000030 0000000000000002 000000000000fa40 ffff88029fa83fd8
Oct 31 10:16:58 cerimes-vs002 kernel: 0000000000016940 0000000000016940 ffff8802abc1a000 ffff8802abc1a2f8
Oct 31 10:16:58 cerimes-vs002 kernel: Call Trace:
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff81314587>] ? rwsem_down_failed_common+0x8c/0xa8
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff813145ea>] ? rwsem_down_read_failed+0x22/0x2b
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff81182244>] ? call_rwsem_down_read_failed+0x14/0x30
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff81313f9d>] ? down_read+0x17/0x19
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffffa0313f01>] ? do_env_enter+0x2d/0x157 [vzmon]
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffffa031520d>] ? real_env_create+0xda5/0xdea [vzmon]
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810cdc6f>] ? __do_fault+0x425/0x455
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810f3d82>] ? chrdev_open+0x0/0x13e
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810efa44>] ? __dentry_open+0x1aa/0x2a5
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffffa0315485>] ? vzcalls_ioctl+0x233/0x501 [vzmon]
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffffa030d1cd>] ? vzctl_ioctl+0x39/0x54 [vzdev]
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810fcf06>] ? vfs_ioctl+0x21/0x6c
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810fd454>] ? do_vfs_ioctl+0x48d/0x4cb
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810fd4cf>] ? sys_ioctl+0x3d/0x5c
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
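For reference, the sequence that triggers the hang is roughly this (the VE ID 101, the NFS export and the mount point are just placeholders):

# inside the VE:
mount -t nfs 192.168.1.10:/export/media /mnt/nfs
# then on the slave (host):
vzctl stop 101    # this is the command that times out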
Another problem: the first time I try to mount the NFS volume in the VE, the mount command responds "mount.nfs: No such device". I need to run "modprobe nfs" on the Proxmox slave server first. After that, I can mount the NFS remote volume in the VE.
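I suppose the module could be loaded at boot on the slave so the first mount works without this manual step (standard Debian way, not tested on my side):

modprobe nfs                 # load it now
echo nfs >> /etc/modules     # load it automatically at every boot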
And the last one: if NFS is mounted in the VE, I have to run the reboot command twice on the slave to reboot it. The first reboot does nothing.
Not fine. ;-)