openvz server and NFS -> problem with "stop"

stef1777

Hi!

I have a Proxmox VE 1.6 server with one OpenVZ container (VE). A remote volume is mounted inside the VE using the NFS client. When I "stop" the VE, nothing happens.

ps shows the process, but nothing happens and I can't kill it:

root 10081 1 0 13:56 ? 00:00:00 /usr/sbin/vzctl stop 108

I have to reboot the Proxmox server to get back to normal. If I mount the NFS volume again and run stop, the problem comes back. Without the NFS volume mounted, stop works without any problem.

Any idea?
 
Nothing special in the init log, but I ran more tests. See below.

I have 2 Proxmox VE servers: one master, one slave. The VE with NFS is on the slave server.
Both Proxmox servers run the latest 1.6:

pve-manager: 1.6-5 (pve-manager/1.6/5261)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.6-25
pve-kernel-2.6.32-3-pve: 2.6.32-14
pve-kernel-2.6.32-4-pve: 2.6.32-25
qemu-server: 1.1-22
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-14
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-8
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.5-2
ksm-control-daemon: 1.0-4

The VE is a 32-bit Debian Lenny with only Apache and Darwin Streaming Server. The VE has FEATURES="nfs:on " set. I mount one remote volume using NFS.

Seems that I have 2 different problems:

- If I stop the VE from the GUI on the master, the master loses the connection. I found "proxwww[13960]: 500 read timeout" in syslog on the master. This happens with and without NFS activated. I recreated the cluster after clearing all configs, destroyed the VE and restored a copy with vzrestore, but the problem persists. The view in the GUI is not refreshed and I get "Unable to load cluster table".

- On the slave, in a shell, I can stop/start the VE if NFS is not activated.

- If I activate NFS on the VE with nfs:on, there are 2 cases:

-- NFS remote volume not mounted: I can stop/start. The VE seems to work fine.

-- NFS remote volume mounted: I can't stop. The vzctl stop command times out and I found this in syslog on the slave:

Oct 31 10:16:58 cerimes-vs002 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 31 10:16:58 cerimes-vs002 kernel: vzctl D ffff8802abc1a000 0 6302 1 0x00000000
Oct 31 10:16:58 cerimes-vs002 kernel: ffff8802ad8d8000 0000000000000082 0000000000000000 0000000000012ca0
Oct 31 10:16:58 cerimes-vs002 kernel: 0000000000000030 0000000000000002 000000000000fa40 ffff88029fa83fd8
Oct 31 10:16:58 cerimes-vs002 kernel: 0000000000016940 0000000000016940 ffff8802abc1a000 ffff8802abc1a2f8
Oct 31 10:16:58 cerimes-vs002 kernel: Call Trace:
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff81314587>] ? rwsem_down_failed_common+0x8c/0xa8
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff813145ea>] ? rwsem_down_read_failed+0x22/0x2b
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff81182244>] ? call_rwsem_down_read_failed+0x14/0x30
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff81313f9d>] ? down_read+0x17/0x19
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffffa0313f01>] ? do_env_enter+0x2d/0x157 [vzmon]
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffffa031520d>] ? real_env_create+0xda5/0xdea [vzmon]
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810cdc6f>] ? __do_fault+0x425/0x455
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810f3d82>] ? chrdev_open+0x0/0x13e
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810efa44>] ? __dentry_open+0x1aa/0x2a5
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffffa0315485>] ? vzcalls_ioctl+0x233/0x501 [vzmon]
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffffa030d1cd>] ? vzctl_ioctl+0x39/0x54 [vzdev]
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810fcf06>] ? vfs_ioctl+0x21/0x6c
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810fd454>] ? do_vfs_ioctl+0x48d/0x4cb
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff810fd4cf>] ? sys_ioctl+0x3d/0x5c
Oct 31 10:16:58 cerimes-vs002 kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
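
A note on the trace above: the D in the "vzctl D" line is the process state for uninterruptible sleep. A task in that state does not act on signals, including SIGKILL, until it wakes up, which fits the vzctl stop process that cannot be killed. Here is a minimal sketch for listing such tasks on the host, assuming the procps version of ps. The helper name filter_dstate is made up for this example and is not part of any tool.

```shell
# Keep only rows whose STAT column (2nd field) starts with D
# (uninterruptible sleep). Reads "PID STAT COMMAND" rows on stdin,
# as printed by: ps -eo pid,stat,comm
# The first row is the ps header and is skipped.
filter_dstate() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# On the host (not executed here):
#   ps -eo pid,stat,comm | filter_dstate
```

If the stuck vzctl shows up in that list, kill -9 will not help. That matches what is reported in this thread, where only a reboot cleared it.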

Another problem: the first time I try to mount the NFS volume in the VE, the mount command responds "mount.nfs: No such device". I need to run "modprobe nfs" on the Proxmox slave server. After that, I can mount the NFS remote volume in the VE.

And the last one: if the NFS volume is mounted in the VE, I have to type the reboot command twice on the slave to reboot it. The first reboot does nothing.

Not fine. ;-)
 
I repeated all the NFS tests on a Proxmox node that is not part of a cluster.

I used the OpenVZ template "debian-5.0-standard_5.0-2_i386" for the test.

Same problem with NFS. I get "Unable to stop container: operation timed out" if an NFS volume is mounted.

And I still need to run "modprobe nfs" first.
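
For the "modprobe nfs" step, one possible way to avoid typing it after every host reboot is to list the module in /etc/modules, which Debian reads at boot. This is only a sketch: ensure_module_listed is a hypothetical helper name, and nobody in this thread has confirmed that this removes the need for the manual modprobe.

```shell
# Append a module name to a modules file unless it is already listed.
# $1 = module name
# $2 = modules file (defaults to /etc/modules, read by Debian at boot)
ensure_module_listed() {
    module=$1
    file=${2:-/etc/modules}
    # -x matches the whole line, so "nfs" does not match "nfsd"
    grep -qx "$module" "$file" 2>/dev/null || printf '%s\n' "$module" >> "$file"
}

# On the Proxmox host, as root (not executed here):
#   modprobe nfs && ensure_module_listed nfs
```

The function is idempotent, so running it several times leaves a single "nfs" line in the file.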
 
Hello!

Sorry I was busy.

With 2.6.18, NFS mount/umount works fine. We can stop the VM with the NFS volume mounted without any problem. The mount defined in fstab is mounted correctly at start.

But there is still one problem: we need to run "modprobe nfs" the first time on the Proxmox server.

I also found that when we switch between 2.6.32 and 2.6.18, we lose the Ethernet device on the PowerEdge 1950: eth0 disappears and there is no network. We need to clear the device or use eth2.
 
Please can you try with kernel 2.6.18?

Hi, I've been having the identical issue as the original poster on varying versions of Proxmox, including the latest 1.7 with the 2.6.32-4-pve kernel.

# pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-30
pve-kernel-2.6.32-4-pve: 2.6.32-30
pve-kernel-2.6.18-4-pve: 2.6.18-10
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4

If I use 2.6.18-4-pve as suggested, it has no such issues. Since I'd much prefer to use 2.6.32-4-pve (and still keep NFS clients on some VEs), is it possible to narrow this issue down to something more specific, so I can follow up with the right people? Is this issue related to Proxmox itself, or the Debian kernel it uses, or the OpenVZ itself, or combination of all those?

Thanks.
 
is it possible to narrow this issue down to something more specific, so I can follow up with the right people? Is this issue related to Proxmox itself, or the Debian kernel it uses, or the OpenVZ itself,

Seems to be OpenVZ related. Can you reproduce the bug reliably? If so, please report a bug to the OpenVZ bug tracker.
 
Thanks for the advice. I was able to reproduce this issue on a number of Proxmox installations so far. I'll try a bit more and go to OpenVZ forum/bug tracker, if it comes to that. I'll report back here if I find anything interesting.
 
Looks like it is indeed OpenVZ + the 2.6.32 kernel. I've been able to reproduce this with different combinations of distributions on the HN and VE. There is already a bug report on this at http://bugzilla.openvz.org/show_bug.cgi?id=1626 , but it is not fixed yet.
 
Hello,

Any news on this bug?

I installed a new server yesterday and used it to check whether the problem is still there. -> YES.

Tested with an up-to-date Proxmox VE 1.7 with kernel 2.6.32-4-pve.
The VM is an up-to-date Debian Lenny amd64.

If you manually unmount the NFS volume before running stop, all is fine.
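
The manual workaround above (unmount first, then stop) could be wrapped in a small helper. This is a sketch, not a tested fix: stop_ve_with_nfs and the DRY_RUN variable are hypothetical names, 108 is just the VE ID from the ps output earlier in the thread, and it assumes that "umount -a -t nfs,nfs4" run inside the VE through vzctl exec releases every NFS mount. By default it only prints the commands.

```shell
# Unmount all NFS volumes inside the VE, then stop the VE.
# $1 = VE (container) ID
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
stop_ve_with_nfs() {
    ctid=$1
    if [ "${DRY_RUN:-1}" = "1" ]; then run=echo; else run=; fi
    # Stop only if the unmount step succeeded, to avoid the hang
    # described in this thread.
    $run vzctl exec "$ctid" umount -a -t nfs,nfs4 &&
    $run vzctl stop "$ctid"
}

# Preview:  stop_ve_with_nfs 108
# Execute:  DRY_RUN=0 stop_ve_with_nfs 108   (as root on the host)
```

If a process in the VE still holds files open on the NFS volume, the umount will fail and the helper will not call vzctl stop.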


INFO: task init:9223 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
init D ffff88051ba1a000 0 9223 1 0x00000000
ffff88029dc8c000 0000000000000046 0000000000000000 0000000000000179
ffff88028523e800 0000000000000000 000000000000fa40 ffff88051afa1fd8
0000000000016940 0000000000016940 ffff88051ba1a000 ffff88051ba1a2f8
Call Trace:
[<ffffffff81051908>] ? do_wait+0x1f9/0x237
[<ffffffff81314e97>] ? rwsem_down_failed_common+0x8c/0xa8
[<ffffffff81314ecf>] ? rwsem_down_write_failed+0x1c/0x25
[<ffffffff81182733>] ? call_rwsem_down_write_failed+0x13/0x20
[<ffffffff81314894>] ? down_write+0x25/0x27
[<ffffffff8108cafe>] ? zap_pid_ns_processes+0xc6/0x23c
[<ffffffff81051ebd>] ? do_exit+0x357/0x758
[<ffffffff81052334>] ? do_group_exit+0x76/0x9d
[<ffffffff8105f288>] ? get_signal_to_deliver+0x2fd/0x329
[<ffffffff810100cd>] ? do_notify_resume+0xad/0x765
[<ffffffff810fe0a2>] ? poll_select_copy_remaining+0xd0/0xf3
[<ffffffff81010ede>] ? int_signal+0x12/0x17
 
Just to complete: unless the NFS volume is unmounted manually, the VM doesn't stop.

With this bug, "migrate" doesn't work either.

All of this works fine with 2.6.18-4-pve.
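
Since stop and migrate only hang while an NFS volume is mounted, it may help to check for NFS mounts in the VE before running either. Here is a minimal sketch that filters /proc/mounts-style input. The name list_nfs_mounts is hypothetical, and feeding it through vzctl exec is an assumption about how you would run it on the host.

```shell
# Print the mount points of all NFS filesystems found on stdin.
# Input format is that of /proc/mounts:
#   device mountpoint fstype options dump pass
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }'
}

# On the host, for VE 108 (not executed here):
#   vzctl exec 108 cat /proc/mounts | list_nfs_mounts
```

Empty output means no NFS volume is mounted in the VE.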

INFO: task vzctl:7221 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vzctl D ffff88051be0d800 0 7221 1 0x00000000
ffff88029dd1e000 0000000000000082 0000000000000000 0000000000000733
ffff88000000ed00 0000000000000041 000000000000fa40 ffff88051bf59fd8
0000000000016940 0000000000016940 ffff88051be0d800 ffff88051be0daf8
Call Trace:
[<ffffffff81073c32>] ? charge_dcache+0x61/0xb9
[<ffffffff81314e97>] ? rwsem_down_failed_common+0x8c/0xa8
[<ffffffff81101ee1>] ? __d_lookup+0xde/0x12e
[<ffffffff81314efa>] ? rwsem_down_read_failed+0x22/0x2b
[<ffffffff81182704>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff813148ad>] ? down_read+0x17/0x19
[<ffffffffa02d2f01>] ? do_env_enter+0x2d/0x157 [vzmon]
[<ffffffffa02d420d>] ? real_env_create+0xda5/0xdea [vzmon]
[<ffffffff810cdeef>] ? __do_fault+0x425/0x455
[<ffffffff810f4036>] ? chrdev_open+0x0/0x13e
[<ffffffff810efcf8>] ? __dentry_open+0x1aa/0x2a5
[<ffffffffa02d4485>] ? vzcalls_ioctl+0x233/0x501 [vzmon]
[<ffffffffa02cc1cd>] ? vzctl_ioctl+0x39/0x54 [vzdev]
[<ffffffff810fd25a>] ? vfs_ioctl+0x21/0x6c
[<ffffffff810fd7a8>] ? do_vfs_ioctl+0x48d/0x4cb
[<ffffffff810fd823>] ? sys_ioctl+0x3d/0x5c
[<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
 
Good news!

It seems that the bug is fixed in the RHEL6 kernel 042test006.1. This version also brings back the interesting "cpu limit" function.
 
