nfs-kernel-server crash report

Frank.Pawlik

nfs-kernel-server crash report [unresolved] -> kernel crash

Since OCFS2 is no longer included in the RH kernel, I followed this wiki:
http://pve.proxmox.com/wiki/Intel_Modular_Server
and installed nfs-kernel-server.
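For reference, a minimal /etc/exports entry for this kind of setup might look like the following; the path and client subnet here are placeholders, not taken from the wiki:

```
# /etc/exports -- example only; adjust path and client subnet to your setup
/var/lib/vz/backup  192.148.150.0/24(rw,sync,no_subtree_check)
```

After editing the file, `exportfs -ra` reloads the export list.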
About once a week, nfsd crashed during the backup, leaving an almost unusable system:

Oct 15 02:16:38 node1 kernel: nfsd D ffff88036062ec20 0 3249 2 0x00000000
Oct 15 02:16:38 node1 kernel: [<ffffffffa0315fe2>] ? nfsd_setuser_and_check_port+0x82/0xa0 [nfsd]
Oct 15 02:16:38 node1 kernel: [<ffffffffa031738c>] ? nfsd_permission+0xcc/0x170 [nfsd]
Oct 15 02:16:38 node1 kernel: [<ffffffffa0317ec5>] nfsd_vfs_write+0xe5/0x460 [nfsd]
Oct 15 02:16:38 node1 kernel: [<ffffffffa0318676>] ? nfsd_open+0x136/0x200 [nfsd]
Oct 15 02:16:38 node1 kernel: [<ffffffffa0318afa>] nfsd_write+0xea/0x100 [nfsd]
Oct 15 02:16:38 node1 kernel: [<ffffffffa0322804>] nfsd3_proc_write+0xb4/0x150 [nfsd]
Oct 15 02:16:38 node1 kernel: [<ffffffffa03124b3>] nfsd_dispatch+0xc3/0x260 [nfsd]
Oct 15 02:16:38 node1 kernel: [<ffffffffa0312cd0>] ? nfsd+0x0/0x190 [nfsd]
Oct 15 02:16:38 node1 kernel: [<ffffffffa0312dc5>] nfsd+0xf5/0x190 [nfsd]
Oct 15 02:16:38 node1 kernel: INFO: task nfsd:3250 blocked for more than 120 seconds.
etc.

I have now installed unfs3, a user-space NFS server.
It has worked well so far.
Maybe others have this problem too.

pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-47
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-47
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-2pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
 
Re: nfs-kernel-server crash report [unresolved]

Last night the backup (snapshot) killed the kernel!

Oct 18 02:16:57 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
Oct 18 02:16:57 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
Oct 18 02:17:28 node1 kernel: khugepaged D ffff88036aaee380 0 101 2 0x00000000
Oct 18 02:17:28 node1 kernel: ffff88036aaf5760 0000000000000046 0000000000000000 00000001811389c3
Oct 18 02:17:28 node1 kernel: 0000000000000000 000000000000f788 ffff88036aaf5fd8 ffff88036aaee380
Oct 18 02:17:28 node1 kernel: ffff88036e76cce0 ffff88036aaee948 ffff88036fc0c580 ffff88036aaee948
Oct 18 02:17:28 node1 kernel: Call Trace:
Oct 18 02:17:28 node1 kernel: [<ffffffff81131c95>] ? free_pcppages_bulk+0x2b5/0x360
Oct 18 02:17:28 node1 kernel: [<ffffffff81131c95>] ? free_pcppages_bulk+0x2b5/0x360
Oct 18 02:17:28 node1 kernel: [<ffffffff81012ba6>] ? read_tsc+0x16/0x40
Oct 18 02:17:28 node1 kernel: [<ffffffff814f3132>] io_schedule+0xb2/0x120
Oct 18 02:17:28 node1 kernel: [<ffffffffa02a66fe>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Oct 18 02:17:28 node1 kernel: [<ffffffff814f39c2>] __wait_on_bit+0x62/0x90
Oct 18 02:17:28 node1 kernel: [<ffffffffa02a66f0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Oct 18 02:17:28 node1 kernel: [<ffffffffa02a66f0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Oct 18 02:17:28 node1 kernel: [<ffffffff814f3a69>] out_of_line_wait_on_bit+0x79/0x90
Oct 18 02:17:28 node1 kernel: [<ffffffff810934d0>] ? wake_bit_function+0x0/0x50
Oct 18 02:17:28 node1 kernel: [<ffffffff8115950c>] ? try_to_unmap_file+0x7c/0x810
Oct 18 02:17:28 node1 kernel: [<ffffffffa02a66df>] nfs_wait_on_request+0x2f/0x40 [nfs]
Oct 18 02:17:28 node1 kernel: [<ffffffffa02aaa78>] nfs_find_and_lock_request+0xa8/0xd0 [nfs]
Oct 18 02:17:28 node1 kernel: [<ffffffffa02aaae1>] nfs_migrate_page+0x41/0x100 [nfs]
Oct 18 02:17:28 node1 kernel: [<ffffffff81178de3>] move_to_new_page+0xa3/0x1d0
Oct 18 02:17:28 node1 kernel: [<ffffffff81179683>] migrate_pages+0x483/0x4d0
Oct 18 02:17:28 node1 kernel: [<ffffffff8116ee00>] ? compaction_alloc+0x0/0x390
Oct 18 02:17:28 node1 kernel: [<ffffffff8116e673>] compact_zone+0x463/0x7a0
Oct 18 02:17:28 node1 kernel: [<ffffffff8116ed3a>] try_to_compact_pages+0xca/0x190
Oct 18 02:17:28 node1 kernel: [<ffffffff81133106>] __alloc_pages_nodemask+0x776/0xac0
Oct 18 02:17:28 node1 kernel: [<ffffffff8116ac7b>] alloc_pages_vma+0x9b/0x160
Oct 18 02:17:28 node1 kernel: [<ffffffff8117fcef>] khugepaged+0x9cf/0x1160
Oct 18 02:17:28 node1 kernel: [<ffffffff81093490>] ? autoremove_wake_function+0x0/0x40
Oct 18 02:17:28 node1 kernel: [<ffffffff8117f320>] ? khugepaged+0x0/0x1160
Oct 18 02:17:28 node1 kernel: [<ffffffff81092e66>] kthread+0x96/0xb0
Oct 18 02:17:28 node1 kernel: [<ffffffff8100c38a>] child_rip+0xa/0x20
Oct 18 02:17:28 node1 kernel: [<ffffffff81092dd0>] ? kthread+0x0/0xb0
Oct 18 02:17:28 node1 kernel: [<ffffffff8100c380>] ? child_rip+0x0/0x20
Oct 18 02:19:15 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
Oct 18 02:19:15 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
Oct 18 02:19:28 node1 kernel: khugepaged D ffff88036aaee380 0 101 2 0x00000000
Oct 18 02:19:28 node1 kernel: ffff88036aaf5760 0000000000000046 0000000000000000 00000001811389c3
Oct 18 02:19:28 node1 kernel: 0000000000000000 000000000000f788 ffff88036aaf5fd8
Oct 18 02:19:28 node1 kernel: ffff88036e76cce0 ffff88036aaee948 ffff88036fc0c580 ffff88036aaee948
Oct 18 02:19:28 node1 kernel: Call Trace:
Oct 18 02:19:28 node1 kernel: [<ffffffff81131c95>] ? free_pcppages_bulk+0x2b5/0x360
Oct 18 02:19:28 node1 kernel: [<ffffffff81131c95>] ? free_pcppages_bulk+0x2b5/0x360
Oct 18 02:19:28 node1 kernel: [<ffffffff81012ba6>] ? read_tsc+0x16/0x40
Oct 18 02:19:28 node1 kernel: [<ffffffff814f3132>] io_schedule+0xb2/0x120
Oct 18 02:19:28 node1 kernel: [<ffffffffa02a66fe>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Oct 18 02:19:28 node1 kernel: [<ffffffff814f39c2>] __wait_on_bit+0x62/0x90
Oct 18 02:19:28 node1 kernel: [<ffffffffa02a66f0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Oct 18 02:19:28 node1 kernel: [<ffffffffa02a66f0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Oct 18 02:19:28 node1 kernel: [<ffffffff814f3a69>] out_of_line_wait_on_bit+0x79/0x90
Oct 18 02:19:28 node1 kernel: [<ffffffff810934d0>] ? wake_bit_function+0x0/0x50
Oct 18 02:19:28 node1 kernel: [<ffffffff8115950c>] ? try_to_unmap_file+0x7c/0x810
Oct 18 02:19:28 node1 kernel: [<ffffffffa02a66df>] nfs_wait_on_request+0x2f/0x40
Oct 18 02:19:28 node1 kernel: [<ffffffffa02aaa78>] nfs_find_and_lock_request+0xa8/0xd0 [nfs]
Oct 18 02:19:28 node1 kernel: [<ffffffffa02aaae1>] nfs_migrate_page+0x41/0x100 [nfs]
Oct 18 02:19:28 node1 kernel: [<ffffffff81178de3>] move_to_new_page+0xa3/0x1d0
Oct 18 02:19:28 node1 kernel: [<ffffffff81179683>] migrate_pages+0x483/0x4d0
Oct 18 02:19:28 node1 kernel: [<ffffffff8116ee00>] ? compaction_alloc+0x0/0x390
Oct 18 02:19:28 node1 kernel: [<ffffffff8116e673>] compact_zone+0x463/0x7a0
Oct 18 02:19:28 node1 kernel: [<ffffffff8116ec39>] compact_zone_order+0xa9/0xe0
Oct 18 02:19:28 node1 kernel: [<ffffffff8116ed3a>] try_to_compact_pages+0xca/0x190
Oct 18 02:19:28 node1 kernel: [<ffffffff81133106>] __alloc_pages_nodemask+0x776/0xac0
Oct 18 02:19:28 node1 kernel: [<ffffffff8116ac7b>] alloc_pages_vma+0x9b/0x160
Oct 18 02:19:28 node1 kernel: [<ffffffff8117fcef>] khugepaged+0x9cf/0x1160
Oct 18 02:19:28 node1 kernel: [<ffffffff81093490>] ? autoremove_wake_function+0x0/0x40
Oct 18 02:19:28 node1 kernel: [<ffffffff8117f320>] ? khugepaged+0x0/0x1160
Oct 18 02:19:28 node1 kernel: [<ffffffff81092e66>] kthread+0x96/0xb0
Oct 18 02:19:28 node1 kernel: [<ffffffff8100c38a>] child_rip+0xa/0x20
Oct 18 02:19:28 node1 kernel: [<ffffffff81092dd0>] ? kthread+0x0/0xb0
Oct 18 02:19:28 node1 kernel: [<ffffffff8100c380>] ? child_rip+0x0/0x20
Oct 18 02:21:28 node1 kernel: khugepaged D ffff88036aaee380 0 101 2 0x00000000
Oct 18 02:21:28 node1 kernel: ffff88036aaf5760 0000000000000046 0000000000000000 00000001811389c3
Oct 18 02:21:28 node1 kernel: 0000000000000000 000000000000f788 ffff88036aaf5fd8 ffff88036aaee380
Oct 18 02:21:28 node1 kernel: ffff88036e76cce0 ffff88036aaee948 ffff88036fc0c580 ffff88036aaee948
Oct 18 02:21:28 node1 kernel: Call Trace:
Oct 18 02:21:28 node1 kernel: [<ffffffff81131c95>] ? free_pcppages_bulk+0x2b5/0x360
Oct 18 02:21:28 node1 kernel: [<ffffffff81131c95>] ? free_pcppages_bulk+0x2b5/0x360
Oct 18 02:21:28 node1 kernel: [<ffffffff81012ba6>] ? read_tsc+0x16/0x40
Oct 18 02:21:28 node1 kernel: [<ffffffff814f3132>] io_schedule+0xb2/0x120
Oct 18 02:21:28 node1 kernel: [<ffffffffa02a66fe>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Oct 18 02:21:28 node1 kernel: [<ffffffff814f39c2>] __wait_on_bit+0x62/0x90
Oct 18 02:21:28 node1 kernel: [<ffffffffa02a66f0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Oct 18 02:21:28 node1 kernel: [<ffffffffa02a66f0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Oct 18 02:21:28 node1 kernel: [<ffffffff814f3a69>] out_of_line_wait_on_bit+0x79/
Oct 18 02:21:28 node1 kernel: [<ffffffff810934d0>] ? wake_bit_function+0x0/0x50
Oct 18 02:21:28 node1 kernel: [<ffffffff8115950c>] ? try_to_unmap_file+0x7c/0x810
Oct 18 02:21:28 node1 kernel: [<ffffffffa02a66df>] nfs_wait_on_request+0x2f/0x40 [nfs]
Oct 18 02:21:28 node1 kernel: [<ffffffffa02aaa78>] nfs_find_and_lock_request+0xa8/0xd0 [nfs]
Oct 18 02:21:28 node1 kernel: [<ffffffffa02aaae1>] nfs_migrate_page+0x41/0x100 [nfs]
Oct 18 02:21:28 node1 kernel: [<ffffffff81178de3>] move_to_new_page+0xa3/0x1d0
Oct 18 02:21:28 node1 kernel: [<ffffffff81179683>] migrate_pages+0x483/0x4d0
Oct 18 02:21:28 node1 kernel: [<ffffffff8116ee00>] ? compaction_alloc+0x0/0x390
Oct 18 02:21:28 node1 kernel: [<ffffffff8116e673>] compact_zone+0x463/0x7a0
Oct 18 02:21:28 node1 kernel: [<ffffffff8116ec39>] compact_zone_order+0xa9/0xe0
Oct 18 02:21:28 node1 kernel: [<ffffffff8116ed3a>] try_to_compact_pages+0xca/0x190
Oct 18 02:21:28 node1 kernel: [<ffffffff81133106>] __alloc_pages_nodemask+0x776/0xac0
Oct 18 02:21:28 node1 kernel: [<ffffffff8116ac7b>] alloc_pages_vma+0x9b/0x160
Oct 18 02:21:28 node1 kernel: [<ffffffff8117fcef>] khugepaged+0x9cf/0x1160
Oct 18 02:21:28 node1 kernel: [<ffffffff81093490>] ? autoremove_wake_function+0x0/0x40
Oct 18 02:21:28 node1 kernel: [<ffffffff8117f320>] ? khugepaged+0x0/0x1160
Oct 18 02:21:28 node1 kernel: [<ffffffff81092e66>] kthread+0x96/0xb0
Oct 18 02:21:28 node1 kernel: [<ffffffff8100c38a>] child_rip+0xa/0x20
Oct 18 02:21:28 node1 kernel: [<ffffffff81092dd0>] ? kthread+0x0/0xb0
Oct 18 02:21:28 node1 kernel: [<ffffffff8100c380>] ? child_rip+0x0/0x20
Oct 18 02:21:33 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
Oct 18 02:21:33 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
Oct 18 02:21:33 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
Oct 18 02:21:33 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
Oct 18 02:21:33 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
Oct 18 02:21:33 node1 kernel: ct0 nfs: server 192.148.150.20 not responding, still trying
etc.

So it's not the NFS server that causes the problems; it's the new RH kernel.
I checked all filesystems and found no errors on them.
I will try limiting the backup bandwidth, but that is not a real solution.
The backup drive is a RAID 1E array with brand-new 1 TB SAS disks.
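For the bandwidth limit, vzdump reads /etc/vzdump.conf; something like the following line should do it, assuming this vzdump version supports the option (the 40000 KB/s value is just a starting point to tune):

```
# /etc/vzdump.conf -- example; bwlimit is in KB/s
bwlimit: 40000
```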
 
Re: nfs-kernel-server crash report [unresolved]

Could you try with pve-kernel-2.6.32-4-pve: 2.6.32-33?

Kernel 2.6.32-4-pve prior to -31 had problems with NFS (start/stop failures in OpenVZ containers, NFS mounts not working on workstations, errors during data copies). -33 corrected these problems.
 
Re: nfs-kernel-server crash report [unresolved]

I am using the Debian kernel now. The backup still causes problems, but the kernel did not crash.
 
Re: nfs-kernel-server crash report [unresolved]

I think the free space for the snapshots is sometimes not big enough.
So I resized /dev/mapper/pve-data:
umount /var/lib/vz
e2fsck -f /dev/mapper/pve-data
resize2fs /dev/mapper/pve-data 35G (it was 42 GB)
lvreduce -L-7G /dev/mapper/pve-data
mount /var/lib/vz
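For anyone repeating this, here is the same sequence as a small dry-run script that only prints the commands (the LV name and sizes are the ones from above; adjust them for your setup):

```shell
#!/bin/sh
# Dry run of the shrink sequence: prints the commands instead of running them.
set -eu

LV=/dev/mapper/pve-data
CUR_GB=42        # current size of the filesystem/LV
SHRINK_GB=7      # space to give back to the volume group for snapshots
NEW_GB=$((CUR_GB - SHRINK_GB))

# The filesystem must be shrunk BEFORE lvreduce, or data is lost.
echo "umount /var/lib/vz"
echo "e2fsck -f $LV"
echo "resize2fs $LV ${NEW_GB}G"
echo "lvreduce -L-${SHRINK_GB}G $LV"
echo "mount /var/lib/vz"
```

Once the printed sizes look right, drop the echos and run the commands for real.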

Now I can set 'size: 4096' in /etc/vzdump.conf.
I think the backup behaves much better now.
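For context, the relevant line in /etc/vzdump.conf, where size is (if I remember the option right) the LVM snapshot size in MB:

```
# /etc/vzdump.conf -- snapshot LV size in MB for LVM snapshot backups
size: 4096
```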