[SOLVED] Test server crashes sporadically

fireon

Distinguished Member
Oct 25, 2010
Hello,

I have a small test server here. I installed it from the no-subscription repo and, after some weeks, switched to the testing repo. With every installation I got a crash after a few days.

Pentium dual-core, 4 GB RAM, 4 HDDs in RAIDZ, no VMs present; it is only used for storing backups from a PVE host.

These were the last messages in the log:

Code:
Apr 24 01:29:16 backup kernel: INFO: task kswapd0:32 blocked for more than 120 seconds.
Apr 24 01:29:16 backup kernel: Tainted: P O 4.4.6-1-pve #1
Apr 24 01:29:16 backup kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 01:29:16 backup kernel: kswapd0 D ffff88011596b808 0 32 2 0x00000000
Apr 24 01:29:16 backup kernel: ffff88011596b808 ffffffffc01906f6 ffff8800db034b00 ffff88011a57e400
Apr 24 01:29:16 backup kernel: ffff88011596c000 ffff880015d2ef64 ffff88011a57e400 00000000ffffffff
Apr 24 01:29:16 backup kernel: ffff880015d2ef68 ffff88011596b820 ffffffff81841c95 ffff880015d2ef60
Apr 24 01:29:16 backup kernel: Call Trace:
Apr 24 01:29:16 backup kernel: [<ffffffffc01906f6>] ? dmu_objset_userused_enabled+0x16/0x50 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffff81841c95>] schedule+0x35/0x80
Apr 24 01:29:16 backup kernel: [<ffffffff81841f4e>] schedule_preempt_disabled+0xe/0x10
Apr 24 01:29:16 backup kernel: [<ffffffff81843c59>] __mutex_lock_slowpath+0xb9/0x130
Apr 24 01:29:16 backup kernel: [<ffffffff81843cef>] mutex_lock+0x1f/0x30
Apr 24 01:29:16 backup kernel: [<ffffffffc0184ca0>] dbuf_read+0xf0/0x840 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc01f0653>] ? zap_get_leaf_byblk+0x103/0x2c0 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc01868d9>] dmu_buf_will_dirty+0x49/0xa0 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc01eff2a>] zap_increment_num_entries+0x2a/0x90 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc01f199a>] fzap_remove+0x9a/0xb0 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc01f6e31>] zap_remove_norm+0x141/0x170 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc01f6e73>] zap_remove+0x13/0x20 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc01f2044>] zap_remove_int+0x54/0x80 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc02006e2>] zfs_rmnode+0x1e2/0x350 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc021ddf2>] ? zfs_znode_hold_exit+0x102/0x130 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc0221898>] zfs_zinactive+0xd8/0xf0 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc02196bb>] zfs_inactive+0x6b/0x260 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffff8119df8a>] ? truncate_pagecache+0x5a/0x70
Apr 24 01:29:16 backup kernel: [<ffffffffc0232558>] zpl_evict_inode+0x48/0x70 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffff81226e4f>] evict+0xbf/0x190
Apr 24 01:29:16 backup kernel: [<ffffffff81226f56>] dispose_list+0x36/0x50
Apr 24 01:29:16 backup kernel: [<ffffffff812282aa>] prune_icache_sb+0x5a/0x80
Apr 24 01:29:16 backup kernel: [<ffffffff8120ee84>] super_cache_scan+0x154/0x1a0
Apr 24 01:29:16 backup kernel: [<ffffffff8119f39d>] shrink_slab.part.40+0x1dd/0x3b0
Apr 24 01:29:16 backup kernel: [<ffffffff811a35c3>] shrink_zone+0x293/0x2d0
Apr 24 01:29:16 backup kernel: [<ffffffff811a4763>] kswapd+0x583/0xa40
Apr 24 01:29:16 backup kernel: [<ffffffff811a41e0>] ? mem_cgroup_shrink_node_zone+0x1c0/0x1c0
Apr 24 01:29:16 backup kernel: [<ffffffff8109fb3a>] kthread+0xea/0x100
Apr 24 01:29:16 backup kernel: [<ffffffff8109fa50>] ? kthread_park+0x60/0x60
Apr 24 01:29:16 backup kernel: [<ffffffff8184614f>] ret_from_fork+0x3f/0x70
Apr 24 01:29:16 backup kernel: [<ffffffff8109fa50>] ? kthread_park+0x60/0x60
Apr 24 01:29:16 backup kernel: INFO: task rpcbind:2279 blocked for more than 120 seconds.
Apr 24 01:29:16 backup kernel: Tainted: P O 4.4.6-1-pve #1
Apr 24 01:29:16 backup kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 01:29:16 backup kernel: rpcbind D ffff8800d02439f0 0 2279 1 0x00000000
Apr 24 01:29:16 backup kernel: ffff8800d02439f0 ffffffff813f2ed4 ffff8800a628be80 ffff8800da46cb00
Apr 24 01:29:16 backup kernel: ffff8800d0244000 ffff880099e23c18 ffff880099e23c40 ffff880099e23c70
Apr 24 01:29:16 backup kernel: 0000000000000000 ffff8800d0243a08 ffffffff81841c95 ffff880099e23c68
Apr 24 01:29:16 backup kernel: Call Trace:
Apr 24 01:29:16 backup kernel: [<ffffffff813f2ed4>] ? timerqueue_del+0x24/0x70
Apr 24 01:29:16 backup kernel: [<ffffffff81841c95>] schedule+0x35/0x80
Apr 24 01:29:16 backup kernel: [<ffffffffc006edcb>] cv_wait_common+0x10b/0x140 [spl]
Apr 24 01:29:16 backup kernel: [<ffffffff810c2cc0>] ? wait_woken+0x90/0x90
Apr 24 01:29:16 backup kernel: [<ffffffffc006ee15>] __cv_wait+0x15/0x20 [spl]
Apr 24 01:29:16 backup kernel: [<ffffffffc0185056>] dbuf_read+0x4a6/0x840 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc01857fa>] ? __dbuf_hold_impl+0x22a/0x510 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc0185a4e>] __dbuf_hold_impl+0x47e/0x510 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc0185b52>] dbuf_hold_impl+0x72/0xa0 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc0185e6f>] dbuf_hold+0x2f/0x60 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc018d739>] dmu_buf_hold_array_by_dnode+0x109/0x4b0 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc018e11f>] dmu_read+0x9f/0x190 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc01c5202>] ? rrw_enter_read_impl+0xb2/0x170 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc02148bf>] zfs_getpage+0x11f/0x1f0 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffffc02307ff>] zpl_readpage+0x5f/0xc0 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffff8118e539>] filemap_fault+0x229/0x3e0
Apr 24 01:29:16 backup kernel: [<ffffffff811bac10>] __do_fault+0x50/0xe0
Apr 24 01:29:16 backup kernel: [<ffffffff811bf720>] handle_mm_fault+0x1100/0x1a20
Apr 24 01:29:16 backup kernel: [<ffffffff81220060>] ? poll_select_copy_remaining+0x140/0x140
Apr 24 01:29:16 backup kernel: [<ffffffff8106a4ed>] __do_page_fault+0x19d/0x410
Apr 24 01:29:16 backup kernel: [<ffffffff8106a782>] do_page_fault+0x22/0x30
Apr 24 01:29:16 backup kernel: [<ffffffff81847f38>] page_fault+0x28/0x30
Apr 24 01:29:16 backup kernel: INFO: task nscd:2615 blocked for more than 120 seconds.
Apr 24 01:29:16 backup kernel: Tainted: P O 4.4.6-1-pve #1
Apr 24 01:29:16 backup kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 01:29:16 backup kernel: nscd D ffff8800ce43f9f0 0 2615 1 0x00000000
Apr 24 01:29:16 backup kernel: ffff8800ce43f9f0 0000000000000000 ffff8800db034b00 ffff8800cf36f080
Apr 24 01:29:16 backup kernel: ffff8800ce440000 ffff880099e23c18 ffff880099e23c40 ffff880099e23c70
Apr 24 01:29:16 backup kernel: 0000000000000000 ffff8800ce43fa08 ffffffff81841c95 ffff880099e23c68
Apr 24 01:29:16 backup kernel: Call Trace:

pve-manager/4.1-30/9e199213 (running kernel: 4.4.6-1-pve)

My first idea was that the machine had run out of memory, but I detected no swapping. From one moment to the next, the machine was dead. It is possible that this is a hardware fault, because the hardware is very old. But what exactly does the log say? Is there anything interesting to see in it? What is going on with kswapd? I know that 4 GB is too little for ZFS, but I thought it might work anyway.
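As a rough aid for reading a log like the one above, here is a minimal Python sketch that lists each hung task and counts how many of its stack frames belong to which kernel module. In the posted trace, kswapd0 goes from memory reclaim (shrink_slab, prune_icache_sb) into ZFS (zfs_rmnode, dbuf_read) and then waits on a mutex. The function name `summarize_hung_tasks` and the shortened `SAMPLE` excerpt are illustrative choices, not part of any Proxmox or ZFS tool.

```python
import re
from collections import Counter

# Kernel hung-task warning, e.g.
# "INFO: task kswapd0:32 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Stack frame, e.g. "[<ffffffffc0184ca0>] dbuf_read+0xf0/0x840 [zfs]"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (?:\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)

def summarize_hung_tasks(log_text):
    """Return one dict per hung task with a per-module frame count."""
    tasks = []
    current = None
    for line in log_text.splitlines():
        hung = HUNG_TASK.search(line)
        if hung:
            current = {
                "name": hung["name"],
                "pid": int(hung["pid"]),
                "secs": int(hung["secs"]),
                "modules": Counter(),
            }
            tasks.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            # Frames without a "[module]" suffix belong to the core kernel.
            current["modules"][frame["module"] or "kernel"] += 1
    return tasks

# Shortened excerpt of the log posted above (hypothetical selection of lines).
SAMPLE = """\
Apr 24 01:29:16 backup kernel: INFO: task kswapd0:32 blocked for more than 120 seconds.
Apr 24 01:29:16 backup kernel: [<ffffffff81841c95>] schedule+0x35/0x80
Apr 24 01:29:16 backup kernel: [<ffffffffc0184ca0>] dbuf_read+0xf0/0x840 [zfs]
Apr 24 01:29:16 backup kernel: [<ffffffff8119f39d>] shrink_slab.part.40+0x1dd/0x3b0
Apr 24 01:29:16 backup kernel: INFO: task rpcbind:2279 blocked for more than 120 seconds.
Apr 24 01:29:16 backup kernel: [<ffffffffc006edcb>] cv_wait_common+0x10b/0x140 [spl]
Apr 24 01:29:16 backup kernel: [<ffffffffc0185056>] dbuf_read+0x4a6/0x840 [zfs]
"""

if __name__ == "__main__":
    for task in summarize_hung_tasks(SAMPLE):
        print(task["name"], task["pid"], dict(task["modules"]))
```

A summary like this only shows where the tasks were waiting; it cannot tell a ZFS memory problem apart from a hardware fault.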

Thanks.
 
ZFS with less than 8 GB RAM really makes absolutely no sense. Ideally use 16 GB RAM or more.
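If more RAM is not an option, one common approach on small machines is to cap the ZFS ARC with the `zfs_arc_max` module parameter, usually set in /etc/modprobe.d/zfs.conf. The small Python sketch below only builds that options line from a size in GiB. The helper name `arc_max_option` and the 1 GiB value are assumptions for illustration, and nothing in this thread confirms that an ARC cap would have prevented the hang.

```python
GIB = 1024 ** 3  # bytes per GiB

def arc_max_option(limit_gib):
    """Return a modprobe options line that caps the ZFS ARC at limit_gib GiB."""
    if limit_gib <= 0:
        raise ValueError("limit must be positive")
    return f"options zfs zfs_arc_max={int(limit_gib * GIB)}"

if __name__ == "__main__":
    # 1 GiB is an illustrative value for a 4 GB machine, not advice from the thread.
    print(arc_max_option(1))  # prints "options zfs zfs_arc_max=1073741824"
```

The setting takes effect when the zfs module is loaded, so it normally needs a reboot.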
 
