4.15 based test kernel for PVE 5.x available

Aug 7, 2018
Interestingly enough, everything seemed fine on our new(er) hardware until I started optimizing BIOS settings for our workload. HPE's guide on optimizing for low latency recommends disabling a few features, and disabling them caused these kernel panics for me.

The relevant settings are:
  • Intel Virtualization Technology -> disabled
  • Intel Hyperthreading Options -> disabled
  • Intel Turbo Boost Technology -> disabled
  • Intel VT-d -> disabled
What's odd though is I have to keep them all enabled since pve-kernel 4.15.18 while 4.15.17 was fine with these disabled.

(We use 3 nodes just for storage, with a Proxmox install to simplify the Ceph installation process; otherwise these settings obviously should be enabled, except perhaps for Hyper-Threading.)

Anyone else running into this issue might want to check these settings. I'll leave them enabled for now.

regards,
Menno
 

rotanid

New Member
Dec 18, 2012
the network crashes are back for us with kernel pve-kernel-4.15.18-2-pve / 4.15.18-20

Code:
Aug 29 15:19:28 vh6 kernel: [53879.772436] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Aug 29 15:19:28 vh6 kernel: [53879.772436]   TDH                  <0>
Aug 29 15:19:28 vh6 kernel: [53879.772436]   TDT                  <8>
Aug 29 15:19:28 vh6 kernel: [53879.772436]   next_to_use          <8>
Aug 29 15:19:28 vh6 kernel: [53879.772436]   next_to_clean        <0>
Aug 29 15:19:28 vh6 kernel: [53879.772436] buffer_info[next_to_clean]:
Aug 29 15:19:28 vh6 kernel: [53879.772436]   time_stamp           <100cc5f82>
Aug 29 15:19:28 vh6 kernel: [53879.772436]   next_to_watch        <0>
Aug 29 15:19:28 vh6 kernel: [53879.772436]   jiffies              <100cc6408>
Aug 29 15:19:28 vh6 kernel: [53879.772436]   next_to_watch.status <0>
Aug 29 15:19:28 vh6 kernel: [53879.772436] MAC Status             <80083>
Aug 29 15:19:28 vh6 kernel: [53879.772436] PHY Status             <796d>
Aug 29 15:19:28 vh6 kernel: [53879.772436] PHY 1000BASE-T Status  <7800>
Aug 29 15:19:28 vh6 kernel: [53879.772436] PHY Extended Status    <3000>
Aug 29 15:19:28 vh6 kernel: [53879.772436] PCI Status             <10>
Code:
# ethtool -e enp0s31f6 length 256
Offset        Values
------        ------
0x0000:        90 1b 0e da c4 eb 01 08 ff ff 84 00 8e 00 00 80
0x0010:        ff ff ff ff c3 10 1f 12 34 17 b7 15 00 00 00 00
0x0020:        00 00 00 00 00 80 05 a7 2c 30 00 16 00 00 00 0c
0x0030:        f4 18 02 0a 43 08 13 01 b7 15 ad ba b7 15 b8 15
0x0040:        ad ba b7 15 ad ba b7 15 00 00 80 80 00 4e 86 08
0x0050:        00 00 00 00 07 00 00 20 20 00 00 00 00 0e 00 00
0x0060:        00 01 00 40 0a 01 07 40 ff ff ff ff ff ff ff ff
0x0070:        ff ff ff ff ff ff ff ff ff ff 00 02 ff ff 2f 35
0x0080:        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0090:        00 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff
0x00a0:        94 b0 00 08 0a 00 04 90 b0 47 40 24 c2 c1 21 fb
0x00b0:        80 60 1f 00 00 48 10 00 40 60 1f 00 04 d1 11 00
0x00c0:        03 0a 12 00 00 00 1f 00 04 b4 30 00 1c 00 31 00
0x00d0:        06 b4 30 00 09 00 31 00 07 b4 30 00 10 00 31 00
0x00e0:        0a b4 30 00 18 00 31 00 0c b4 30 00 18 00 31 00
0x00f0:        0d b4 30 00 18 00 31 00 01 fd 30 00 2c 9c 31 00
Code:
# ethtool -i enp0s31f6
driver: e1000e
version: 3.4.1.1-NAPI
firmware-version: 0.8-4
expansion-rom-version:
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
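To get a feel for how often this happens per NIC, the hang message can be counted from the kernel log. This is only a rough sketch: the function name `count_unit_hangs` is made up for illustration, and it assumes the log lines keep the `e1000e <bus> <iface>: Detected Hardware Unit Hang` format shown above.

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints
# "<count> <interface>" for each interface that logged an e1000e
# "Detected Hardware Unit Hang" message.
count_unit_hangs() {
    grep 'Detected Hardware Unit Hang' \
        | sed -E 's/.*e1000e [^ ]+ ([^ :]+): Detected Hardware Unit Hang.*/\1/' \
        | sort | uniq -c | awk '{print $1, $2}'
}

# Usage on the affected host (log source is an assumption):
#   journalctl -k | count_unit_hangs
#   count_unit_hangs < /var/log/kern.log
```

Comparing the count before and after a kernel change gives a slightly better signal than waiting for the next crash.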
 

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
South Tyrol/Italy
What's odd though is I have to keep them all enabled since pve-kernel 4.15.18 while 4.15.17 was fine with these disabled.
That is very, very strange: with all those virtualization flags disabled you should never have been able to boot any VM with us (at least if you did not manually set 'kvm' to off).
Please never turn them off.

the network crashes are back for us with kernel pve-kernel-4.15.18-2-pve / 4.15.18-20
Meaning that it's always there, just with a different (higher/lower) chance of triggering it... Or even a HW/chipset problem.

Hmm, your EEPROM dump does not look like you have an earlier problematic power feature enabled.

Have you tried disabling some offloading features? Often they're the cause of such hangs with specific chipsets:
Code:
ethtool -K enp0s31f6 gso off gro off tso off
(may imply some performance degradation as now the CPU must do this work)
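To see whether the offloads are actually on before and after the change, the output of `ethtool -k` (lowercase k) can be filtered. A minimal sketch; the function name `offload_state` is made up for illustration, and the interface name is the one from this thread:

```shell
# Hypothetical helper: reads `ethtool -k <iface>` output on stdin and keeps
# only the three top-level offload features (TSO, GSO, GRO).
offload_state() {
    grep -E '^(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}

# Usage on the affected host (needs ethtool, and root for the -K call):
#   ethtool -k enp0s31f6 | offload_state
#   ethtool -K enp0s31f6 gso off gro off tso off
#   ethtool -k enp0s31f6 | offload_state
```

Settings changed with `ethtool -K` do not survive a reboot, so they would need to be reapplied, for example from a `post-up` line in /etc/network/interfaces if the host uses ifupdown (that part is an assumption about the setup).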
 

rotanid

New Member
Dec 18, 2012
we have 5 systems at the moment with this intel NIC and ProxmoxVE.
Meaning that it's always there, just with a different (higher/lower) chance of triggering it...
the system that crashed 2 days ago was running pve-kernel-4.13.16-2-pve until 3 days ago

Have you tried disabling some offloading features?
thanks, will try this!
 

sergopotap

New Member
Jun 28, 2018
My cluster has been working normally for 4 days since setting intremap=off

asocialpenguin.com/2013/12/23/interrupt-remapping-problems-with-intel-5500-5520-cpus/
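For anyone trying the same thing: `intremap=off` is a kernel command line parameter, so it is worth checking that the running kernel really booted with it. A small sketch, assuming a GRUB-booted host where the parameter was added to `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub followed by `update-grub` (how it was set here is not stated). The function name `has_kernel_param` is hypothetical.

```shell
# Hypothetical helper: succeeds if the exact parameter appears in the kernel
# command line.
#   $1: parameter to look for, e.g. intremap=off
#   $2: command line text (defaults to the contents of /proc/cmdline)
has_kernel_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Usage after the reboot:
#   has_kernel_param intremap=off && echo "interrupt remapping disabled at boot"
```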
 

robertb

New Member
Apr 4, 2017
Germany
Hello,
I also have the same problem with 2x Xeon E5-2680 v2 on a Supermicro X9DRW-iF, latest BIOS and PVE.
It crashes with the same message; additionally, one of the 10G links (X520-DA) has been flapping since.
However, removing the Intel-Microcode package seems to have stabilized it a bit.
 

pniebylski

New Member
Aug 10, 2018
Hello Proxmox Team.

Has this issue been resolved in the latest kernel update? From what I can read here, only Intel network cards have this problem.

This is a blocking issue for us, as we cannot upgrade our environment until we get a green light that the kernel panics no longer happen when the host machine has Intel NICs installed.

Thank you!
 
Sep 27, 2016
So we've just run into the same issue with a SuperMicro system as well, having Intel i210 NICs installed and Kernel pve-kernel-4.15.18-7-pve: 4.15.18-27 running.

As the hardware is provided by a managed service we're currently asking them to install the most recent NIC firmware, but any ideas from the Proxmox team on how to solve this in addition?

The system had been running rock solid for about 10 months, always with the latest V4.4 installed, but in order to stay within support we decided to upgrade to 5.x. :confused:

I'll try to post crashlogs as soon as we can access the OS again...
 
Sep 27, 2016
Unfortunately the logs have not been stored, so we currently don't have any further details :confused:

So we'll adjust the log retention and try to have more information available when it crashes the next time...
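One way to keep kernel messages across a crash is a persistent systemd journal. This is only a sketch under the assumption that the host runs systemd-journald with its default `Storage=auto`, where the journal is kept on disk only if /var/log/journal exists. The function name `journal_is_persistent` is made up for illustration.

```shell
# Hypothetical helper: succeeds if the persistent journal directory exists.
#   $1: optional root prefix (empty means the real /var/log/journal)
journal_is_persistent() {
    [ -d "${1:-}/var/log/journal" ]
}

# Usage on the host (as root):
#   journal_is_persistent || { mkdir -p /var/log/journal; systemctl restart systemd-journald; }
# After the next crash and reboot, the previous boot's kernel log can be read with:
#   journalctl -k -b -1
```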
 

jehster

New Member
Jan 19, 2016
Hi,

We're facing the problem on one node of our cluster:
3xR610 + SolarFlare SFN5322F.

Code:
Kernel Version Linux 4.15.18-9-pve #1 SMP PVE 4.15.18-30 (Thu, 15 Nov 2018 13:32:46 +0100)
PVE Manager Version pve-manager/5.3-5/97ae681d
[ 2103.747850] ------------[ cut here ]------------
[ 2103.747869] NETDEV WATCHDOG: eth5 (sfc): transmit queue 9 timed out
[ 2103.747899] WARNING: CPU: 14 PID: 0 at net/sched/sch_generic.c:323 dev_watchdog+0x222/0x230
[ 2103.747901] Modules linked in: xt_set ip_set_hash_net nfsv3 nfs_acl nfs lockd grace fscache ip_set ip6table_filter ip6_tables xt_multiport iptable_filter bonding softdog nfnetlink_log nfnetlink intel_p
werclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel ttm pcbc drm_kms_helper drm i2c_algo_bit snd_pcm fb_sys_fops syscopyarea aesni_intel aes_x86_6
sysfillrect crypto_simd snd_timer sysimgblt glue_helper snd cryptd soundcore shpchp joydev input_leds gpio_ich acpi_power_meter ipmi_si dcdbas ipmi_devintf ipmi_msghandler wmi serio_raw intel_cstate lpc_
ch pcspkr i7core_edac ioatdma dca mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm sunrpc ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 hid_generic
[ 2103.747983] usbmouse usbkbd sfc mtd ptp usbhid psmouse pps_core pata_acpi hid megaraid_sas mdio bnx2
[ 2103.747998] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G I 4.15.18-9-pve #1
[ 2103.747999] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.4.0 07/23/2013
[ 2103.748001] RIP: 0010:dev_watchdog+0x222/0x230
[ 2103.748002] RSP: 0018:ffff908bcf3c3e58 EFLAGS: 00010286
[ 2103.748003] RAX: 0000000000000000 RBX: 0000000000000009 RCX: 0000000000000000
[ 2103.748004] RDX: 0000000000040400 RSI: 00000000000000f6 RDI: 0000000000000300
[ 2103.748005] RBP: ffff908bcf3c3e88 R08: 0000000000000001 R09: 0000000000000400
[ 2103.748006] R10: ffff908bcf3da770 R11: 0000000000000400 R12: 0000000000000040
[ 2103.748007] R13: ffff9083cbd82000 R14: ffff9083cbd82478 R15: ffff9083c6b5cf40
[ 2103.748008] FS: 0000000000000000(0000) GS:ffff908bcf3c0000(0000) knlGS:0000000000000000
[ 2103.748009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2103.748010] CR2: 00007ff3c5bd8fd0 CR3: 0000000eafe0a005 CR4: 00000000000226e0
[ 2103.748011] Call Trace:
[ 2103.748012] <IRQ>
[ 2103.748016] ? dev_deactivate_queue.constprop.33+0x60/0x60
[ 2103.748019] call_timer_fn+0x32/0x130
[ 2103.748021] run_timer_softirq+0x1dd/0x430
[ 2103.748024] ? tick_sched_handle+0x34/0x60
[ 2103.748026] ? ktime_get+0x43/0xa0
[ 2103.748028] __do_softirq+0x10c/0x2a2
[ 2103.748031] irq_exit+0xb8/0xc0
[ 2103.748033] smp_apic_timer_interrupt+0x79/0x130
[ 2103.748035] apic_timer_interrupt+0x84/0x90
[ 2103.748036] </IRQ>
[ 2103.748038] RIP: 0010:cpuidle_enter_state+0xa5/0x2e0
[ 2103.748039] RSP: 0018:ffffb93586313e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
[ 2103.748041] RAX: ffff908bcf3e28c0 RBX: 0000000000000003 RCX: 000000000000001f
[ 2103.748042] RDX: 000001e9d1247af1 RSI: ffffffd9314b6f1d RDI: 0000000000000000
[ 2103.748043] RBP: ffffb93586313e90 R08: 0000000000000ea6 R09: 0000000000000006
[ 2103.748044] R10: ffffb93586313e28 R11: 0000000000000135 R12: ffff908bcf3ec900
[ 2103.748045] R13: ffffffff9f771cb8 R14: 000001e9d1247af1 R15: ffffffff9f771ca0
[ 2103.748047] ? cpuidle_enter_state+0x97/0x2e0
[ 2103.748049] cpuidle_enter+0x17/0x20
[ 2103.748051] call_cpuidle+0x23/0x40
[ 2103.748053] do_idle+0x19a/0x200
[ 2103.748055] cpu_startup_entry+0x73/0x80
[ 2103.748057] start_secondary+0x1ab/0x200
[ 2103.748060] secondary_startup_64+0xa5/0xb0
[ 2103.748061] Code: 36 00 49 63 4e e8 eb 92 4c 89 ef c6 05 d8 ca d7 00 01 e8 32 1f fd ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 40 78 39 9f e8 4e 71 7f ff <0f> 0b eb c0 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 6
90 55 48
[ 2103.748091] ---[ end trace 9222b51587ffbe79 ]---
[ 2103.748097] sfc 0000:04:00.1 eth5: TX stuck with port_enabled=1: resetting channels
[ 2103.748206] sfc 0000:04:00.1 eth5: resetting (RECOVER_OR_ALL)
[ 2103.825671] sfc 0000:04:00.1 eth5: link down
[ 2103.991537] sfc 0000:04:00.1 eth5: link up at 10000Mbps full-duplex (MTU 1500)
[ 2296.761180] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 2296.761223] Tainted: G W I 4.15.18-9-pve #1
[ 2296.761252] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2296.761273] lzop D 0 8382 8357 0x00000000
[ 2296.761275] Call Trace:
[ 2296.761281] __schedule+0x3e0/0x870
[ 2296.761283] ? bit_wait+0x60/0x60
[ 2296.761284] schedule+0x36/0x80
[ 2296.761286] io_schedule+0x16/0x40
[ 2296.761287] bit_wait_io+0x11/0x60
[ 2296.761288] __wait_on_bit+0x5a/0x90
[ 2296.761289] out_of_line_wait_on_bit+0x8e/0xb0
[ 2296.761291] ? bit_waitqueue+0x40/0x40
[ 2296.761305] nfs_wait_on_request+0x46/0x50 [nfs]
[ 2296.761311] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 2296.761313] ? radix_tree_lookup_slot+0x22/0x50
[ 2296.761320] nfs_updatepage+0x151/0x910 [nfs]
[ 2296.761325] nfs_write_end+0x129/0x4e0 [nfs]
[ 2296.761327] generic_perform_write+0xff/0x1b0
[ 2296.761333] nfs_file_write+0xd7/0x250 [nfs]
[ 2296.761346] new_sync_write+0xe7/0x140
[ 2296.761348] __vfs_write+0x29/0x40
[ 2296.761349] vfs_write+0xb5/0x1a0
[ 2296.761351] SyS_write+0x55/0xc0
[ 2296.761353] do_syscall_64+0x73/0x130
[ 2296.761355] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 2296.761356] RIP: 0033:0x7f5411d99730
[ 2296.761357] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2296.761358] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 2296.761359] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 2296.761360] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 2296.761360] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 2296.761361] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
[ 2417.586366] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 2417.586392] Tainted: G W I 4.15.18-9-pve #1
[ 2417.586408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2417.586429] lzop D 0 8382 8357 0x00000000
[ 2417.586431] Call Trace:
[ 2417.586437] __schedule+0x3e0/0x870
[ 2417.586439] ? bit_wait+0x60/0x60
[ 2417.586440] schedule+0x36/0x80
[ 2417.586442] io_schedule+0x16/0x40
[ 2417.586443] bit_wait_io+0x11/0x60
[ 2417.586443] __wait_on_bit+0x5a/0x90
[ 2417.586445] out_of_line_wait_on_bit+0x8e/0xb0
[ 2417.586447] ? bit_waitqueue+0x40/0x40
[ 2417.586461] nfs_wait_on_request+0x46/0x50 [nfs]
[ 2417.586467] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 2417.586469] ? radix_tree_lookup_slot+0x22/0x50
[ 2417.586475] nfs_updatepage+0x151/0x910 [nfs]
[ 2417.586480] nfs_write_end+0x129/0x4e0 [nfs]
[ 2417.586483] generic_perform_write+0xff/0x1b0
[ 2417.586488] nfs_file_write+0xd7/0x250 [nfs]
[ 2417.586490] new_sync_write+0xe7/0x140
[ 2417.586491] __vfs_write+0x29/0x40
[ 2417.586493] vfs_write+0xb5/0x1a0
[ 2417.586494] SyS_write+0x55/0xc0
[ 2417.586496] do_syscall_64+0x73/0x130
[ 2417.586497] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 2417.586499] RIP: 0033:0x7f5411d99730
[ 2417.586499] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2417.586501] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 2417.586501] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 2417.586502] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 2417.586503] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 2417.586504] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
[ 2538.411658] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 2538.411702] Tainted: G W I 4.15.18-9-pve #1
[ 2538.411734] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2538.411778] lzop D 0 8382 8357 0x00000000
[ 2538.411780] Call Trace:
[ 2538.411787] __schedule+0x3e0/0x870
[ 2538.411789] ? bit_wait+0x60/0x60
[ 2538.411790] schedule+0x36/0x80
[ 2538.411791] io_schedule+0x16/0x40
[ 2538.411792] bit_wait_io+0x11/0x60
[ 2538.411793] __wait_on_bit+0x5a/0x90
[ 2538.411795] out_of_line_wait_on_bit+0x8e/0xb0
[ 2538.411797] ? bit_waitqueue+0x40/0x40
[ 2538.411810] nfs_wait_on_request+0x46/0x50 [nfs]
[ 2538.411816] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 2538.411818] ? radix_tree_lookup_slot+0x22/0x50
[ 2538.411824] nfs_updatepage+0x151/0x910 [nfs]
[ 2538.411830] nfs_write_end+0x129/0x4e0 [nfs]
[ 2538.411832] generic_perform_write+0xff/0x1b0
[ 2538.411837] nfs_file_write+0xd7/0x250 [nfs]
[ 2538.411839] new_sync_write+0xe7/0x140
[ 2538.411841] __vfs_write+0x29/0x40
[ 2538.411842] vfs_write+0xb5/0x1a0
[ 2538.411843] SyS_write+0x55/0xc0
[ 2538.411845] do_syscall_64+0x73/0x130
[ 2538.411847] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 2538.411848] RIP: 0033:0x7f5411d99730
[ 2538.411849] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2538.411850] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 2538.411851] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 2538.411851] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 2538.411852] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 2538.411853] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
[ 2659.236894] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 2659.236936] Tainted: G W I 4.15.18-9-pve #1
[ 2659.236968] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2659.237003] lzop D 0 8382 8357 0x00000000
[ 2659.237004] Call Trace:
[ 2659.237010] __schedule+0x3e0/0x870
[ 2659.237012] ? bit_wait+0x60/0x60
[ 2659.237013] schedule+0x36/0x80
[ 2659.237015] io_schedule+0x16/0x40
[ 2659.237016] bit_wait_io+0x11/0x60
[ 2659.237017] __wait_on_bit+0x5a/0x90
[ 2659.237018] out_of_line_wait_on_bit+0x8e/0xb0
[ 2659.237020] ? bit_waitqueue+0x40/0x40
[ 2659.237034] nfs_wait_on_request+0x46/0x50 [nfs]
[ 2659.237040] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 2659.237043] ? radix_tree_lookup_slot+0x22/0x50
[ 2659.237049] nfs_updatepage+0x151/0x910 [nfs]
[ 2659.237055] nfs_write_end+0x129/0x4e0 [nfs]
[ 2659.237057] generic_perform_write+0xff/0x1b0
[ 2659.237062] nfs_file_write+0xd7/0x250 [nfs]
[ 2659.237064] new_sync_write+0xe7/0x140
[ 2659.237066] __vfs_write+0x29/0x40
[ 2659.237067] vfs_write+0xb5/0x1a0
[ 2659.237068] SyS_write+0x55/0xc0
[ 2659.237070] do_syscall_64+0x73/0x130
[ 2659.237072] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 2659.237073] RIP: 0033:0x7f5411d99730
[ 2659.237074] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2659.237075] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 2659.237076] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 2659.237076] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 2659.237077] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 2659.237078] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
[ 2671.524157] nfs: server 10.9.80.101 not responding, still trying
[ 2780.062084] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 2780.062126] Tainted: G W I 4.15.18-9-pve #1
[ 2780.062160] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2780.062182] lzop D 0 8382 8357 0x00000000
[ 2780.062184] Call Trace:
[ 2780.062190] __schedule+0x3e0/0x870
[ 2780.062192] ? bit_wait+0x60/0x60
[ 2780.062193] schedule+0x36/0x80
[ 2780.062195] io_schedule+0x16/0x40
[ 2780.062196] bit_wait_io+0x11/0x60
[ 2780.062197] __wait_on_bit+0x5a/0x90
[ 2780.062198] out_of_line_wait_on_bit+0x8e/0xb0
[ 2780.062200] ? bit_waitqueue+0x40/0x40
[ 2780.062214] nfs_wait_on_request+0x46/0x50 [nfs]
[ 2780.062221] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 2780.062223] ? radix_tree_lookup_slot+0x22/0x50
[ 2780.062229] nfs_updatepage+0x151/0x910 [nfs]
[ 2780.062235] nfs_write_end+0x129/0x4e0 [nfs]
[ 2780.062237] generic_perform_write+0xff/0x1b0
[ 2780.062243] nfs_file_write+0xd7/0x250 [nfs]
[ 2780.062244] new_sync_write+0xe7/0x140
[ 2780.062246] __vfs_write+0x29/0x40
[ 2780.062247] vfs_write+0xb5/0x1a0
[ 2780.062249] SyS_write+0x55/0xc0
[ 2780.062250] do_syscall_64+0x73/0x130
[ 2780.062252] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 2780.062253] RIP: 0033:0x7f5411d99730
[ 2780.062254] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2780.062255] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 2780.062256] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 2780.062257] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 2780.062257] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 2780.062258] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
[ 2900.887397] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 2900.887438] Tainted: G W I 4.15.18-9-pve #1
[ 2900.887464] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2900.887485] lzop D 0 8382 8357 0x00000000
[ 2900.887487] Call Trace:
[ 2900.887493] __schedule+0x3e0/0x870
[ 2900.887494] ? bit_wait+0x60/0x60
[ 2900.887495] schedule+0x36/0x80
[ 2900.887497] io_schedule+0x16/0x40
[ 2900.887498] bit_wait_io+0x11/0x60
[ 2900.887499] __wait_on_bit+0x5a/0x90
[ 2900.887500] out_of_line_wait_on_bit+0x8e/0xb0
[ 2900.887502] ? bit_waitqueue+0x40/0x40
[ 2900.887516] nfs_wait_on_request+0x46/0x50 [nfs]
[ 2900.887522] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 2900.887524] ? radix_tree_lookup_slot+0x22/0x50
[ 2900.887530] nfs_updatepage+0x151/0x910 [nfs]
[ 2900.887535] nfs_write_end+0x129/0x4e0 [nfs]
[ 2900.887537] generic_perform_write+0xff/0x1b0
[ 2900.887543] nfs_file_write+0xd7/0x250 [nfs]
[ 2900.887545] new_sync_write+0xe7/0x140
[ 2900.887546] __vfs_write+0x29/0x40
[ 2900.887547] vfs_write+0xb5/0x1a0
[ 2900.887549] SyS_write+0x55/0xc0
[ 2900.887551] do_syscall_64+0x73/0x130
[ 2900.887552] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 2900.887554] RIP: 0033:0x7f5411d99730
[ 2900.887554] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2900.887556] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 2900.887556] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 2900.887557] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 2900.887558] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 2900.887558] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
[ 3021.712639] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 3021.712682] Tainted: G W I 4.15.18-9-pve #1
[ 3021.712702] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3021.712723] lzop D 0 8382 8357 0x00000000
[ 3021.712728] Call Trace:
[ 3021.712736] __schedule+0x3e0/0x870
[ 3021.712739] ? bit_wait+0x60/0x60
[ 3021.712741] schedule+0x36/0x80
[ 3021.712745] io_schedule+0x16/0x40
[ 3021.712747] bit_wait_io+0x11/0x60
[ 3021.712749] __wait_on_bit+0x5a/0x90
[ 3021.712751] out_of_line_wait_on_bit+0x8e/0xb0
[ 3021.712755] ? bit_waitqueue+0x40/0x40
[ 3021.712774] nfs_wait_on_request+0x46/0x50 [nfs]
[ 3021.712780] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 3021.712782] ? radix_tree_lookup_slot+0x22/0x50
[ 3021.712789] nfs_updatepage+0x151/0x910 [nfs]
[ 3021.712794] nfs_write_end+0x129/0x4e0 [nfs]
[ 3021.712796] generic_perform_write+0xff/0x1b0
[ 3021.712802] nfs_file_write+0xd7/0x250 [nfs]
[ 3021.712804] new_sync_write+0xe7/0x140
[ 3021.712805] __vfs_write+0x29/0x40
[ 3021.712806] vfs_write+0xb5/0x1a0
[ 3021.712808] SyS_write+0x55/0xc0
[ 3021.712809] do_syscall_64+0x73/0x130
[ 3021.712811] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 3021.712812] RIP: 0033:0x7f5411d99730
[ 3021.712813] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3021.712814] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 3021.712815] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 3021.712816] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 3021.712817] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 3021.712817] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
[ 3142.537959] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 3142.537993] Tainted: G W I 4.15.18-9-pve #1
[ 3142.538009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3142.538030] lzop D 0 8382 8357 0x00000000
[ 3142.538032] Call Trace:
[ 3142.538038] __schedule+0x3e0/0x870
[ 3142.538040] ? bit_wait+0x60/0x60
[ 3142.538041] schedule+0x36/0x80
[ 3142.538043] io_schedule+0x16/0x40
[ 3142.538044] bit_wait_io+0x11/0x60
[ 3142.538045] __wait_on_bit+0x5a/0x90
[ 3142.538046] out_of_line_wait_on_bit+0x8e/0xb0
[ 3142.538048] ? bit_waitqueue+0x40/0x40
[ 3142.538061] nfs_wait_on_request+0x46/0x50 [nfs]
[ 3142.538067] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 3142.538070] ? radix_tree_lookup_slot+0x22/0x50
[ 3142.538076] nfs_updatepage+0x151/0x910 [nfs]
[ 3142.538082] nfs_write_end+0x129/0x4e0 [nfs]
[ 3142.538084] generic_perform_write+0xff/0x1b0
[ 3142.538089] nfs_file_write+0xd7/0x250 [nfs]
[ 3142.538091] new_sync_write+0xe7/0x140
[ 3142.538093] __vfs_write+0x29/0x40
[ 3142.538094] vfs_write+0xb5/0x1a0
[ 3142.538095] SyS_write+0x55/0xc0
[ 3142.538097] do_syscall_64+0x73/0x130
[ 3142.538098] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 3142.538100] RIP: 0033:0x7f5411d99730
[ 3142.538101] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3142.538102] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 3142.538103] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 3142.538103] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 3142.538104] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 3142.538105] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
[ 3200.320822] device tap100i0 entered promiscuous mode
[ 3200.331590] vmbr0: port 3(tap100i0) entered blocking state
[ 3200.331592] vmbr0: port 3(tap100i0) entered disabled state
[ 3200.331699] vmbr0: port 3(tap100i0) entered blocking state
[ 3200.331701] vmbr0: port 3(tap100i0) entered forwarding state
[ 3222.152389] device tap101i0 entered promiscuous mode
[ 3222.161898] vmbr0: port 10(tap101i0) entered blocking state
[ 3222.161900] vmbr0: port 10(tap101i0) entered disabled state
[ 3222.161995] vmbr0: port 10(tap101i0) entered blocking state
[ 3222.161997] vmbr0: port 10(tap101i0) entered forwarding state
[ 3234.692797] nfs: server 10.9.80.101 not responding, still trying
[ 3244.546872] device tap106i0 entered promiscuous mode
[ 3244.555741] vmbr0: port 11(tap106i0) entered blocking state
[ 3244.555743] vmbr0: port 11(tap106i0) entered disabled state
[ 3244.555835] vmbr0: port 11(tap106i0) entered blocking state
[ 3244.555837] vmbr0: port 11(tap106i0) entered forwarding state
[ 3263.363288] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 3263.363310] Tainted: G W I 4.15.18-9-pve #1
[ 3263.363326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3263.363347] lzop D 0 8382 8357 0x00000000
[ 3263.363349] Call Trace:
[ 3263.363355] __schedule+0x3e0/0x870
[ 3263.363357] ? bit_wait+0x60/0x60
[ 3263.363358] schedule+0x36/0x80
[ 3263.363360] io_schedule+0x16/0x40
[ 3263.363361] bit_wait_io+0x11/0x60
[ 3263.363362] __wait_on_bit+0x5a/0x90
[ 3263.363363] out_of_line_wait_on_bit+0x8e/0xb0
[ 3263.363365] ? bit_waitqueue+0x40/0x40
[ 3263.363379] nfs_wait_on_request+0x46/0x50 [nfs]
[ 3263.363386] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 3263.363388] ? radix_tree_lookup_slot+0x22/0x50
[ 3263.363394] nfs_updatepage+0x151/0x910 [nfs]
[ 3263.363399] nfs_write_end+0x129/0x4e0 [nfs]
[ 3263.363401] generic_perform_write+0xff/0x1b0
[ 3263.363407] nfs_file_write+0xd7/0x250 [nfs]
[ 3263.363409] new_sync_write+0xe7/0x140
[ 3263.363411] __vfs_write+0x29/0x40
[ 3263.363412] vfs_write+0xb5/0x1a0
[ 3263.363413] SyS_write+0x55/0xc0
[ 3263.363415] do_syscall_64+0x73/0x130
[ 3263.363417] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 3263.363418] RIP: 0033:0x7f5411d99730
[ 3263.363419] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3263.363420] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 3263.363421] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 3263.363421] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 3263.363422] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 3263.363423] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
[ 3331.954972] device tap113i0 entered promiscuous mode
[ 3331.964983] vmbr0: port 12(tap113i0) entered blocking state
[ 3331.964985] vmbr0: port 12(tap113i0) entered disabled state
[ 3331.965074] vmbr0: port 12(tap113i0) entered blocking state
[ 3331.965076] vmbr0: port 12(tap113i0) entered forwarding state
[ 3384.188538] INFO: task lzop:8382 blocked for more than 120 seconds.
[ 3384.188566] Tainted: G W I 4.15.18-9-pve #1
[ 3384.188582] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3384.188603] lzop D 0 8382 8357 0x00000000
[ 3384.188605] Call Trace:
[ 3384.188611] __schedule+0x3e0/0x870
[ 3384.188613] ? bit_wait+0x60/0x60
[ 3384.188614] schedule+0x36/0x80
[ 3384.188616] io_schedule+0x16/0x40
[ 3384.188617] bit_wait_io+0x11/0x60
[ 3384.188618] __wait_on_bit+0x5a/0x90
[ 3384.188619] out_of_line_wait_on_bit+0x8e/0xb0
[ 3384.188621] ? bit_waitqueue+0x40/0x40
[ 3384.188639] nfs_wait_on_request+0x46/0x50 [nfs]
[ 3384.188646] nfs_lock_and_join_requests+0x121/0x510 [nfs]
[ 3384.188648] ? radix_tree_lookup_slot+0x22/0x50
[ 3384.188654] nfs_updatepage+0x151/0x910 [nfs]
[ 3384.188659] nfs_write_end+0x129/0x4e0 [nfs]
[ 3384.188661] generic_perform_write+0xff/0x1b0
[ 3384.188667] nfs_file_write+0xd7/0x250 [nfs]
[ 3384.188669] new_sync_write+0xe7/0x140
[ 3384.188670] __vfs_write+0x29/0x40
[ 3384.188672] vfs_write+0xb5/0x1a0
[ 3384.188673] SyS_write+0x55/0xc0
[ 3384.188675] do_syscall_64+0x73/0x130
[ 3384.188677] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 3384.188678] RIP: 0033:0x7f5411d99730
[ 3384.188679] RSP: 002b:00007ffd36bc2438 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3384.188680] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5411d99730
[ 3384.188681] RDX: 0000000000000004 RSI: 00007ffd36bc24a0 RDI: 0000000000000001
[ 3384.188681] RBP: 0000000000000004 R08: fffffffffffffff5 R09: 00007f54123e9000
[ 3384.188682] R10: 00000000000001f9 R11: 0000000000000246 R12: 00007f5412492698
[ 3384.188683] R13: 0000000000000001 R14: 0000000000000019 R15: 00007ffd36bc24a0
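Since the log above is mostly the same hung-task trace repeated, a short summary may be easier to compare between nodes. A rough sketch, assuming dmesg-style input; the function name `summarize_hangs` is made up for illustration.

```shell
# Hypothetical helper: reads a dmesg-style log on stdin, prints the first
# NETDEV WATCHDOG line and a count of hung-task reports per task.
summarize_hangs() {
    awk '
        /NETDEV WATCHDOG:/ && !seen { print "first watchdog: " $0; seen = 1 }
        /INFO: task .* blocked for more than/ {
            sub(/.*INFO: task /, "")   # drop timestamp and prefix
            sub(/ blocked.*/, "")      # keep only "name:pid"
            hung[$0]++
        }
        END { for (t in hung) print hung[t], "hung-task reports:", t }
    '
}

# Usage on the affected node:
#   dmesg | summarize_hangs
```

In the log above, the watchdog timeout on eth5 comes first and the blocked NFS writes from lzop follow, which fits the NIC reset being the trigger rather than the NFS server.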

All 3 nodes are up to date. No Ceph, no ZFS. Feel free to ask if more information is needed.

have a good day
Jerome
 

coppola_f

Member
Apr 2, 2012
Italy
guys,
we've set up two more nodes....
these two are identical DL680 gen8.

did a fresh install using the downloaded .iso image,
applied all updates (on these units we have an active subscription!)

then added the units as cluster members and, after some "blank" runtime days,
moved some VMs to the new nodes (these are the newer nodes, so we moved the most expensive VMs there!)

all seemed to run fine for many hours (I think one or two full days at most!)
then users started to report extremely slow response from the terminal server and really long query response times from the MySQL Linux VM!

after some quick checks,
we moved the VMs back to the older nodes (still running the 4.13.x kernel!)

then magically the issue seemed to be solved, normal response time from both VMs....

we have now rolled the units back to the 4.13.x kernel,
then restarted the units and, after some testing time, moved the VMs back to the newer nodes!

now all running fine!
our cluster updates have been locked to 4.13.x kernel branch!!
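When mixing kernel branches in one cluster like this, it helps to confirm which branch each node actually booted after the rollback. A minimal sketch; the function name `kernel_branch` is made up for illustration and only looks at the release string.

```shell
# Hypothetical helper: prints the major.minor branch of a kernel release
# string (defaults to the running kernel as reported by `uname -r`).
kernel_branch() {
    release=${1:-$(uname -r)}
    printf '%s\n' "$release" | cut -d. -f1,2
}

# Usage on each node:
#   [ "$(kernel_branch)" = "4.13" ] || echo "node is not on the 4.13.x branch"
```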

waiting for suggestions or any other information that may come from you all!

best regards,

Francesco
 

cvhideki

New Member
Jan 19, 2016
Hi everyone!
I have a problem with the LAN interface e100 on my pfSense.

I too run OPNsense in Proxmox. I have the virtual NICs set as E1000. I also find that during high-traffic scenarios such as torrents, Skype calls, video streaming, or downloading, the WAN gateway will just drop. It then takes a
Code:
ifconfig em1 down
ifconfig em1 up
and my network interface is working again


Code:
ethtool -i eno1
driver: e1000e
version: 3.4.1.1-NAPI
firmware-version: 0.4-4
expansion-rom-version:
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
Debian 9 kernel linux
Code:
Linux 4.15.18-20-pve x86_64
pveversion
Code:
pve-manager/5.4-13/aee6f0ec (running kernel: 4.15.18-20-pve)
 

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
South Tyrol/Italy
Hi. This is quite an old thread, started well over a year ago. Many of the problems reported are not valid anymore. Maybe it's best to start a new thread with your specific issue.

You could try to disable some offloading, but that may reduce general performance a bit:
Code:
ethtool -K eth0 gso off gro off tso off
but could be worth a test
 
