Hi there,
I've been having issues lately with my Proxmox installation (5.2-2) running on an Intel NUC.
The setup is fairly simple: a single node running a few VMs, one of which is a FreeNAS instance. However, when I put it under heavy load, I get kernel errors like this:
Code:
Jul 23 16:22:12 nuc kernel: [21518.680562] general protection fault: 0000 [#3] SMP PTI
Jul 23 16:22:12 nuc kernel: [21518.680569] Modules linked in: ip_set ip6table_filter ip6_tables iptable_filter softdog openvswitch nsh nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack nfnetlink_log nfnetlink dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel ses enclosure aes_x86_64 crypto_simd glue_helper cryptd scsi_transport_sas wmi_bmof intel_wmi_thunderbolt intel_cstate arc4 intel_rapl_perf iwlmvm mac80211 pcspkr iwlwifi rtsx_pci_ms memstick btusb btrtl btbcm cfg80211 btintel bluetooth ecdh_generic wmi ir_rc6_decoder rc_rc6_mce ir_lirc_codec
Jul 23 16:22:12 nuc kernel: [21518.680611] lirc_dev ite_cir rc_core snd_soc_skl snd_soc_skl_ipc snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_soc_acpi snd_soc_core snd_compress i915 ac97_bus snd_pcm_dmaengine snd_hda_intel snd_hda_codec video snd_hda_core snd_hwdep drm_kms_helper snd_pcm acpi_pad tpm_crb mac_hid drm i2c_algo_bit fb_sys_fops zfs(PO) snd_timer syscopyarea sysfillrect snd sysimgblt soundcore shpchp mei_me mei intel_pch_thermal zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq uas usb_storage rtsx_pci_sdmmc e1000e(O) ptp pps_core i2c_i801 rtsx_pci ahci libahci
Jul 23 16:22:12 nuc kernel: [21518.680656] CPU: 3 PID: 30545 Comm: kvm Tainted: P D O 4.15.18-1-pve #1
Jul 23 16:22:12 nuc kernel: [21518.680658] Hardware name: Intel Corporation NUC7i5BNH/NUC7i5BNB, BIOS BNKBL357.86A.0065.2018.0606.1639 06/06/2018
Jul 23 16:22:12 nuc kernel: [21518.680664] RIP: 0010:find_get_entry+0x42/0x100
Jul 23 16:22:12 nuc kernel: [21518.680666] RSP: 0018:ffffbc9284ca3cb0 EFLAGS: 00010246
Jul 23 16:22:12 nuc kernel: [21518.680668] RAX: 0000000000000000 RBX: ffff9fcf6aa28598 RCX: ffff9fcb0b56e6f0
Jul 23 16:22:12 nuc kernel: [21518.680670] RDX: 0400000000000000 RSI: 000000000c8e1ac0 RDI: ffff9fcb0b56e6f0
Jul 23 16:22:12 nuc kernel: [21518.680672] RBP: ffffbc9284ca3cc0 R08: ffff9fcb0b56e6c8 R09: ffffbc9284ca3c90
Jul 23 16:22:12 nuc kernel: [21518.680674] R10: 0000000000000040 R11: ffff9fcb0b56e6f0 R12: 000000000c8e1ac0
Jul 23 16:22:12 nuc kernel: [21518.680676] R13: ffff9fcf6aa28590 R14: 000000000c8e1ac0 R15: ffffe483c01dbf00
Jul 23 16:22:12 nuc kernel: [21518.680679] FS: 00007f35a99cf700(0000) GS:ffff9fcf7ed80000(0000) knlGS:0000000000000000
Jul 23 16:22:12 nuc kernel: [21518.680681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 23 16:22:12 nuc kernel: [21518.680683] CR2: 000000c420162360 CR3: 000000042d9fc005 CR4: 00000000003626e0
Jul 23 16:22:12 nuc kernel: [21518.680685] Call Trace:
Jul 23 16:22:12 nuc kernel: [21518.680689] pagecache_get_page+0x2c/0x2b0
Jul 23 16:22:12 nuc kernel: [21518.680692] generic_file_read_iter+0x284/0xbb0
Jul 23 16:22:12 nuc kernel: [21518.680695] ? page_cache_tree_insert+0xe0/0xe0
Jul 23 16:22:12 nuc kernel: [21518.680697] blkdev_read_iter+0x35/0x40
Jul 23 16:22:12 nuc kernel: [21518.680700] new_sync_read+0xe4/0x130
Jul 23 16:22:12 nuc kernel: [21518.680703] __vfs_read+0x29/0x40
Jul 23 16:22:12 nuc kernel: [21518.680705] vfs_read+0x96/0x130
Jul 23 16:22:12 nuc kernel: [21518.680708] SyS_pread64+0x95/0xb0
Jul 23 16:22:12 nuc kernel: [21518.680710] do_syscall_64+0x73/0x130
Jul 23 16:22:12 nuc kernel: [21518.680714] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jul 23 16:22:12 nuc kernel: [21518.680716] RIP: 0033:0x7f37d7123903
Jul 23 16:22:12 nuc kernel: [21518.680718] RSP: 002b:00007f35a99cc5d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
Jul 23 16:22:12 nuc kernel: [21518.680720] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f37d7123903
Jul 23 16:22:12 nuc kernel: [21518.680722] RDX: 0000000000020000 RSI: 00007f3697841000 RDI: 0000000000000018
Jul 23 16:22:12 nuc kernel: [21518.680724] RBP: 00007f35bde91f40 R08: 0000000000000000 R09: 00000000ffffffff
Jul 23 16:22:12 nuc kernel: [21518.680727] R10: 000000c8e1ab7000 R11: 0000000000000293 R12: 00007f3697841000
Jul 23 16:22:12 nuc kernel: [21518.680729] R13: 00007f37cb1a4258 R14: 0000000000000000 R15: 00007f37ef799040
Jul 23 16:22:12 nuc kernel: [21518.680731] Code: 89 df e8 82 81 7d 00 48 85 c0 48 89 c1 0f 84 ae 00 00 00 48 8b 10 48 85 d2 0f 84 a2 00 00 00 48 89 d0 83 e0 03 0f 85 aa 00 00 00 <48> 8b 42 20 48 8d 78 ff a8 01 48 0f 44 fa 8b 47 1c 85 c0 74 bc
Jul 23 16:22:12 nuc kernel: [21518.680757] RIP: find_get_entry+0x42/0x100 RSP: ffffbc9284ca3cb0
Jul 23 16:22:12 nuc kernel: [21518.680762] ---[ end trace 18d10346ac8c6788 ]---
Searching the forum, I found similar threads about 'tainting' issues that suggested updating the BIOS, which I did. That seems to have mitigated some of the problems, but I'm still getting the errors.
Has anyone seen something similar? Is there any change to either the BIOS or Proxmox that I can make to mitigate this?
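On the 'tainting' part: the letters after "Tainted:" in the trace are flags, not the error itself. A minimal sketch for decoding them from a log line follows. The function name `decode_taint` is made up for illustration, and the table covers only a subset of the flags the kernel documents.

```python
import re

# Subset of the kernel's documented taint letters; not exhaustive.
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded (not a taint)",
    "P": "proprietary module was loaded",
    "F": "module was force-loaded",
    "D": "kernel has died recently (an earlier OOPS or BUG)",
    "W": "kernel has previously issued a warning",
    "C": "staging driver was loaded",
    "O": "out-of-tree module was loaded",
    "E": "unsigned module was loaded",
    "L": "soft lockup has previously occurred",
}

def decode_taint(log_line):
    """Return (letter, meaning) pairs for the taint flags in an oops header line."""
    # The flags sit between "Tainted:" and the kernel version, which starts with a digit.
    match = re.search(r"Tainted:\s+([A-Z ]+?)\s+\d", log_line)
    if match is None:
        return []  # e.g. "Not tainted", or not an oops header line
    letters = [c for c in match.group(1) if c != " "]
    return [(c, TAINT_FLAGS.get(c, "flag not in this table")) for c in letters]
```

Run against the "CPU: 3 PID: 30545 ..." line from the trace, this yields P, D and O. P and O are consistent with the `zfs(PO)` and `spl(O)` entries in the module list, and D only says that an earlier oops already happened, which fits the `[#3]` counter.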
Other observations:
* It seems to happen when the device is under I/O load. The FreeNAS VM has six USB drives attached.
* The USB drives are attached to the VM using:
Code:
qm set N -virtioX /dev/disk/by-id/...
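
For anyone reproducing this, here is a rough sketch of how the stable names for that command could be listed. It assumes udev gives USB-attached disks a `usb-` prefix under /dev/disk/by-id and marks partitions with a `-partN` suffix. The helper name `usb_disks_by_id` is hypothetical.

```python
import os
import re

def usb_disks_by_id(by_id_dir="/dev/disk/by-id"):
    """Map by-id names of whole USB disks to the device nodes they resolve to."""
    if not os.path.isdir(by_id_dir):
        return {}  # directory absent, e.g. inside a container
    disks = {}
    for name in sorted(os.listdir(by_id_dir)):
        # Assumption: USB disks are prefixed "usb-"; skip partition links ("-partN").
        if not name.startswith("usb-") or re.search(r"-part\d+$", name):
            continue
        disks[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return disks
```

Calling `usb_disks_by_id()` on the host returns a dict whose keys are the names that would follow /dev/disk/by-id/ in the command above. The values are the current /dev/sdX nodes, which can change between reboots.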