Proxmox 6 / Debian 10 / 5.0.21-3-pve / Tracing kernel errors

My /var/log/kern.log is flooded with kernel warnings (see below). Can anyone advise how I should investigate the root cause of the issue? I would like to figure out what is causing these traces.

Thanks!

Code:
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.641703] ------------[ cut here ]------------
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.641842] WARNING: CPU: 30 PID: 23233 at arch/x86/kvm/mmu.c:2102 nonpaging_update_pte+0x15/0x19 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.641957] Modules linked in: sctp cpuid tcp_diag inet_diag veth ebtable_filter ebtables ip_set 8021q garp mrp ip6table_filter ip6_tables xt_nat softdog iptable_nat nf_nat_ipv4 nf_nat nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 nfnetlink_log xt_comment nfnetlink iptable_filter bpfilter amd64_edac_mod edac_mce_amd kvm_amd zfs(PO) kvm irqbypass zunicode(PO) zlua(PO) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel crct10dif_pclmul snd_hda_codec crc32_pclmul ghash_clmulni_intel snd_hda_core snd_hwdep aesni_intel snd_pcm snd_timer snd aes_x86_64 crypto_simd soundcore cryptd glue_helper k10temp ccp input_leds serio_raw mxm_wmi mac_hid zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c uas
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.641999]  usb_storage ixgbe xfrm_algo mdio i2c_piix4 igb i2c_algo_bit ahci dca libahci gpio_amdpt wmi gpio_generic [last unloaded: cpuid]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.642665] CPU: 30 PID: 23233 Comm: kvm Tainted: P        W  O      5.0.21-3-pve #1
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.642780] Hardware name: Micro-Star International Co., Ltd. MS-7B09/X399 HZ, BIOS A.E4 01/21/2019
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.642917] RIP: 0010:nonpaging_update_pte+0x15/0x19 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.643005] Code: 75 e8 e8 40 7d 24 f8 0f 0b 48 8b 75 e8 8b 55 e4 e9 7b 6c fd ff 0f 1f 44 00 00 55 48 c7 c7 10 ad cc c0 48 89 e5 e8 1d 7d 24 f8 <0f> 0b 5d c3 48 c7 c7 10 ad cc c0 e8 0d 7d 24 f8 0f 0b 0f b6 43 24
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.643163] RSP: 0018:ffffac4fc8527a00 EFLAGS: 00010246
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.643249] RAX: 0000000000000024 RBX: 0000000000000701 RCX: 0000000000000000
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.644328] RDX: 0000000000000000 RSI: ffff9c217d796448 RDI: ffff9c217d796448
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.644438] RBP: ffffac4fc8527a00 R08: 0000000000000000 R09: 00000000009460a6
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.644546] R10: 000000000000072d R11: ffffac4fc8527868 R12: 0000000000000000
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.644654] R13: ffff9c02f53e0000 R14: ffffac4fc8527a48 R15: ffff9c111e0c6000
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.644763] FS:  00007f02ce9ff700(0000) GS:ffff9c217d780000(0000) knlGS:0000000000000000
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.644876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.644966] CR2: 0000000005fe4000 CR3: 00000010f174e000 CR4: 00000000003406e0
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.645075] Call Trace:
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.645178]  kvm_mmu_pte_write+0x442/0x450 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.645285]  kvm_page_track_write+0x82/0xb0 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.645391]  emulator_write_phys+0x3b/0x50 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.645496]  write_emulate+0xe/0x10 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.645597]  emulator_read_write_onepage+0xfc/0x320 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.645703]  emulator_read_write+0xd6/0x190 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.645809]  ? ioapic_service+0x11c/0x140 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.645912]  emulator_write_emulated+0x15/0x20 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646018]  segmented_write+0x5d/0x80 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646133]  writeback+0x161/0x2e0 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646237]  x86_emulate_insn+0x663/0x1180 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646342]  x86_emulate_instruction+0x347/0x740 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646445]  ? svm_vcpu_load+0xfb/0x150 [kvm_amd]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646550]  complete_emulated_pio+0x3f/0x70 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646655]  kvm_arch_vcpu_ioctl_run+0x16f5/0x1a10 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646760]  ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646849]  ? _copy_to_user+0x2b/0x40
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.646947]  ? kvm_vm_ioctl+0x842/0x9d0 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647047]  kvm_vcpu_ioctl+0x24b/0x610 [kvm]
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647133]  ? do_futex+0xc4/0xc50
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647224]  ? __switch_to_asm+0x41/0x70
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647308]  ? __switch_to_asm+0x35/0x70
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647391]  ? __switch_to_asm+0x41/0x70
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647474]  ? __switch_to_asm+0x35/0x70
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647557]  ? common_interrupt+0xa/0xf
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647642]  do_vfs_ioctl+0xa9/0x640
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647725]  ksys_ioctl+0x67/0x90
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647807]  __x64_sys_ioctl+0x1a/0x20
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647891]  do_syscall_64+0x5a/0x110
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.647974]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.648060] RIP: 0033:0x7f02f083f427
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.648142] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.648301] RSP: 002b:00007f02ce9fa678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.648411] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f02f083f427
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.648520] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000025
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.648628] RBP: 0000000000000000 R08: 0000561a78105650 R09: 0000000000000000
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.648737] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f02e18475c0
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.648845] R13: 0000561a780e16a0 R14: 00007f02e3ec5000 R15: 0000000000000000
Nov 25 16:12:34 hetzner-srv1 kernel: [797845.648954] ---[ end trace 02313c8c300f0cfb ]---
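
As a first step toward narrowing this down, it can help to count the traces and check whether they all come from the same place. Below is a minimal sketch, assuming the log lives at /var/log/kern.log as described above. The function name summarize_warnings is made up for this example.

```shell
# Count kernel WARNING traces per source location and function.
# The default path is an assumption; pass another log file as the first argument.
summarize_warnings() {
    log="${1:-/var/log/kern.log}"
    # Keep only the "file:line function+offset [module]" part of each
    # WARNING line, then count how often each distinct location occurs.
    grep -o 'WARNING: CPU: .*' "$log" \
        | sed -E 's/^WARNING: CPU: [0-9]+ PID: [0-9]+ at //' \
        | sort | uniq -c | sort -rn
}
```

Running `summarize_warnings /var/log/kern.log` prints one line per distinct location, most frequent first. If every line points at nonpaging_update_pte, as in the trace above, the flood has a single source.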
 
Is everything working as expected, aside from the log entries? What hardware is this running on (CPU, memory, etc...)? What software version (pveversion -v)?

Also, since this seems to be related to memory management/kvm, maybe a grep -R "" /sys/module/kvm_amd/parameters/ (assuming I correctly read your log and you're using an AMD CPU, 'kvm_intel' if not) could be useful.
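
If it is not clear up front which vendor module is loaded, the same information can be collected without picking the path by hand. This is only a sketch: the function name dump_kvm_params and the optional base-directory argument are assumptions for illustration, and only the /sys/module/<module>/parameters layout comes from the command above.

```shell
# Print every parameter of the loaded KVM vendor module as "module.name=value".
# The optional first argument overrides /sys/module (useful for dry runs).
dump_kvm_params() {
    base="${1:-/sys/module}"
    for mod in kvm_amd kvm_intel; do
        dir="$base/$mod/parameters"
        # Skip the vendor module that is not loaded on this host.
        [ -d "$dir" ] || continue
        for f in "$dir"/*; do
            [ -r "$f" ] || continue
            printf '%s.%s=%s\n' "$mod" "$(basename "$f")" "$(cat "$f")"
        done
    done
}
```

Calling `dump_kvm_params` with no arguments reads the live values, and the output can be pasted into a reply as-is.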
 
Hi Stefan,

Well, from the host's perspective everything is working, but I haven't yet analyzed the health of all VMs/LXCs in detail. We have some tasks to deliver ASAP on the hosts; after that I can dig deeper into the host system itself.

CPU: AMD Ryzen Threadripper 2950X

- snippet of one core from /proc/cpuinfo:

Code:
processor       : 31
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 8
model name      : AMD Ryzen Threadripper 2950X 16-Core Processor
stepping        : 2
microcode       : 0x800820b
cpu MHz         : 3949.537
cache size      : 512 KB
physical id     : 0
siblings        : 32
core id         : 7
cpu cores       : 16
apicid          : 31
initial apicid  : 31
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass                                                 
bogomips        : 6999.18
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]


Memory: 8x 16 GB DDR4 ECC

- snippet of one memory device from dmidecode -t 17 command:

Code:
Handle 0x002C, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x000F
        Error Information Handle: 0x002B
        Total Width: 128 bits
        Data Width: 64 bits
        Size: 16384 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM 1
        Bank Locator: P0 CHANNEL D
        Type: DDR4
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 2400 MT/s
        Manufacturer: Samsung
        Serial Number: 2571DD25
        Asset Tag: Not Specified
        Part Number: M391A2K43BB1-CTD
        Rank: 2
        Configured Memory Speed: 1200 MT/s
        Minimum Voltage: 1.2 V
        Maximum Voltage: 1.2 V
        Configured Voltage: 1.2 V


pveversion + AMD CPU parameter list:

Code:
$ pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-11 (running version: 6.0-11/2140ef37)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-3-pve: 5.0.21-7
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-3
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-7
libpve-guest-common-perl: 3.0-2
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-10
pve-docs: 6.0-8
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-4
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-13
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
$ grep -R "" /sys/module/kvm_amd/parameters/
/sys/module/kvm_amd/parameters/vls:1
/sys/module/kvm_amd/parameters/pause_filter_count_max:65535
/sys/module/kvm_amd/parameters/pause_filter_count_grow:2
/sys/module/kvm_amd/parameters/pause_filter_count_shrink:0
/sys/module/kvm_amd/parameters/npt:1
/sys/module/kvm_amd/parameters/sev:0
/sys/module/kvm_amd/parameters/vgif:1
/sys/module/kvm_amd/parameters/nested:1
/sys/module/kvm_amd/parameters/dump_invalid_vmcb:N
/sys/module/kvm_amd/parameters/pause_filter_thresh:128
/sys/module/kvm_amd/parameters/pause_filter_count:3000
/sys/module/kvm_amd/parameters/avic:0

Thanks a lot for your help!
 
Code:
grep -R "" /sys/module/kvm_amd/parameters
/sys/module/kvm_amd/parameters/vls:1
/sys/module/kvm_amd/parameters/pause_filter_count_max:65535
/sys/module/kvm_amd/parameters/pause_filter_count_grow:2
/sys/module/kvm_amd/parameters/pause_filter_count_shrink:0
/sys/module/kvm_amd/parameters/npt:1
/sys/module/kvm_amd/parameters/sev:0
/sys/module/kvm_amd/parameters/vgif:1
/sys/module/kvm_amd/parameters/nested:1
/sys/module/kvm_amd/parameters/dump_invalid_vmcb:N
/sys/module/kvm_amd/parameters/pause_filter_thresh:128
/sys/module/kvm_amd/parameters/pause_filter_count:3000
/sys/module/kvm_amd/parameters/avic:0
I have a similar problem.


Code:
[136692.928283] WARNING: CPU: 10 PID: 16512 at arch/x86/kvm/mmu.c:2157 nonpaging_update_pte+0x15/0x19 [kvm]
[136692.928284] Modules linked in: tcp_diag inet_diag veth nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables ip_set ip6table_filter ip6_tables iptable_filter bpfilter 8021q garp mrp bonding softdog nfnetlink_log nfnetlink edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio kvm snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm crct10dif_pclmul crc32_pclmul snd_timer ghash_clmulni_intel aesni_intel snd soundcore aes_x86_64 eeepc_wmi crypto_simd asus_wmi sparse_keymap cryptd video ccp k10temp glue_helper mxm_wmi pcspkr wmi_bmof mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) btrfs xor zstd_compress raid6_pq libcrc32c i2c_piix4 megaraid_sas igb i2c_algo_bit dca ahci libahci gpio_amdpt wmi gpio_generic
[136692.928307] CPU: 10 PID: 16512 Comm: kvm Tainted: P        W  O      5.0.21-5-pve #1
[136692.928308] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5220 09/11/2019
[136692.928318] RIP: 0010:nonpaging_update_pte+0x15/0x19 [kvm]
[136692.928319] Code: 75 e8 e8 c8 3e ea e9 0f 0b 48 8b 75 e8 8b 55 e4 e9 c3 5d fd ff 0f 1f 44 00 00 55 48 c7 c7 e0 ee c6 c0 48 89 e5 e8 a5 3e ea e9 <0f> 0b 5d c3 48 c7 c7 e0 ee c6 c0 e8 95 3e ea e9 0f 0b 0f b6 43 34
[136692.928319] RSP: 0018:ffffbfbc866f3a08 EFLAGS: 00010246
[136692.928320] RAX: 0000000000000024 RBX: 0000000000000701 RCX: 0000000000000000
[136692.928320] RDX: 0000000000000000 RSI: ffff9be65e896448 RDI: ffff9be65e896448
[136692.928321] RBP: ffffbfbc866f3a08 R08: 0000000000000001 R09: 0000000000004458
[136692.928321] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000
[136692.928322] R13: ffff9be5990b0000 R14: ffffbfbc866f3a50 R15: ffff9bdcf7f04000
[136692.928322] FS:  00007fd6a2d7e700(0000) GS:ffff9be65e880000(0000) knlGS:0000000000000000
[136692.928323] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[136692.928323] CR2: 00007fa37c008028 CR3: 0000000f198fc000 CR4: 00000000003406e0
[136692.928324] Call Trace:
[136692.928335]  kvm_mmu_pte_write+0x442/0x450 [kvm]
[136692.928345]  kvm_page_track_write+0x82/0xb0 [kvm]
[136692.928354]  emulator_write_phys+0x3b/0x50 [kvm]
[136692.928363]  write_emulate+0xe/0x10 [kvm]
[136692.928372]  emulator_read_write_onepage+0xfc/0x320 [kvm]
[136692.928381]  emulator_read_write+0xd6/0x190 [kvm]
[136692.928390]  emulator_write_emulated+0x15/0x20 [kvm]
[136692.928399]  segmented_write+0x5d/0x80 [kvm]
[136692.928409]  writeback+0x161/0x2e0 [kvm]
[136692.928419]  x86_emulate_insn+0x663/0x1180 [kvm]
[136692.928421]  ? avic_vcpu_load+0x20/0x110 [kvm_amd]
[136692.928430]  x86_emulate_instruction+0x347/0x750 [kvm]
[136692.928439]  complete_emulated_pio+0x3f/0x70 [kvm]
[136692.928448]  kvm_arch_vcpu_ioctl_run+0x16f5/0x1a10 [kvm]
[136692.928457]  ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm]
[136692.928460]  ? _copy_to_user+0x2b/0x40
[136692.928468]  ? kvm_vm_ioctl+0x842/0x9d0 [kvm]
[136692.928469]  ? pollwake+0x72/0x90
[136692.928477]  kvm_vcpu_ioctl+0x24b/0x610 [kvm]
[136692.928479]  ? __wake_up_locked_key+0x1b/0x20
[136692.928480]  do_vfs_ioctl+0xa9/0x640
[136692.928481]  ksys_ioctl+0x67/0x90
[136692.928482]  __x64_sys_ioctl+0x1a/0x20
[136692.928484]  do_syscall_64+0x5a/0x110
[136692.928486]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[136692.928487] RIP: 0033:0x7fd6b08dc427
[136692.928488] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
[136692.928488] RSP: 002b:00007fd6a2d79678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[136692.928489] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fd6b08dc427
[136692.928489] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000018
[136692.928489] RBP: 0000000000000000 R08: 00005649b84ea650 R09: 0000000000000000
[136692.928490] R10: 0000000000000001 R11: 0000000000000246 R12: 00007fd6a368dc80
[136692.928490] R13: 00005649b84c66a0 R14: 00007fd6b25fc000 R15: 0000000000000000
[136692.928491] ---[ end trace 27fc795ba04cf5de ]---


Code:
pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-5-pve)
pve-manager: 6.0-15 (running version: 6.0-15/52b91481)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-4
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-8
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-11
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-9
pve-cluster: 6.0-9
pve-container: 3.0-13
pve-docs: 6.0-9
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-8
pve-firmware: 3.0-4
pve-ha-manager: 3.0-5
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-17
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2


AMD Ryzen 7 2700X Eight-Core Processor
 
Could you try the new 5.3-based kernel ("apt install pve-kernel-5.3") and see whether the issue occurs there as well?
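
The new kernel is only active after a reboot, so it is worth confirming which one is actually running before re-testing. A small sketch of such a check follows. The helper name kernel_at_least is hypothetical, and it compares only the major and minor numbers of a release string in the format shown by `uname -r`, such as 5.0.21-3-pve.

```shell
# Succeed if a kernel release string is at least MAJOR.MINOR.
# Usage: kernel_at_least MAJOR MINOR [RELEASE]; RELEASE defaults to `uname -r`.
kernel_at_least() {
    want_major=$1
    want_minor=$2
    rel="${3:-$(uname -r)}"
    major=${rel%%.*}             # text before the first dot
    rest=${rel#*.}               # text after the first dot
    minor=${rest%%[!0-9]*}       # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

For example, `kernel_at_least 5 3 && echo "running 5.3 or newer"` prints the message only when the booted kernel is recent enough.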
 
I also hit this on my Threadripper 1950X with nested virtualization, once I started a guest that has guests of its own.

Code:
[120208.836360] WARNING: CPU: 14 PID: 101770 at arch/x86/kvm/mmu.c:2194 nonpaging_update_pte+0x15/0x19 [kvm]
[120208.836361] Modules linked in: tcp_diag inet_diag vfio_pci vfio_virqfd vfio_iommu_type1 vfio ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter softdog nfnetlink_log nfnetlink edac_mce_amd snd_hda_codec_hdmi kvm_amd kvm irqbypass iwlmvm zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel nouveau video snd_hda_intel ttm aesni_intel snd_hda_codec drm_kms_helper aes_x86_64 snd_hda_core crypto_simd snd_hwdep cryptd glue_helper drm mac80211 snd_pcm snd_timer fb_sys_fops input_leds joydev libarc4 syscopyarea snd sysfillrect pcspkr sysimgblt soundcore wmi_bmof iwlwifi mxm_wmi k10temp ccp cfg80211 mac_hid bonding zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c hid_generic usbkbd
[120208.836391]  usbmouse usbhid hid sfc mtd mdio igb i2c_piix4 i2c_algo_bit ahci dca libahci gpio_amdpt wmi gpio_generic
[120208.836398] CPU: 14 PID: 101770 Comm: kvm Tainted: P        W  O      5.3.10-1-pve #1
[120208.836399] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.80 08/27/2019
[120208.836411] RIP: 0010:nonpaging_update_pte+0x15/0x19 [kvm]
[120208.836412] Code: 75 e8 e8 4d dd 95 ef 0f 0b 48 8b 75 e8 8b 55 e4 e9 08 4d fd ff 0f 1f 44 00 00 55 48 c7 c7 c0 af 3b c1 48 89 e5 e8 2a dd 95 ef <0f> 0b 5d c3 48 c7 c7 c0 af 3b c1 e8 1a dd 95 ef 0f 0b 0f b6 43 34
[120208.836413] RSP: 0018:ffffbe9f80eafa68 EFLAGS: 00010246
[120208.836413] RAX: 0000000000000024 RBX: 0000000000000701 RCX: 0000000000000000
[120208.836414] RDX: 0000000000000000 RSI: ffff9c1b7d197448 RDI: ffff9c1b7d197448
[120208.836414] RBP: ffffbe9f80eafa68 R08: 0000000000001a42 R09: 0000000000000004
[120208.836415] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9c06d4133cd0
[120208.836415] R13: 0000000000000000 R14: ffff9c0391f1e000 R15: ffffbe9f80eafab8
[120208.836416] FS:  00007f3e1097e700(0000) GS:ffff9c1b7d180000(0000) knlGS:0000000000000000
[120208.836417] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[120208.836417] CR2: 000056492c42a1f8 CR3: 0000000b2a9ba000 CR4: 00000000003406e0
[120208.836418] Call Trace:
[120208.836432]  kvm_mmu_pte_write+0x43e/0x450 [kvm]
[120208.836444]  kvm_page_track_write+0x82/0xc0 [kvm]
[120208.836454]  emulator_write_phys+0x3b/0x50 [kvm]
[120208.836465]  write_emulate+0xe/0x10 [kvm]
[120208.836475]  emulator_read_write_onepage+0xfc/0x320 [kvm]
[120208.836485]  emulator_read_write+0xd6/0x190 [kvm]
[120208.836495]  emulator_write_emulated+0x15/0x20 [kvm]
[120208.836506]  segmented_write+0x5d/0x80 [kvm]
[120208.836517]  writeback+0x161/0x2e0 [kvm]
[120208.836528]  x86_emulate_insn+0x663/0x1180 [kvm]
[120208.836531]  ? avic_vcpu_load+0x20/0x110 [kvm_amd]
[120208.836541]  x86_emulate_instruction+0x347/0x750 [kvm]
[120208.836551]  complete_emulated_pio+0x3f/0x70 [kvm]
[120208.836562]  kvm_arch_vcpu_ioctl_run+0x4d1/0x580 [kvm]
[120208.836570]  kvm_vcpu_ioctl+0x24b/0x610 [kvm]
[120208.836573]  ? __wake_up_locked_key+0x1b/0x20
[120208.836576]  do_vfs_ioctl+0xa9/0x640
[120208.836577]  ksys_ioctl+0x67/0x90
[120208.836578]  __x64_sys_ioctl+0x1a/0x20
[120208.836580]  do_syscall_64+0x5a/0x130
[120208.836583]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[120208.836584] RIP: 0033:0x7f3e1e484427
[120208.836585] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
[120208.836585] RSP: 002b:00007f3e10979678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[120208.836586] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f3e1e484427
[120208.836586] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019
[120208.836587] RBP: 0000000000000000 R08: 000055a1e6081cd0 R09: 0000000000000000
[120208.836587] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f3e112f3d80
[120208.836587] R13: 000055a1e6056960 R14: 00007f3e11c23000 R15: 0000000000000000
[120208.836589] ---[ end trace 3e2c2bae3207e541 ]---

Other reports:
https://bugzilla.redhat.com/show_bug.cgi?id=1674254
https://marc.info/?l=kvm&m=154830227629930
 
I had some issues too, along with some strange behaviour (new VMs got corrupted, and clones either could not boot or crashed). Then I found that the CPU fan was not working, so the CPU overheated in many situations, and the temperature inside the case was higher too (the CPU fan, when working, directs air out of the case).
I replaced the CPU fan and have had no kernel problems since. It is worth checking yours too.
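
A quick way to rule this out is to read the sensors the kernel exposes under /sys/class/hwmon. This is a rough sketch, not a monitoring tool: the function name report_hwmon and its optional base-directory argument are assumptions for the example. Temperatures in temp*_input files are in millidegrees Celsius. Fan speeds only show up if a driver for the board's sensor chip is loaded, which may not be the case on every system.

```shell
# List temperature (C) and fan (RPM) readings exposed through hwmon.
# The optional first argument overrides /sys/class/hwmon.
report_hwmon() {
    base="${1:-/sys/class/hwmon}"
    for dev in "$base"/hwmon*; do
        [ -r "$dev/name" ] || continue
        name=$(cat "$dev/name")
        for f in "$dev"/temp*_input; do
            [ -r "$f" ] || continue
            val=$(cat "$f" 2>/dev/null) || continue
            # Convert millidegrees Celsius to whole degrees.
            printf '%s %s %d C\n' "$name" "$(basename "$f" _input)" "$(( val / 1000 ))"
        done
        for f in "$dev"/fan*_input; do
            [ -r "$f" ] || continue
            val=$(cat "$f" 2>/dev/null) || continue
            printf '%s %s %s RPM\n' "$name" "$(basename "$f" _input)" "$val"
        done
    done
}
```

A fan reading of 0 RPM, or a temperature that climbs steadily under load, would point toward the cooling problem described above.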
 
