kernel panic: BUG: unable to handle page fault for address: 0000000000008000

dyadyaMisha

I recently installed a new server and ran into a kernel panic. Can anyone tell me where to look for the cause?

Linux pve02 5.4.78-2-pve #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100) x86_64 GNU/Linux

Code:
Dec 20 19:07:35 pve02 kernel: [35190.115810] BUG: unable to handle page fault for address: 0000000000008000
Dec 20 19:07:35 pve02 kernel: [35190.116350] #PF: supervisor read access in kernel mode
Dec 20 19:07:35 pve02 kernel: [35190.116817] #PF: error_code(0x0000) - not-present page
Dec 20 19:07:35 pve02 kernel: [35190.117266] PGD 0 P4D 0
Dec 20 19:07:35 pve02 kernel: [35190.117698] Oops: 0000 [#1] SMP PTI
Dec 20 19:07:35 pve02 kernel: [35190.118130] CPU: 2 PID: 31438 Comm: sshd Tainted: P           O      5.4.78-2-pve #1
Dec 20 19:07:35 pve02 kernel: [35190.118546] Hardware name: System manufacturer System Product Name/P8Z68-V LX, BIOS 0602 09/13/2011
Dec 20 19:07:35 pve02 kernel: [35190.118983] RIP: 0010:skb_release_data+0xa9/0x180
Dec 20 19:07:35 pve02 kernel: [35190.119419] Code: 48 0f 45 fa 66 66 66 66 90 f0 ff 4f 34 75 ce e8 4d 3f 94 ff 41 0f b6 45 02 48 83 c3 01 39 d8 7f c9 49 8b 7d 08 48 85 ff 74 10 <48> 8b 1f e8 2f f5 ff ff 48 89 df 48 85 db 75 f0 4d 85 e4 74 57 41
Dec 20 19:07:35 pve02 kernel: [35190.120412] RSP: 0018:ffffa8d5c0c1fc30 EFLAGS: 00010206
Dec 20 19:07:35 pve02 kernel: [35190.120922] RAX: 0000000000000020 RBX: 0000000000000000 RCX: ffffffff875f3a00
Dec 20 19:07:35 pve02 kernel: [35190.121458] RDX: 000000000001dae4 RSI: 00000008316f5861 RDI: 0000000000008000
Dec 20 19:07:35 pve02 kernel: [35190.121984] RBP: ffffa8d5c0c1fc48 R08: 00000000000005a8 R09: ffffffff866f05b0
Dec 20 19:07:35 pve02 kernel: [35190.122515] R10: ffff89e42c4923d0 R11: 0000000000000000 R12: ffff89e418731b00
Dec 20 19:07:35 pve02 kernel: [35190.123049] R13: ffff89e40570ca40 R14: ffff89e42c49287c R15: 0000000000000000
Dec 20 19:07:35 pve02 kernel: [35190.123607] FS:  00007fef8c92fe40(0000) GS:ffff89e4bb880000(0000) knlGS:0000000000000000
Dec 20 19:07:35 pve02 kernel: [35190.124177] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 20 19:07:35 pve02 kernel: [35190.124751] CR2: 0000000000008000 CR3: 000000009357e004 CR4: 00000000000626e0
Dec 20 19:07:35 pve02 kernel: [35190.125332] Call Trace:
Dec 20 19:07:35 pve02 kernel: [35190.125937]  skb_release_all+0x24/0x30
Dec 20 19:07:35 pve02 kernel: [35190.126527]  __kfree_skb+0x12/0x20
Dec 20 19:07:35 pve02 kernel: [35190.127115]  tcp_recvmsg+0x7b5/0xbd0
Dec 20 19:07:35 pve02 kernel: [35190.127707]  ? aa_sk_perm+0x43/0x180
Dec 20 19:07:35 pve02 kernel: [35190.128313]  inet_recvmsg+0x5e/0xf0
Dec 20 19:07:35 pve02 kernel: [35190.128910]  sock_recvmsg+0x66/0x70
Dec 20 19:07:35 pve02 kernel: [35190.129502]  sock_read_iter+0x8f/0xf0
Dec 20 19:07:35 pve02 kernel: [35190.130082]  new_sync_read+0x122/0x1b0
Dec 20 19:07:35 pve02 kernel: [35190.130661]  __vfs_read+0x29/0x40
Dec 20 19:07:35 pve02 kernel: [35190.131242]  vfs_read+0x99/0x160
Dec 20 19:07:35 pve02 kernel: [35190.131825]  ksys_read+0x61/0xe0
Dec 20 19:07:35 pve02 kernel: [35190.132408]  __x64_sys_read+0x1a/0x20
Dec 20 19:07:35 pve02 kernel: [35190.132990]  do_syscall_64+0x57/0x190
Dec 20 19:07:35 pve02 kernel: [35190.133581]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 20 19:07:35 pve02 kernel: [35190.134173] RIP: 0033:0x7fef8ccd2461
Dec 20 19:07:35 pve02 kernel: [35190.134754] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
Dec 20 19:07:35 pve02 kernel: [35190.136028] RSP: 002b:00007ffdf1fa6db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Dec 20 19:07:35 pve02 kernel: [35190.136687] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fef8ccd2461
Dec 20 19:07:35 pve02 kernel: [35190.137352] RDX: 0000000000004000 RSI: 00007ffdf1fa6dc0 RDI: 0000000000000003
Dec 20 19:07:35 pve02 kernel: [35190.138050] RBP: 00005569812945f0 R08: 00007ffdf1faad58 R09: 00007ffdf1faad50
Dec 20 19:07:35 pve02 kernel: [35190.138726] R10: 0000000000008975 R11: 0000000000000246 R12: 00007ffdf1fa6dc0
Dec 20 19:07:35 pve02 kernel: [35190.139406] R13: 000055698082cb00 R14: 0000000000000003 R15: 00007ffdf1faae60
Dec 20 19:07:35 pve02 kernel: [35190.140087] Modules linked in: veth tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp iptable_filter bpfilter softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi kvm_intel kvm snd_hda_codec_realtek irqbypass zfs(PO) snd_hda_codec_generic crct10dif_pclmul crc32_pclmul ledtrig_audio ghash_clmulni_intel aesni_intel zunicode(PO) crypto_simd zlua(PO) cryptd zavl(PO) glue_helper icp(PO) rapl snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core i915 snd_hwdep snd_pcm drm_kms_helper snd_timer snd drm soundcore i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt intel_cstate mxm_wmi pcspkr input_leds joydev eeepc_wmi mei_me usbmouse asus_wmi mei sparse_keymap wmi_bmof mac_hid zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs
Dec 20 19:07:35 pve02 kernel: [35190.140110]  xor zstd_compress hid_generic usbkbd usbhid hid raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ahci libahci i2c_i801 xhci_pci lpc_ich r8169 realtek xhci_hcd ehci_pci ehci_hcd video wmi
Dec 20 19:07:35 pve02 kernel: [35190.147359] CR2: 0000000000008000
Dec 20 19:07:35 pve02 kernel: [35190.148242] ---[ end trace f2f583820acb9bf8 ]---
Dec 20 19:07:35 pve02 kernel: [35190.149216] RIP: 0010:skb_release_data+0xa9/0x180
Dec 20 19:07:35 pve02 kernel: [35190.150169] Code: 48 0f 45 fa 66 66 66 66 90 f0 ff 4f 34 75 ce e8 4d 3f 94 ff 41 0f b6 45 02 48 83 c3 01 39 d8 7f c9 49 8b 7d 08 48 85 ff 74 10 <48> 8b 1f e8 2f f5 ff ff 48 89 df 48 85 db 75 f0 4d 85 e4 74 57 41
Dec 20 19:07:35 pve02 kernel: [35190.152184] RSP: 0018:ffffa8d5c0c1fc30 EFLAGS: 00010206
Dec 20 19:07:35 pve02 kernel: [35190.153179] RAX: 0000000000000020 RBX: 0000000000000000 RCX: ffffffff875f3a00
Dec 20 19:07:35 pve02 kernel: [35190.154207] RDX: 000000000001dae4 RSI: 00000008316f5861 RDI: 0000000000008000
Dec 20 19:07:35 pve02 kernel: [35190.155264] RBP: ffffa8d5c0c1fc48 R08: 00000000000005a8 R09: ffffffff866f05b0
Dec 20 19:07:35 pve02 kernel: [35190.156284] R10: ffff89e42c4923d0 R11: 0000000000000000 R12: ffff89e418731b00
Dec 20 19:07:35 pve02 kernel: [35190.157309] R13: ffff89e40570ca40 R14: ffff89e42c49287c R15: 0000000000000000
Dec 20 19:07:35 pve02 kernel: [35190.158321] FS:  00007fef8c92fe40(0000) GS:ffff89e4bb880000(0000) knlGS:0000000000000000
Dec 20 19:07:35 pve02 kernel: [35190.159314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 20 19:07:35 pve02 kernel: [35190.160249] CR2: 0000000000008000 CR3: 000000009357e004 CR4: 00000000000626e0
Dec 20 19:07:35 pve02 QEMU[31347]: kvm: Disconnect client, due to: Failed to read CMD_WRITE data: Unexpected end-of-file before all bytes were read

Code:
root@pve02:~# pveversion --verbose
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
That does indeed look like a kernel bug...
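
For anyone reading the trace above, the two `#PF` lines can be checked by hand against the raw error code. Below is a minimal Python sketch, assuming the standard x86 page-fault error-code layout (bit 0 = protection violation vs. not-present, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit set, bit 4 = instruction fetch). The function name `decode_pf_error_code` is made up for illustration and is not part of any kernel tooling.

```python
def decode_pf_error_code(code: int) -> dict:
    """Decode the low five bits of an x86 page-fault error code."""
    return {
        # Bit 0: 0 = the page was not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "not-present page",
        # Bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # Bit 2: 0 = fault raised in supervisor (kernel) mode, 1 = user mode
        "mode": "user" if code & 0x4 else "supervisor",
        # Bit 3: a reserved bit was set in a page-table entry
        "reserved_bit_set": bool(code & 0x8),
        # Bit 4: the fault happened on an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


if __name__ == "__main__":
    # error_code(0x0000) is the value reported in the oops above
    for field, value in decode_pf_error_code(0x0000).items():
        print(f"{field}: {value}")
```

For `0x0000` this reports a supervisor-mode read of a not-present page, which agrees with the kernel's own `#PF` summary lines. In the same dump, CR2 and RDI both hold `0000000000008000`, so the faulting read went through the pointer in RDI.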
 
Hi,

just wanted to chime in here. I triggered the same problem with the following setup:
- Intel XL710 NIC in host (up2date i40e+fw) + vmbr + kvm with virtio NIC
- Ubuntu 20.04 in VM with zvol (just single disk, formatted in VM with zfs)
- AMD EPYC 7302P on Supermicro H11SSL-C

When I do a zfs scrub in the VM I see no issues, and when I do iperf3 in the VM there are no issues either. But doing rsync in the VM with heavy disk+net I/O crashes the host after a while, somewhere between 30 min and 3 hrs after starting rsync.
I downgraded the host to 5.4.65-1-pve and it has been stable since. I haven't had time for more troubleshooting yet due to the holidays.

Cheers,
foobar42
 
I have a similar problem with kernel 5.4.78-2-pve:

Jan 18 20:46:49 Server01 kernel: [1068239.992143] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xa0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.992838] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x10a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.993413] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x1aa0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.993961] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x20a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.994520] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x2ca0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.995098] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x30a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.995620] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x34a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.996133] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x40a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.996635] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x4ea0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.997130] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x50a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.997643] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x58a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.998106] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x60a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.998581] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x68a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.999030] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x70a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.999508] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x7600 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068239.999939] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x7ca0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068240.000363] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x80a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068240.000776] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x8d00 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068240.001183] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x90a0 flags=0x0000]
Jan 18 20:46:49 Server01 kernel: [1068240.001582] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x98a0 flags=0x0000]
Jan 18 20:46:55 Server01 kernel: [1068245.974118] ------------[ cut here ]------------
Jan 18 20:46:55 Server01 kernel: [1068245.974122] NETDEV WATCHDOG: enp67s0f0 (i40e): transmit queue 45 timed out
Jan 18 20:46:55 Server01 kernel: [1068245.974148] WARNING: CPU: 73 PID: 0 at net/sched/sch_generic.c:448 dev_watchdog+0x264/0x270
Jan 18 20:46:55 Server01 kernel: [1068245.974148] Modules linked in: veth ceph libceph rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp iptable_filter bpfilter softdog nfnetlink_log nfnetlink amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ipmi_ssif glue_helper pcspkr ast drm_vram_helper ttm joydev input_leds drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp sunrpc libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c hid_generic usbmouse usbkbd usbhid hid igb i2c_algo_bit dca ahci i40e libahci xhci_pci xhci_hcd i2c_piix4
Jan 18 20:46:55 Server01 kernel: [1068245.974205] CPU: 73 PID: 0 Comm: swapper/73 Tainted: P O 5.4.78-2-pve #1
Jan 18 20:46:55 Server01 kernel: [1068245.974205] Hardware name: H11DSi, BIOS 2.1 02/21/2020
Jan 18 20:46:55 Server01 kernel: [1068245.974208] RIP: 0010:dev_watchdog+0x264/0x270
Jan 18 20:46:55 Server01 kernel: [1068245.974210] Code: 48 85 c0 75 e6 eb a0 4c 89 ef c6 05 8f d6 ea 00 01 e8 60 aa fa ff 89 d9 4c 89 ee 48 c7 c7 28 4c 63 97 48 89 c2 e8 4d 31 74 ff <0f> 0b eb 82 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
Jan 18 20:46:55 Server01 kernel: [1068245.974211] RSP: 0018:ffffae411a120e58 EFLAGS: 00010282
Jan 18 20:46:55 Server01 kernel: [1068245.974212] RAX: 0000000000000000 RBX: 000000000000002d RCX: 0000000000000006
Jan 18 20:46:55 Server01 kernel: [1068245.974213] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff95fcce8578c0
Jan 18 20:46:55 Server01 kernel: [1068245.974213] RBP: ffffae411a120e88 R08: 0000000000000a44 R09: 0000000000000004
Jan 18 20:46:55 Server01 kernel: [1068245.974214] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000080
Jan 18 20:46:55 Server01 kernel: [1068245.974214] R13: ffff95fbee894000 R14: ffff95fbee894480 R15: ffff95fbed519f40
Jan 18 20:46:55 Server01 kernel: [1068245.974215] FS: 0000000000000000(0000) GS:ffff95fcce840000(0000) knlGS:0000000000000000
Jan 18 20:46:55 Server01 kernel: [1068245.974216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 18 20:46:55 Server01 kernel: [1068245.974216] CR2: 00000266054e5000 CR3: 0000007d45c14000 CR4: 0000000000340ee0
Jan 18 20:46:55 Server01 kernel: [1068245.974217] Call Trace:
Jan 18 20:46:55 Server01 kernel: [1068245.974220] <IRQ>
Jan 18 20:46:55 Server01 kernel: [1068245.974224] ? pfifo_fast_enqueue+0x160/0x160
Jan 18 20:46:55 Server01 kernel: [1068245.974229] call_timer_fn+0x32/0x130
Jan 18 20:46:55 Server01 kernel: [1068245.974231] run_timer_softirq+0x1a5/0x430
Jan 18 20:46:55 Server01 kernel: [1068245.974232] ? enqueue_hrtimer+0x3c/0x90
Jan 18 20:46:55 Server01 kernel: [1068245.974234] ? ktime_get+0x3c/0xa0
Jan 18 20:46:55 Server01 kernel: [1068245.974238] ? lapic_next_event+0x20/0x30
Jan 18 20:46:55 Server01 kernel: [1068245.974240] ? clockevents_program_event+0x93/0xf0
Jan 18 20:46:55 Server01 kernel: [1068245.974243] __do_softirq+0xdc/0x2d4
Jan 18 20:46:55 Server01 kernel: [1068245.974246] irq_exit+0xa9/0xb0
Jan 18 20:46:55 Server01 kernel: [1068245.974247] smp_apic_timer_interrupt+0x79/0x130
Jan 18 20:46:55 Server01 kernel: [1068245.974250] apic_timer_interrupt+0xf/0x20
Jan 18 20:46:55 Server01 kernel: [1068245.974250] </IRQ>
Jan 18 20:46:55 Server01 kernel: [1068245.974254] RIP: 0010:cpuidle_enter_state+0xbd/0x450
Jan 18 20:46:55 Server01 kernel: [1068245.974255] Code: ff e8 57 77 84 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 63 03 00 00 31 ff e8 4a e8 8a ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8d 02 00 00 49 63 cd 48 8b 75 d0 48 2b 75 c8 48 8d
Jan 18 20:46:55 Server01 kernel: [1068245.974255] RSP: 0018:ffffae410079fe48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Jan 18 20:46:55 Server01 kernel: [1068245.974256] RAX: ffff95fcce86ae00 RBX: ffffffff97966a00 RCX: 000000000000001f
Jan 18 20:46:55 Server01 kernel: [1068245.974256] RDX: 0003cb9065d1cb63 RSI: 000000002c235171 RDI: 0000000000000000
Jan 18 20:46:55 Server01 kernel: [1068245.974257] RBP: ffffae410079fe88 R08: 0000000000000002 R09: 000000000002a680
Jan 18 20:46:55 Server01 kernel: [1068245.974257] R10: 000b01c5ffb8d40c R11: ffff95fcce869aa0 R12: ffff96fbb3267800
Jan 18 20:46:55 Server01 kernel: [1068245.974258] R13: 0000000000000001 R14: ffffffff97966a78 R15: ffffffff97966a60
Jan 18 20:46:55 Server01 kernel: [1068245.974259] ? cpuidle_enter_state+0x99/0x450
Jan 18 20:46:55 Server01 kernel: [1068245.974260] cpuidle_enter+0x2e/0x40
Jan 18 20:46:55 Server01 kernel: [1068245.974263] call_cpuidle+0x23/0x40
Jan 18 20:46:55 Server01 kernel: [1068245.974264] do_idle+0x22c/0x270
Jan 18 20:46:55 Server01 kernel: [1068245.974264] cpu_startup_entry+0x1d/0x20
Jan 18 20:46:55 Server01 kernel: [1068245.974265] start_secondary+0x166/0x1c0
Jan 18 20:46:55 Server01 kernel: [1068245.974269] secondary_startup_64+0xa4/0xb0
Jan 18 20:46:55 Server01 kernel: [1068245.974271] ---[ end trace c8ed8042797d0d15 ]---
Jan 18 20:46:55 Server01 kernel: [1068245.974277] i40e 0000:44:00.0 enp67s0f0: tx_timeout: VSI_seid: 390, Q 45, NTC: 0x1b5, HWB: 0x1b5, NTU: 0x1d9, TAIL: 0x1d9, INT: 0x1
Jan 18 20:46:55 Server01 kernel: [1068245.974278] i40e 0000:44:00.0 enp67s0f0: tx_timeout recovery level 1, hung_queue 45
Jan 18 20:46:55 Server01 kernel: [1068245.974976] i40e 0000:44:00.0: VSI seid 390 Tx ring 0 disable timeout
Jan 18 20:46:55 Server01 kernel: [1068246.105049] i40e 0000:44:00.0: VSI seid 392 Tx ring 128 disable timeout
Jan 18 20:46:56 Server01 kernel: [1068246.157356] vmbr925: port 1(enp67s0f0) entered disabled state
Jan 18 20:46:56 Server01 kernel: [1068246.336372] i40e 0000:44:00.1: VSI seid 393 Tx ring 128 disable timeout
Jan 18 20:46:58 Server01 ntpd[3342]: Deleting interface #4 vmbr925, 10.99.125.18#123, interface stats: received=0, sent=0, dropped=0, active_time=1068240 secs
Jan 18 20:46:59 Server01 kernel: [1068249.456122] amd_iommu_report_page_fault: 15 callbacks suppressed
Jan 18 20:46:59 Server01 kernel: [1068249.456129] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xc4a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.456656] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xd0a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.457070] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xf0a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.457473] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x100a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.457861] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xe0a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.458263] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x110a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.458632] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x11ea0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.458991] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x120a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.459340] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x12aa0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.459678] i40e 0000:44:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x130a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.460006] amd_iommu_report_page_fault: 5 callbacks suppressed
Jan 18 20:46:59 Server01 kernel: [1068249.460007] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x138a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.460343] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x140a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.460668] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x14ca0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.460974] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x150a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.461267] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x158a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.461544] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x160a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.461814] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x16ca0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.462084] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x170a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.462341] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x178a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.462590] AMD-Vi: Event logged [IO_PAGE_FAULT device=44:00.0 domain=0x000e address=0x180a0 flags=0x0000]
Jan 18 20:46:59 Server01 kernel: [1068249.594922] irq 775: Affinity broken due to vector space exhaustion.
 
Similar issues are happening at random on our cluster. They started a few weeks ago. We have had a node freeze twice in the last few weeks, bringing down lots of VMs/services in a production environment.

We have not made any changes to underlying hardware or BIOS config recently.

Code:
Jul 28 10:31:13 px3 kernel: [514092.902026] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x40 flags=0x0000]
Jul 28 10:31:13 px3 kernel: [514092.902575] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x2040 flags=0x0000]
Jul 28 10:31:13 px3 kernel: [514092.903016] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x3040 flags=0x0000]
Jul 28 10:31:13 px3 kernel: [514092.903496] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x3640 flags=0x0000]
Jul 28 10:31:13 px3 kernel: [514092.904085] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x3c40 flags=0x0000]
Jul 28 10:31:13 px3 kernel: [514092.904668] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x5040 flags=0x0050]
Jul 28 10:31:13 px3 kernel: [514092.905081] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x5240 flags=0x0050]
Jul 28 10:31:13 px3 kernel: [514092.905479] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x5440 flags=0x0050]
Jul 28 10:31:19 px3 kernel: [514099.521581] ------------[ cut here ]------------
Jul 28 10:31:19 px3 kernel: [514099.521585] NETDEV WATCHDOG: enp1s0f2 (i40e): transmit queue 13 timed out
Jul 28 10:31:19 px3 kernel: [514099.521604] WARNING: CPU: 29 PID: 0 at net/sched/sch_generic.c:473 dev_watchdog+0x264/0x270
Jul 28 10:31:19 px3 kernel: [514099.521605] Modules linked in: rbd ceph libceph fscache dm_crypt algif_skcipher af_alg ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp iptable_filter bpfilter 8021q garp mrp bonding softdog nfnetlink_log nfnetlink ipmi_ssif amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper pcspkr ast drm_vram_helper joydev input_leds mac_hid ttm drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c hid_generic usbkbd usbmouse usbhid hid bnxt_en ahci mpt3sas xhci_pci libahci raid_class xhci_hcd i2c_piix4 scsi_transport_sas i40e
Jul 28 10:31:19 px3 kernel: [514099.521662] CPU: 29 PID: 0 Comm: swapper/29 Tainted: P           O      5.4.124-1-pve #1
Jul 28 10:31:19 px3 kernel: [514099.521663] Hardware name: Supermicro AS -2113S-WTRT/H11SSW-NT, BIOS 2.3 11/25/2020
Jul 28 10:31:19 px3 kernel: [514099.521665] RIP: 0010:dev_watchdog+0x264/0x270
Jul 28 10:31:19 px3 kernel: [514099.521668] Code: 48 85 c0 75 e6 eb a0 4c 89 ef c6 05 41 b6 ef 00 01 e8 50 b7 fa ff 89 d9 4c 89 ee 48 c7 c7 c0 61 23 ae 48 89 c2 e8 f5 57 15 00 <0f> 0b eb 82 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
Jul 28 10:31:19 px3 kernel: [514099.521669] RSP: 0018:ffffb91040a60e58 EFLAGS: 00010282
Jul 28 10:31:19 px3 kernel: [514099.521671] RAX: 0000000000000000 RBX: 000000000000000d RCX: 0000000000000006
Jul 28 10:31:19 px3 kernel: [514099.521672] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff904c4e9578c0
Jul 28 10:31:19 px3 kernel: [514099.521672] RBP: ffffb91040a60e88 R08: 0000000000000bf8 R09: 0000000000000004
Jul 28 10:31:19 px3 kernel: [514099.521673] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000040
Jul 28 10:31:19 px3 kernel: [514099.521674] R13: ffff904c1a52f000 R14: ffff904c1a52f480 R15: ffff904c1a7e4f40
Jul 28 10:31:19 px3 kernel: [514099.521676] FS:  0000000000000000(0000) GS:ffff904c4e940000(0000) knlGS:0000000000000000
Jul 28 10:31:19 px3 kernel: [514099.521677] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 28 10:31:19 px3 kernel: [514099.521678] CR2: 00007f03260b0000 CR3: 0000003b7a8aa000 CR4: 0000000000340ee0
Jul 28 10:31:19 px3 kernel: [514099.521679] Call Trace:
Jul 28 10:31:19 px3 kernel: [514099.521681]  <IRQ>
Jul 28 10:31:19 px3 kernel: [514099.521685]  ? pfifo_fast_enqueue+0x160/0x160
Jul 28 10:31:19 px3 kernel: [514099.521689]  call_timer_fn+0x32/0x130
Jul 28 10:31:19 px3 kernel: [514099.521691]  run_timer_softirq+0x1a5/0x430
Jul 28 10:31:19 px3 kernel: [514099.521693]  ? enqueue_hrtimer+0x3c/0x90
Jul 28 10:31:19 px3 kernel: [514099.521695]  ? ktime_get+0x3c/0xa0
Jul 28 10:31:19 px3 kernel: [514099.521698]  ? lapic_next_event+0x20/0x30
Jul 28 10:31:19 px3 kernel: [514099.521701]  ? clockevents_program_event+0x93/0xf0
Jul 28 10:31:19 px3 kernel: [514099.521704]  __do_softirq+0xdc/0x2d4
Jul 28 10:31:19 px3 kernel: [514099.521708]  irq_exit+0xa9/0xb0
Jul 28 10:31:19 px3 kernel: [514099.521709]  smp_apic_timer_interrupt+0x79/0x130
Jul 28 10:31:19 px3 kernel: [514099.521711]  apic_timer_interrupt+0xf/0x20
Jul 28 10:31:19 px3 kernel: [514099.521712]  </IRQ>
Jul 28 10:31:19 px3 kernel: [514099.521716] RIP: 0010:cpuidle_enter_state+0xbd/0x450
Jul 28 10:31:19 px3 kernel: [514099.521717] Code: ff e8 f7 69 88 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 63 03 00 00 31 ff e8 2a 76 8e ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8d 02 00 00 49 63 cd 48 8b 75 d0 48 2b 75 c8 48 8d
Jul 28 10:31:19 px3 kernel: [514099.521718] RSP: 0018:ffffb910402e7e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Jul 28 10:31:19 px3 kernel: [514099.521719] RAX: ffff904c4e96ae00 RBX: ffffffffae5669c0 RCX: 000000000000001f
Jul 28 10:31:19 px3 kernel: [514099.521720] RDX: 0001d3921f5cc157 RSI: 000000002db6dc7f RDI: 0000000000000000
Jul 28 10:31:19 px3 kernel: [514099.521721] RBP: ffffb910402e7e88 R08: 0000000000000002 R09: 000000000002a680
Jul 28 10:31:19 px3 kernel: [514099.521722] R10: 00051d67be363cf0 R11: ffff904c4e969aa0 R12: ffff904c213f5800
Jul 28 10:31:19 px3 kernel: [514099.521723] R13: 0000000000000002 R14: ffffffffae566a98 R15: ffffffffae566a80
Jul 28 10:31:19 px3 kernel: [514099.521726]  ? cpuidle_enter_state+0x99/0x450
Jul 28 10:31:19 px3 kernel: [514099.521728]  cpuidle_enter+0x2e/0x40
Jul 28 10:31:19 px3 kernel: [514099.521731]  call_cpuidle+0x23/0x40
Jul 28 10:31:19 px3 kernel: [514099.521732]  do_idle+0x22c/0x270
Jul 28 10:31:19 px3 kernel: [514099.521734]  cpu_startup_entry+0x1d/0x20
Jul 28 10:31:19 px3 kernel: [514099.521736]  start_secondary+0x166/0x1c0
Jul 28 10:31:19 px3 kernel: [514099.521739]  secondary_startup_64+0xa4/0xb0
Jul 28 10:31:19 px3 kernel: [514099.521741] ---[ end trace 7e14c924b64ce1e6 ]---
Jul 28 10:31:19 px3 kernel: [514099.521749] i40e 0000:01:00.2 enp1s0f2: tx_timeout: VSI_seid: 398, Q 13, NTC: 0x2e, HWB: 0x2e, NTU: 0x8e, TAIL: 0x8e, INT: 0x1
Jul 28 10:31:19 px3 kernel: [514099.521752] i40e 0000:01:00.2 enp1s0f2: tx_timeout recovery level 1, hung_queue 13
Jul 28 10:31:19 px3 kernel: [514099.522300] i40e 0000:01:00.2: VSI seid 398 Tx ring 0 disable timeout
Jul 28 10:31:19 px3 kernel: [514099.592532] i40e 0000:01:00.2: VSI seid 402 Tx ring 64 disable timeout
Jul 28 10:31:20 px3 kernel: [514099.823908] i40e 0000:01:00.0: VSI seid 396 Tx ring 0 disable timeout
Jul 28 10:31:20 px3 kernel: [514099.879957] i40e 0000:01:00.0: VSI seid 400 Tx ring 64 disable timeout
Jul 28 10:31:20 px3 kernel: [514099.930017] i40e 0000:01:00.3: VSI seid 399 Tx ring 0 disable timeout
Jul 28 10:31:20 px3 kernel: [514100.000525] i40e 0000:01:00.3: VSI seid 403 Tx ring 64 disable timeout
Jul 28 10:31:20 px3 kernel: [514100.050602] i40e 0000:01:00.1: VSI seid 397 Tx ring 0 disable timeout
Jul 28 10:31:20 px3 kernel: [514100.112479] i40e 0000:01:00.1: VSI seid 401 Tx ring 64 disable timeout
Jul 28 10:31:23 px3 kernel: [514102.943505] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x5640 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.944007] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x7040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.944389] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x7e40 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.944764] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x8040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.945132] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x6040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.945507] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0x9040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.945862] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xa040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.946208] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xb040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.946548] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xbd00 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.946881] i40e 0000:01:00.2: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0032 address=0xc040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.947206] amd_iommu_report_page_fault: 6 callbacks suppressed
Jul 28 10:31:23 px3 kernel: [514102.947207] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0xcc00 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.947539] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0xd040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.947866] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0xdc40 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.948184] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0xe040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.948497] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0xec40 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.948802] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0xf040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.949099] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0xfc40 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.949389] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0x10040 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.949683] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0x10a40 flags=0x0000]
Jul 28 10:31:23 px3 kernel: [514102.949965] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0x11040 flags=0x0000]
Jul 28 10:31:28 px3 kernel: [514108.737303] i40e 0000:01:00.0 enp1s0f0: tx_timeout: VSI_seid: 396, Q 45, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
Jul 28 10:31:28 px3 kernel: [514108.737312] i40e 0000:01:00.0 enp1s0f0: tx_timeout recovery level 1, hung_queue 45
Jul 28 10:31:28 px3 kernel: [514108.737808] i40e 0000:01:00.0: VSI seid 396 Tx ring 0 disable timeout
Jul 28 10:31:29 px3 kernel: [514108.795974] i40e 0000:01:00.0: VSI seid 402 Tx ring 64 disable timeout
Jul 28 10:31:29 px3 kernel: [514109.027744] i40e 0000:01:00.1: VSI seid 397 Tx ring 0 disable timeout
Jul 28 10:31:29 px3 kernel: [514109.088372] i40e 0000:01:00.1: VSI seid 401 Tx ring 64 disable timeout
Jul 28 10:31:29 px3 kernel: [514109.138452] i40e 0000:01:00.2: VSI seid 398 Tx ring 0 disable timeout
Jul 28 10:31:29 px3 kernel: [514109.200020] i40e 0000:01:00.2: VSI seid 400 Tx ring 64 disable timeout
Jul 28 10:31:29 px3 kernel: [514109.250089] i40e 0000:01:00.3: VSI seid 399 Tx ring 0 disable timeout
Jul 28 10:31:29 px3 kernel: [514109.306358] i40e 0000:01:00.3: VSI seid 403 Tx ring 64 disable timeout
Jul 28 10:31:34 px3 kernel: [514113.994564] libceph: osd8 down
Jul 28 10:31:34 px3 kernel: [514113.994566] libceph: osd11 down
Jul 28 10:31:34 px3 kernel: [514113.994566] libceph: osd30 down
Jul 28 10:31:34 px3 kernel: [514113.994567] libceph: osd31 down
Jul 28 10:31:34 px3 kernel: [514113.994567] libceph: osd32 down
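
To see at a glance which PCI function is generating the IOMMU faults, the AMD-Vi lines above can be tallied per device. This is only a rough sketch, assuming the journal has been saved as plain text lines; the function name `count_io_page_faults` and the regular expression are illustrative and not part of any Proxmox or kernel tool.

```python
import re
from collections import Counter

# Matches both forms seen in the log above: the device either sits in the
# driver prefix ("i40e 0000:01:00.2: AMD-Vi: ...") or inside the brackets
# ("device=01:00.2"). The pattern is an assumption based on these lines only.
FAULT_RE = re.compile(
    r"(?:(?P<prefix_dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): )?"
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"(?:device=(?P<field_dev>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) )?"
    r"domain=(?P<domain>0x[0-9a-f]+) address=(?P<address>0x[0-9a-f]+)"
)


def count_io_page_faults(lines):
    """Count IO_PAGE_FAULT events per PCI device (bus:slot.function)."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if not m:
            continue
        dev = m.group("prefix_dev") or m.group("field_dev") or "unknown"
        # Normalise "0000:01:00.2" to "01:00.2" so both log forms agree.
        if dev.count(":") == 2:
            dev = dev.split(":", 1)[1]
        counts[dev] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jul 28 10:31:23 px3 kernel: [514102.944007] i40e 0000:01:00.2: AMD-Vi: "
        "Event logged [IO_PAGE_FAULT domain=0x0032 address=0x7040 flags=0x0000]",
        "Jul 28 10:31:23 px3 kernel: [514102.947207] AMD-Vi: Event logged "
        "[IO_PAGE_FAULT device=01:00.2 domain=0x0032 address=0xcc00 flags=0x0000]",
    ]
    for dev, n in count_io_page_faults(sample).items():
        print(dev, n)
```

If every fault maps to a single function, as in the excerpt above where they all come from 01:00.2, that narrows the search to one port of the i40e NIC and its DMA mappings.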

These nodes run EPYC 7402P CPUs in Supermicro single-socket servers:
Code:
NX (Execute Disable) protection: active
Jul 28 10:36:21 px3 kernel: [    0.000000] efi: EFI v2.70 by American Megatrends
Jul 28 10:36:21 px3 kernel: [    0.000000] efi:  ACPI=0xa7693000  ACPI 2.0=0xa7693014  SMBIOS=0xa850e000  SMBIOS 3.0=0xa850d000  MEMATTR=0x9f966018  ESRT=0x9e83ba18
Jul 28 10:36:21 px3 kernel: [    0.000000] secureboot: Secure boot could not be determined (mode 0)
Jul 28 10:36:21 px3 kernel: [    0.000000] SMBIOS 3.2.0 present.
Jul 28 10:36:21 px3 kernel: [    0.000000] DMI: Supermicro AS -2113S-WTRT/H11SSW-NT, BIOS 2.3 11/25/2020

It will be a few weeks before I have time to prepare and upgrade the cluster to Proxmox 7. I'm not sure whether that would make any difference, but I thought it was worth bringing this up. I suspect a kernel bug is causing these issues.

Code:
proxmox-ve: 6.4-1 (running kernel: 5.4.124-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-4
pve-kernel-helper: 6.4-4
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.124-1-pve: 5.4.124-2
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.13-3-pve: 5.3.13-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 15.2.13-pve1~bpo10
ceph-fuse: 15.2.13-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
 
After upgrading from 6.4.1 to 7.2, I'm getting this error as well. Proxmox 6 was stable, so this has me spooked, as I just upgraded all my nodes to 7.2-3. It also caused a huge spike in IO delay that is still ongoing.

Code:
May 22 03:58:07 benson kernel: BUG: unable to handle page fault for address: 0000000000001094
May 22 03:58:07 benson kernel: #PF: supervisor read access in kernel mode
May 22 03:58:07 benson kernel: #PF: error_code(0x0000) - not-present page
May 22 03:58:07 benson kernel: PGD 0 P4D 0
May 22 03:58:07 benson kernel: Oops: 0000 [#1] SMP PTI
May 22 03:58:07 benson kernel: CPU: 2 PID: 1556580 Comm: z_wr_iss Tainted: P           O      5.15.35-1-pve #1
May 22 03:58:07 benson kernel: Hardware name: HP HP EliteDesk 800 G2 DM 35W/8055, BIOS N21 Ver. 02.32 01/30/2018
May 22 03:58:07 benson kernel: RIP: 0010:kmem_cache_alloc+0xfd/0x2e0
May 22 03:58:07 benson kernel: Code: 8b 50 08 49 8b 00 49 83 78 10 00 48 89 45 c8 0f 84 92 01 00 00 48 85 c0 0f 84 89 01 00 00 41 8b 4c 24 28 49 8b 3c 24 48 01 c1 <48> 8b 19 48 89 ce 49 33 9c 24 b8 00 00 00 48 8d 4a 01 48 0f ce 48
May 22 03:58:07 benson kernel: RSP: 0018:ffffbee19de1bc70 EFLAGS: 00010202
May 22 03:58:07 benson kernel: RAX: 0000000000000094 RBX: 0000000000002000 RCX: 0000000000001094
May 22 03:58:07 benson kernel: RDX: 00000000008b3177 RSI: 0000000000042c20 RDI: 000042abb0417a40
May 22 03:58:07 benson kernel: RBP: ffffbee19de1bcb0 R08: ffffdee17fc97a40 R09: ffffbee19de1bd80
May 22 03:58:07 benson kernel: R10: 00000000c6f48b33 R11: 0000000000000000 R12: ffff9c2ed0e29500
May 22 03:58:07 benson kernel: R13: 0000000000000000 R14: 0000000000042c20 R15: 0000000000042c20
May 22 03:58:07 benson kernel: FS:  0000000000000000(0000) GS:ffff9c35cf880000(0000) knlGS:0000000000000000
May 22 03:58:07 benson kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 03:58:07 benson kernel: CR2: 0000000000001094 CR3: 0000000468e10004 CR4: 00000000003726e0
May 22 03:58:07 benson kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 22 03:58:07 benson kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 22 03:58:07 benson kernel: Call Trace:
May 22 03:58:07 benson kernel:  <TASK>
May 22 03:58:07 benson kernel:  ? spl_kmem_cache_alloc+0x79/0x790 [spl]
May 22 03:58:07 benson kernel:  spl_kmem_cache_alloc+0x79/0x790 [spl]
May 22 03:58:07 benson kernel:  ? zio_execute+0x95/0x160 [zfs]
May 22 03:58:07 benson kernel:  ? __cond_resched+0x1a/0x50
May 22 03:58:07 benson kernel:  ? mutex_lock+0x13/0x40
May 22 03:58:07 benson kernel:  ? zio_wait_for_children+0xaf/0x140 [zfs]
May 22 03:58:07 benson kernel:  ? vdev_mirror_io_start+0x113/0x280 [zfs]
May 22 03:58:07 benson kernel:  zio_write_compress+0x528/0xa00 [zfs]
May 22 03:58:07 benson kernel:  zio_execute+0x95/0x160 [zfs]
May 22 03:58:07 benson kernel:  taskq_thread+0x29b/0x4c0 [spl]
May 22 03:58:07 benson kernel:  ? wake_up_q+0x90/0x90
May 22 03:58:07 benson kernel:  ? zio_gang_tree_free+0x70/0x70 [zfs]
May 22 03:58:07 benson kernel:  ? taskq_thread_spawn+0x60/0x60 [spl]
May 22 03:58:07 benson kernel:  kthread+0x12a/0x150
May 22 03:58:07 benson kernel:  ? set_kthread_struct+0x50/0x50
May 22 03:58:07 benson kernel:  ret_from_fork+0x22/0x30
May 22 03:58:07 benson kernel:  </TASK>
May 22 03:58:07 benson kernel: Modules linked in: veth joydev input_leds hid_generic usbkbd usbmouse usbhid hid rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp ip6_udp_tunnel udp_tunnel iptable_filter bpfilter 8021q garp mrp bonding tls iTCO_wdt intel_pmc_bxt iTCO_vendor_support nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek cdc_ether usbnet snd_hda_codec_generic ledtrig_audio intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec irqbypass mei_hdcp ttm crct10dif_pclmul snd_hda_core ghash_clmulni_intel drm_kms_helper snd_hwdep aesni_intel cec crypto_simd rc_core cryptd i2c_algo_bit snd_pcm fb_sys_fops hp_wmi syscopyarea r8152 platform_profile rapl snd_timer sysfillrect snd mei_me intel_cstate pcspkr sparse_keymap efi_pstore wmi_bmof mii ee1004 soundcore
May 22 03:58:07 benson kernel:  sysimgblt mei intel_pch_thermal mac_hid tpm_infineon acpi_pad vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c simplefb e1000e nvme crc32_pclmul i2c_i801 ahci xhci_pci xhci_pci_renesas i2c_smbus nvme_core libahci xhci_hcd wmi video
May 22 03:58:07 benson kernel: CR2: 0000000000001094
May 22 03:58:07 benson kernel: ---[ end trace b22404b88cc8ea3f ]---
May 22 03:58:07 benson kernel: RIP: 0010:kmem_cache_alloc+0xfd/0x2e0
May 22 03:58:07 benson kernel: Code: 8b 50 08 49 8b 00 49 83 78 10 00 48 89 45 c8 0f 84 92 01 00 00 48 85 c0 0f 84 89 01 00 00 41 8b 4c 24 28 49 8b 3c 24 48 01 c1 <48> 8b 19 48 89 ce 49 33 9c 24 b8 00 00 00 48 8d 4a 01 48 0f ce 48
May 22 03:58:07 benson kernel: RSP: 0018:ffffbee19de1bc70 EFLAGS: 00010202
May 22 03:58:07 benson kernel: RAX: 0000000000000094 RBX: 0000000000002000 RCX: 0000000000001094
May 22 03:58:07 benson kernel: RDX: 00000000008b3177 RSI: 0000000000042c20 RDI: 000042abb0417a40
May 22 03:58:07 benson kernel: RBP: ffffbee19de1bcb0 R08: ffffdee17fc97a40 R09: ffffbee19de1bd80
May 22 03:58:07 benson kernel: R10: 00000000c6f48b33 R11: 0000000000000000 R12: ffff9c2ed0e29500
May 22 03:58:07 benson kernel: R13: 0000000000000000 R14: 0000000000042c20 R15: 0000000000042c20
May 22 03:58:07 benson kernel: FS:  0000000000000000(0000) GS:ffff9c35cf880000(0000) knlGS:0000000000000000
May 22 03:58:07 benson kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 03:58:07 benson kernel: CR2: 0000000000001094 CR3: 00000001655dc006 CR4: 00000000003726e0
May 22 03:58:07 benson kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 22 03:58:07 benson kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Attachment: cpu load.jpg]
Code:
root@benson:~# pveversion --verbose
proxmox-ve: 7.2-1 (running kernel: 5.15.35-1-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-5.15: 7.2-3
pve-kernel-helper: 7.2-3
pve-kernel-5.4: 6.4-15
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.4.174-2-pve: 5.4.174-2
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-6
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
root@benson:~#
 
I think I have the same or a similar issue. I had to reboot the node.

Code:
May 23 10:03:02 s7 kernel: [893783.712903] show_signal: 8 callbacks suppressed
May 23 10:03:02 s7 kernel: [893783.712906] traps: pvescheduler[2627784] general protection fault ip:55c0dffe3f94 sp:7ffd3cb21a60 error:0 in perl[55c0dff2c000+185000]
May 23 10:04:11 s7 pvescheduler[2630746]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
May 23 10:04:48 s7 kernel: [893889.958084] BUG: Bad page state in process kvm  pfn:ffffff220738ad96
May 23 10:04:48 s7 kernel: [893889.958949] page:00000000fa32aaf3 refcount:-14506 mapcount:0 mapping:0000000000000000 index:0xffff8fd80e2b65b0 pfn:0xffffff220738ad96
May 23 10:04:48 s7 kernel: [893889.960356] memcg:ffff8fd80e2b65d0
May 23 10:04:48 s7 kernel: [893889.960356] flags: 0xffffc756b8aaeec8(waiters|dirty|workingset|slab|owner_priv_1|arch_1|private|private_2|writeback|mappedtodisk|swapbacked|mlocked|hwpoison|node=1023|zone=7|lastcpupid=0x1f1d5a)
May 23 10:04:48 s7 kernel: [893889.963103] raw: ffffc756b8aaeec8 dead000000000100 dead000000000122 ffff8fd80e2b65b0
May 23 10:04:48 s7 kernel: [893889.963103] raw: ffff8fd80e2b65b0 ffffc756862cc008 ffffc756b3610108 ffff8fd80e2b65d0
May 23 10:04:48 s7 kernel: [893889.963103] page dumped because: page still charged to cgroup
May 23 10:04:48 s7 kernel: [893889.963103] Modules linked in: joydev input_leds hid_generic usbmouse usbkbd usbhid hid uas usb_storage veth tcp_diag inet_diag ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip_set_hash_net ip_set nf_tables softdog bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common edac_mce_amd amdgpu snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi iommu_v2 gpu_sched drm_ttm_helper irqbypass ttm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi crct10dif_pclmul snd_hda_codec drm_kms_helper ghash_clmulni_intel aesni_intel snd_hda_core cec rc_core snd_hwdep crypto_simd i2c_algo_bit snd_pcm cryptd fb_sys_fops eeepc_wmi syscopyarea snd_timer asus_wmi rapl sysfillrect sysimgblt snd platform_profile soundcore
May 23 10:04:48 s7 kernel: [893889.963103]  sparse_keymap ccp video pcspkr k10temp efi_pstore wmi_bmof mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi msr drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c simplefb xhci_pci xhci_pci_renesas crc32_pclmul nvme ahci i2c_piix4 r8169 realtek xhci_hcd libahci nvme_core wmi gpio_amdpt gpio_generic
May 23 10:04:48 s7 kernel: [893889.971099] CPU: 12 PID: 228872 Comm: kvm Tainted: P           O      5.15.35-1-pve #1
May 23 10:04:48 s7 kernel: [893889.975104] Hardware name: ASUS System Product Name/PRIME B550M-K, BIOS 1401 12/03/2020
May 23 10:04:48 s7 kernel: [893889.975104] Call Trace:
May 23 10:04:48 s7 kernel: [893889.975104]  <TASK>
May 23 10:04:48 s7 kernel: [893889.975104]  dump_stack_lvl+0x4a/0x5f
May 23 10:04:48 s7 kernel: [893889.975104]  dump_stack+0x10/0x12
May 23 10:04:48 s7 kernel: [893889.975104]  bad_page.cold+0x63/0x94
May 23 10:04:48 s7 kernel: [893889.975104]  check_free_page_bad+0x66/0x70
May 23 10:04:48 s7 kernel: [893889.975104]  free_pcppages_bulk+0x1c3/0x390
May 23 10:04:48 s7 kernel: [893889.975104]  free_unref_page_commit.constprop.0+0x12b/0x170
May 23 10:04:48 s7 kernel: [893889.975104]  free_unref_page_list+0x1b3/0x320
May 23 10:04:48 s7 kernel: [893889.975104]  release_pages+0x165/0x530
May 23 10:04:48 s7 kernel: [893889.983110]  free_pages_and_swap_cache+0x48/0x60
May 23 10:04:48 s7 kernel: [893889.983110]  tlb_finish_mmu+0x89/0x1c0
May 23 10:04:48 s7 kernel: [893889.983110]  zap_page_range+0x120/0x170
May 23 10:04:48 s7 kernel: [893889.983110]  do_madvise.part.0+0x8ca/0xf20
May 23 10:04:48 s7 kernel: [893889.983110]  ? do_syscall_64+0x69/0xc0
May 23 10:04:48 s7 kernel: [893889.983110]  ? exit_to_user_mode_prepare+0x37/0x1b0
May 23 10:04:48 s7 kernel: [893889.983110]  __x64_sys_madvise+0x58/0x70
May 23 10:04:48 s7 kernel: [893889.987101]  do_syscall_64+0x5c/0xc0
May 23 10:04:48 s7 kernel: [893889.987101]  ? do_syscall_64+0x69/0xc0
May 23 10:04:48 s7 kernel: [893889.987101]  ? do_syscall_64+0x69/0xc0
May 23 10:04:48 s7 kernel: [893889.987101]  ? asm_sysvec_apic_timer_interrupt+0xa/0x20
May 23 10:04:48 s7 kernel: [893889.987101]  entry_SYSCALL_64_after_hwframe+0x44/0xae
May 23 10:04:48 s7 kernel: [893889.991106] RIP: 0033:0x7f3b047d5cf7
May 23 10:04:48 s7 kernel: [893889.991106] Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 15 91 51 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 b8 1c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 51 0c 00 f7 d8 64 89 01 48
May 23 10:04:48 s7 kernel: [893889.991106] RSP: 002b:00007f3af8958e68 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
May 23 10:04:48 s7 kernel: [893889.991106] RAX: ffffffffffffffda RBX: 0000556ee42f6350 RCX: 00007f3b047d5cf7
May 23 10:04:48 s7 kernel: [893889.991106] RDX: 0000000000000004 RSI: 0000000000200000 RDI: 00007f3a3b400000
May 23 10:04:48 s7 kernel: [893889.991106] RBP: 00000000ffffffff R08: 0000000100000000 R09: 0000000000000000
May 23 10:04:48 s7 kernel: [893889.995104] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000200000
May 23 10:04:48 s7 kernel: [893889.995104] R13: 00007f3a3b400000 R14: 00007f3af895c098 R15: 000000004f600000
May 23 10:04:48 s7 kernel: [893889.995104]  </TASK>
 
No, that is a completely different trace.
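
One quick way to check whether two reports are the same problem is to compare the BUG headline and the faulting kernel symbol from the `RIP:` line. The sketch below is a hypothetical helper (`trace_signature` is not an existing tool) and assumes the log format seen in this thread, where kernel-mode `RIP:` lines carry the `0010:` prefix and user-mode ones `0033:`.

```python
import re

# Kernel-mode RIP lines look like "RIP: 0010:kmem_cache_alloc+0xfd/0x2e0".
# User-mode ones ("RIP: 0033:0x7f...") carry no symbol and are skipped.
RIP_RE = re.compile(r"RIP: 0010:(?P<symbol>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
BUG_RE = re.compile(r"BUG: (?P<headline>.+)$")


def trace_signature(lines):
    """Return (bug_headline, faulting_symbol) for the first report in `lines`.

    The headline is cut before the per-incident details (address, process),
    so two reports of the same kind compare equal.
    """
    headline = symbol = None
    for line in lines:
        if headline is None:
            m = BUG_RE.search(line)
            if m:
                headline = re.split(r" for address| in process", m.group("headline"))[0].strip()
        if symbol is None:
            m = RIP_RE.search(line)
            if m:
                symbol = m.group("symbol")
    return headline, symbol


if __name__ == "__main__":
    zfs_oops = [
        "May 22 03:58:07 benson kernel: BUG: unable to handle page fault for address: 0000000000001094",
        "May 22 03:58:07 benson kernel: RIP: 0010:kmem_cache_alloc+0xfd/0x2e0",
    ]
    bad_page = [
        "May 23 10:04:48 s7 kernel: [893889.958084] BUG: Bad page state in process kvm  pfn:ffffff220738ad96",
        "May 23 10:04:48 s7 kernel: [893889.991106] RIP: 0033:0x7f3b047d5cf7",
    ]
    print(trace_signature(zfs_oops))
    print(trace_signature(bad_page))
```

Applied to the reports in this thread, the signatures differ both in headline (page fault versus bad page state) and in the faulting function, which is why they should not be treated as one bug.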
 
Have there been any updates to this old thread? I am seeing similar behavior:

Code:
May 19 18:40:35 proxmox kernel: BUG: unable to handle page fault for address: 00000000000f424b
May 19 18:40:35 proxmox kernel: #PF: supervisor write access in kernel mode
May 19 18:40:35 proxmox kernel: #PF: error_code(0x0002) - not-present page
May 19 18:40:35 proxmox kernel: PGD 0 P4D 0
May 19 18:40:35 proxmox kernel: Oops: 0002 [#1] PREEMPT SMP PTI
May 19 18:40:35 proxmox kernel: CPU: 0 PID: 518 Comm: watchdog-mux Tainted: P           O       6.2.11-2-pve #1
May 19 18:40:35 proxmox kernel: Hardware name: Intel(R) Client Systems NUC8i3BEK/NUC8BEB, BIOS BECFL357.86A.0092.2023.0214.1114 02/14/2023
May 19 18:40:35 proxmox kernel: RIP: 0010:osq_lock+0x3d/0x160
May 19 18:40:35 proxmox kernel: Code: 48 89 d3 48 83 ec 10 65 8b 05 ab e9 0c 69 83 c0 01 65 48 03 1d ec 73 0b 69 c7 43 10 00 00 00 00 48 c7 03 00 00 00 00 89 43 14 <87> 07 85 c0 0f 84 cf 00 00 00 83 e8 01 49 89 fc 48 98 48 3d ff 1f
May 19 18:40:35 proxmox kernel: RSP: 0018:ffffa8d3410a7d20 EFLAGS: 00010286
May 19 18:40:35 proxmox kernel: RAX: 0000000000000001 RBX: ffff944d9dc324c0 RCX: 0000000000000000
May 19 18:40:35 proxmox kernel: RDX: 00000000000324c0 RSI: 0000000000000000 RDI: 00000000000f424b
May 19 18:40:35 proxmox kernel: RBP: ffffa8d3410a7d40 R08: 0000000000000001 R09: 0000000000000000
May 19 18:40:35 proxmox kernel: R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000f423f
May 19 18:40:35 proxmox kernel: R13: 00000000000f424b R14: ffff944646800000 R15: 0000000000000000
May 19 18:40:35 proxmox kernel: FS:  00007fe69b4ce540(0000) GS:ffff944d9dc00000(0000) knlGS:0000000000000000
May 19 18:40:35 proxmox kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 18:40:35 proxmox kernel: CR2: 00000000000f424b CR3: 000000010e1ae005 CR4: 00000000003706f0
May 19 18:40:35 proxmox kernel: Call Trace:
May 19 18:40:35 proxmox kernel:  <TASK>
May 19 18:40:35 proxmox kernel:  ? schedule+0x68/0x100
May 19 18:40:35 proxmox kernel:  __mutex_lock.constprop.0+0x193/0x750
May 19 18:40:35 proxmox kernel:  ? __pfx_hrtimer_wakeup+0x10/0x10
May 19 18:40:35 proxmox kernel:  schedule_hrtimeout_range+0x13/0x20
May 19 18:40:35 proxmox kernel:  do_epoll_wait+0x631/0x770
May 19 18:40:35 proxmox kernel:  ? __pfx_ep_autoremove_wake_function+0x10/0x10
May 19 18:40:35 proxmox kernel:  __x64_sys_epoll_wait+0x5e/0x100
May 19 18:40:35 proxmox kernel:  do_syscall_64+0x59/0x90
May 19 18:40:35 proxmox kernel:  ? syscall_exit_to_user_mode+0x26/0x50
May 19 18:40:35 proxmox kernel:  ? do_syscall_64+0x69/0x90
May 19 18:40:35 proxmox kernel:  ? do_syscall_64+0x69/0x90
May 19 18:40:35 proxmox kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
May 19 18:40:35 proxmox kernel: RIP: 0033:0x7fe69b3f4d16
May 19 18:40:35 proxmox kernel: Code: 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 e8 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 90 48 83 ec 28 89 54 24 18 48 89 74 24
May 19 18:40:35 proxmox kernel: RSP: 002b:00007ffdb33a7488 EFLAGS: 00000246 ORIG_RAX: 00000000000000e8
May 19 18:40:35 proxmox kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe69b3f4d16
May 19 18:40:35 proxmox kernel: RDX: 000000000000000a RSI: 00007ffdb33a85e0 RDI: 0000000000000005
May 19 18:40:35 proxmox kernel: RBP: 00007ffdb33a87b0 R08: 00007ffdb33a84a0 R09: 00007ffdb33a5207
May 19 18:40:35 proxmox kernel: R10: 00000000000003e8 R11: 0000000000000246 R12: 0000555fc156c270
May 19 18:40:35 proxmox kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
May 19 18:40:35 proxmox kernel:  </TASK>
May 19 18:40:35 proxmox kernel: Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_compress intel_rapl_msr ac97_bus intel_rapl_common snd_pcm_dmaengine intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp i915 kvm_intel snd_hda_intel drm_buddy iwlmvm ttm mei_pxp mei_hdcp snd_intel_dspcfg kvm mac80211 snd_intel_sdw_acpi irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel libarc4 sha512_ssse3 snd_hda_codec drm_display_helper cec rc_core snd_hda_core aesni_intel crypto_simd btusb btrtl
May 19 18:40:35 proxmox kernel:  cryptd btbcm btintel btmtk snd_hwdep rapl iwlwifi snd_pcm wmi_bmof intel_cstate drm_kms_helper bluetooth snd_timer intel_wmi_thunderbolt pcspkr i2c_algo_bit joydev syscopyarea mei_me ecdh_generic efi_pstore sysfillrect snd input_leds soundcore ee1004 ecc sysimgblt intel_pch_thermal mei cfg80211 acpi_pad mac_hid acpi_tad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c rtsx_pci_sdmmc nvme xhci_pci xhci_pci_renesas crc32_pclmul e1000e xhci_hcd rtsx_pci nvme_core ahci i2c_i801 i2c_smbus nvme_common libahci video wmi pinctrl_cannonlake
May 19 18:40:35 proxmox kernel: CR2: 00000000000f424b
May 19 18:40:35 proxmox kernel: ---[ end trace 0000000000000000 ]---
May 19 18:40:35 proxmox kernel: RIP: 0010:osq_lock+0x3d/0x160
May 19 18:40:35 proxmox kernel: Code: 48 89 d3 48 83 ec 10 65 8b 05 ab e9 0c 69 83 c0 01 65 48 03 1d ec 73 0b 69 c7 43 10 00 00 00 00 48 c7 03 00 00 00 00 89 43 14 <87> 07 85 c0 0f 84 cf 00 00 00 83 e8 01 49 89 fc 48 98 48 3d ff 1f
May 19 18:40:35 proxmox kernel: RSP: 0018:ffffa8d3410a7d20 EFLAGS: 00010286
May 19 18:40:35 proxmox kernel: RAX: 0000000000000001 RBX: ffff944d9dc324c0 RCX: 0000000000000000
May 19 18:40:35 proxmox kernel: RDX: 00000000000324c0 RSI: 0000000000000000 RDI: 00000000000f424b
May 19 18:40:35 proxmox kernel: RBP: ffffa8d3410a7d40 R08: 0000000000000001 R09: 0000000000000000
May 19 18:40:35 proxmox kernel: R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000f423f
May 19 18:40:35 proxmox kernel: R13: 00000000000f424b R14: ffff944646800000 R15: 0000000000000000
May 19 18:40:35 proxmox kernel: FS:  00007fe69b4ce540(0000) GS:ffff944d9dc00000(0000) knlGS:0000000000000000
May 19 18:40:35 proxmox kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 18:40:35 proxmox kernel: CR2: 00000000000f424b CR3: 000000010e1ae005 CR4: 00000000003706f0
May 19 18:40:35 proxmox kernel: note: watchdog-mux[518] exited with irqs disabled
May 19 18:40:35 proxmox kernel: watchdog: watchdog0: watchdog did not stop!
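
The `error_code(...)` value on the `#PF:` line is the x86 page-fault error code, and its low bits explain the wording of the two `#PF:` lines. The decoder below is a small illustrative sketch; the function name `decode_pf_error_code` is made up for this example, and only the five lowest architectural bits are handled.

```python
def decode_pf_error_code(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # bit 0 (P): 0 = page not present, 1 = protection violation
        "page": "protection violation" if code & 0x1 else "not-present page",
        # bit 1 (W/R): 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2 (U/S): 0 = supervisor access, 1 = user access
        "mode": "user" if code & 0x4 else "supervisor",
        # bit 3 (RSVD): a reserved bit was set in a paging structure
        "reserved_bit": bool(code & 0x8),
        # bit 4 (I/D): the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


if __name__ == "__main__":
    # 0x0002 is the value from the osq_lock oops above,
    # 0x0000 the value from the earlier kmem_cache_alloc oops.
    for code in (0x0002, 0x0000):
        print(hex(code), decode_pf_error_code(code))
```

For `0x0002` this yields a supervisor write to a not-present page, matching the "supervisor write access" and "not-present page" lines in the log. The faulting address itself (`CR2`) is a small non-pointer value, which points to a corrupted pointer and not to a missing mapping.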
 
Try disabling SMT / Hyper-Threading and power saving (C-states) in the BIOS.
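
After changing the BIOS settings, it can help to confirm from the running system what the kernel actually sees. The sketch below reads the standard Linux sysfs files under `/sys/devices/system/cpu` (`smt/active`, `smt/control`, and the `cpuidle/state*/name` entries); the helper names `read_smt_state` and `list_cstates` are hypothetical, and the files may be absent on older kernels or when no cpuidle driver is loaded.

```python
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")


def read_smt_state(cpu_root=CPU_SYSFS):
    """Return the contents of smt/active and smt/control, or None if missing."""
    smt = Path(cpu_root) / "smt"
    state = {}
    for name in ("active", "control"):
        f = smt / name
        state[name] = f.read_text().strip() if f.is_file() else None
    return state


def list_cstates(cpu_root=CPU_SYSFS, cpu="cpu0"):
    """Return the idle-state names the kernel exposes for one CPU."""
    idle = Path(cpu_root) / cpu / "cpuidle"
    if not idle.is_dir():
        return []
    return [p.read_text().strip() for p in sorted(idle.glob("state*/name"))]


if __name__ == "__main__":
    print("SMT:", read_smt_state())
    print("C-states on cpu0:", list_cstates())
```

With SMT disabled, `smt/active` should read `0`; if deep C-states are disabled in the BIOS, the list of idle states is typically much shorter.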
 
Strangely enough, I took the RAM out, put a 4GB stick in, and had no issues, so I put both 16GB sticks back in, and there have been no more issues, for now. I don't understand it, because both sticks were originally inserted all the way. I specifically checked that before taking them out.
 
