Page fault and DMAR errors

Isnubi

Hello, I have had a problem on my Proxmox server for months.

Multiple times per day, some of my VMs hit a page fault. Here's an extract of dmesg:
Code:
[ 2639.523165] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2639.523298] #PF: supervisor instruction fetch in kernel mode
[ 2639.523380] #PF: error_code(0x0010) - not-present page
[ 2639.523462] PGD 0 P4D 0
[ 2639.523534] Oops: 0010 [#1] SMP PTI
[ 2639.523616] CPU: 1 PID: 364 Comm: dockerd Not tainted 5.10.0-23-amd64 #1 Debian 5.10.179-1
[ 2639.523719] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[ 2639.523842] RIP: 0010:0x0
[ 2639.523901] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 2639.523984] RSP: 0018:ffffb14bc0423df8 EFLAGS: 00010293
[ 2639.524063] RAX: 0000000000000000 RBX: ffff91d28137e2a0 RCX: ffff91d36bb95d98
[ 2639.524149] RDX: ffffb14bc0423ea0 RSI: ffffb14bc0423e10 RDI: ffff91d28137e240
[ 2639.524263] RBP: ffff91d28137e290 R08: ffff91d282c54f00 R09: ffff91d28137e240
[ 2639.524350] R10: ffff91d28137e290 R11: 0000000000000000 R12: ffffb14bc0423e10
[ 2639.524437] R13: 0000000000000000 R14: ffffb14bc0423ea0 R15: ffff91d28137e240
[ 2639.524526] FS:  00007f0706269700(0000) GS:ffff91d3b7d00000(0000) knlGS:0000000000000000
[ 2639.524617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2639.524697] CR2: ffffffffffffffd6 CR3: 0000000107882000 CR4: 00000000000006e0
[ 2639.524789] Call Trace:
[ 2639.524872]  ep_scan_ready_list.constprop.0+0xab/0x1e0
[ 2639.524980]  do_epoll_wait+0x247/0x670
[ 2639.525055]  ? ep_read_events_proc+0xe0/0xe0
[ 2639.525132]  ? ep_unregister_pollwait.constprop.0+0xa0/0xa0
[ 2639.525214]  __x64_sys_epoll_pwait+0x49/0xb0
[ 2639.525292]  do_syscall_64+0x33/0x80
[ 2639.525408]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 2639.525493] RIP: 0033:0x5604bbeac46e
[ 2639.525565] Code: 48 89 6c 24 38 48 8d 6c 24 38 e8 0d 00 00 00 48 8b 6c 24 38 48 83 c4 40 c3 cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[ 2639.525749] RSP: 002b:00007f07062684d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000119
[ 2639.525838] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00005604bbeac46e
[ 2639.525923] RDX: 0000000000000080 RSI: 00007f0706268598 RDI: 0000000000000005
[ 2639.526007] RBP: 00007f0706268518 R08: 0000000000000000 R09: 0000000000000000
[ 2639.526090] R10: 0000000000000007 R11: 0000000000000246 R12: 00007f07062685a8
[ 2639.526175] R13: 0000000000000001 R14: 000000c000007520 R15: 000000c000078c00
[ 2639.526277] Modules linked in: ip_vs_rr xt_ipvs ip_vs xt_nat veth vxlan ip6_udp_tunnel udp_tunnel xt_policy xt_mark xt_bpf xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc overlay bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper sg evdev joydev cec serio_raw virtio_balloon virtio_console pcspkr qemu_fw_cfg button drm fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sd_mod t10_pi crc_t10dif crct10dif_generic crct10dif_common virtio_net net_failover virtio_scsi failover uhci_hcd ehci_hcd ata_generic usbcore psmouse ata_piix libata scsi_mod virtio_pci i2c_piix4 virtio_ring virtio usb_common floppy
[ 2639.527177] CR2: 0000000000000000
[ 2639.527340] ---[ end trace e848c583dab10e61 ]---
[ 2639.527462] RIP: 0010:0x0
[ 2639.527553] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 2639.527683] RSP: 0018:ffffb14bc0423df8 EFLAGS: 00010293
[ 2639.527807] RAX: 0000000000000000 RBX: ffff91d28137e2a0 RCX: ffff91d36bb95d98
[ 2639.530404] RDX: ffffb14bc0423ea0 RSI: ffffb14bc0423e10 RDI: ffff91d28137e240
[ 2639.532555] RBP: ffff91d28137e290 R08: ffff91d282c54f00 R09: ffff91d28137e240
[ 2639.534689] R10: ffff91d28137e290 R11: 0000000000000000 R12: ffffb14bc0423e10
[ 2639.536602] R13: 0000000000000000 R14: ffffb14bc0423ea0 R15: ffff91d28137e240
[ 2639.538502] FS:  00007f0706269700(0000) GS:ffff91d3b7d00000(0000) knlGS:0000000000000000
[ 2639.540321] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2639.541942] CR2: ffffffffffffffd6 CR3: 0000000107882000 CR4: 00000000000006e0

At the same time this error appears, an error is printed on the console of my server:
Code:
[ 3381.322269] DMAR: DRHD: handling fault status reg 102
[ 3381.322450] DMAR: [INTR-REMAP] Request device [02:00.0] fault index 0x2f [fault reason 0x26] Blocked an interrupt request due to source-id verification failure
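
In case it helps anyone reading along, here is a minimal Python sketch that pulls the fields out of the DMAR line above. The regex and the name parse_dmar_fault are my own assumptions, written against this single line only; other kernel versions may format the message differently.

```python
import re

# Pattern written against the one DMAR line quoted in this post (assumption:
# other kernels may word or order these fields differently).
DMAR_FAULT = re.compile(
    r"DMAR: \[(?P<kind>[A-Z-]+)\] "
    r"Request device \[(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] "
    r"fault index (?P<index>0x[0-9a-f]+) "
    r"\[fault reason (?P<reason>0x[0-9a-f]+)\] "
    r"(?P<text>.*)"
)


def parse_dmar_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = DMAR_FAULT.search(line)
    return match.groupdict() if match else None


# The exact line from the server console, copied from above.
line = (
    "[ 3381.322450] DMAR: [INTR-REMAP] Request device [02:00.0] "
    "fault index 0x2f [fault reason 0x26] Blocked an interrupt request "
    "due to source-id verification failure"
)

fault = parse_dmar_fault(line)
print(fault["kind"], fault["bdf"], fault["reason"])
```

The bdf field is the PCI address of the requesting device, so it can be looked up on the host with lspci -s 02:00.0.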

I have already run a memtest on my server and replaced the defective RAM.
I also disabled intel_iommu in the GRUB config to avoid PCI passthrough.
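
To double-check that the setting is really active on the running kernel, something like the following Python sketch could be used. It only splits a kernel command line into parameters; the helper names and the sample command line are hypothetical, and on a real host the string would come from /proc/cmdline.

```python
def kernel_params(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example 'quiet') map to None.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Hypothetical command line for illustration, not taken from my server.
sample = "BOOT_IMAGE=/boot/vmlinuz-5.15.111-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=off"
params = kernel_params(sample)
print("intel_iommu =", params.get("intel_iommu", "<not set>"))
```

On the host itself, kernel_params(read_cmdline()) gives the same dict for the booted kernel, which shows whether the GRUB change was actually picked up after update-grub and a reboot.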

I don't know where this could be coming from.

Here's my configuration:
- Server: HP ProLiant DL580 G7
- Processors: 4x Intel Xeon E7-4820
- RAM: nearly 400 GB of memory
Here's the result of pveversion -v:
Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.111-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-5
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.111-1-pve: 5.15.111-1
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.11.22-7-pve: 5.11.22-12
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 