Updated to 8.2 - DMA error

Either you ask the Proxmox maintainers directly (this is the community forum, and I doubt that the actual developers will read here often), or you go ahead, check out the actual kernel source through git, and do the diff yourself. The main question is: would you be able to spot the important change?

The issue you are seeing is most likely a BIOS issue anyway. Probably a resource definition for VT-d (or something of the like) is misconfigured in the BIOS, which then leads to the observed issues. These kinds of BIOS issues are not uncommon.
And if I saw this correctly, we are talking about Sandy Bridge-based Xeon E5-2600 series CPUs. For hardware that old, full virtualization support was probably never tested much, so there is even more reason to expect bugs in the BIOS.
But of course, feel free to look into the software more. Obviously there is a lot of hardware out there that shows these kinds of problems.
 
this is the community forum, and I doubt that the actual developers will read here often
Oh, so I don't count as an actual developer? In fact, most of the devs do read and answer in the forum regularly ;)
 
Agreed, I am just not 100% sure this is my issue. From reading the threads, it sounds like this change was made in kernel 6.8. I have tested all the way up to 6.8.12-18 and have had no issues. But if I try 6.8.12-29, it crashes after some time. I was actually able to update my BIOS to the newest version. It ran much longer, but it still crashed.

I am interested in the difference between 6.8.12-18 and 6.8.12-29.
Same here: after updating to 6.8.12-29-pve, I'm seeing many "DMAR: ERROR: DMA PTE for vPFN" messages in the journal. After some time the node crashes or gets killed by fencing. Going back to 6.8.12-22-pve solves the issue. My hardware is a Dell PowerEdge R740 with a MegaRAID SAS controller.

To me it looks like a regression between 6.8.12-22-pve and 6.8.12-29-pve, reproducible on multiple Dell R740 nodes with a megaraid_sas controller.
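
To compare several nodes, it can help to count how often the message shows up per host. The following is only a minimal sketch: it assumes the default short journal format seen in the log in this thread (timestamp, host name, then "kernel:"), and the function name `count_dmar_errors` is made up for illustration.

```python
from collections import Counter

# Fixed part of the kernel message quoted in this thread.
MARKER = "DMAR: ERROR: DMA PTE for vPFN"


def count_dmar_errors(lines):
    """Count DMAR PTE errors per host name.

    Assumes lines shaped like 'Jun 04 03:14:30 <host> kernel: ...',
    so the host name is the fourth whitespace-separated field.
    """
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue  # not one of the messages we are looking for
        fields = line.split()
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return counts
```

The input could be, for example, the kernel messages of each node (`journalctl -k`) saved to a file and read line by line. A node that shows zero hits on 6.8.12-22-pve and many on 6.8.12-29-pve under a similar load would support the regression theory, although it would not prove it.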

The part of the log showing one of the kernel messages:

Jun 04 03:14:30 clrz19-17 kernel: DMAR: ERROR: DMA PTE for vPFN 0x50700 already set (to 50700003 not 396de8001)
Jun 04 03:14:30 clrz19-17 kernel: ------------[ cut here ]------------
Jun 04 03:14:30 clrz19-17 kernel: WARNING: CPU: 75 PID: 2165752 at drivers/iommu/intel/iommu.c:2231 __domain_mapping+0x30c/0x510
Jun 04 03:14:30 clrz19-17 kernel: Modules linked in: veth rbd xt_mac ceph libceph rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_physdev xt_addrtype xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_tcpudp xt_set xt_mark iptable_filter ip_set_hash_net ip_set sctp scsi_dh_alua dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_tables libcrc32c vxlan ip6_udp_tunnel udp_tunnel 8021q garp mrp iTCO_wdt intel_pmc_bxt iTCO_vendor_support sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac skx_edac_common nfit x86_pkg_temp_thermal ipmi_ssif intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd cmdlinepart dell_smbios rapl
Jun 04 03:14:30 clrz19-17 kernel: dcdbas spi_nor mgag200 mei_me intel_cstate dell_wmi_descriptor wmi_bmof pcspkr i2c_algo_bit mtd acpi_power_meter mei intel_pch_thermal ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler mac_hid cdc_ether usbnet mii zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap dm_multipath efi_pstore dmi_sysfs ip_tables x_tables autofs4 xhci_pci xhci_pci_renesas crc32_pclmul i40e ahci i2c_i801 spi_intel_pci xhci_hcd tg3 megaraid_sas spi_intel lpc_ich i2c_smbus libahci wmi
Jun 04 03:14:30 clrz19-17 kernel: CPU: 75 PID: 2165752 Comm: kworker/u194:4 Tainted: P O 6.8.12-29-pve #1
Jun 04 03:14:30 clrz19-17 kernel: Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS 2.24.0 03/27/2025
Jun 04 03:14:30 clrz19-17 kernel: Workqueue: writeback wb_workfn (flush-252:1)
Jun 04 03:14:30 clrz19-17 kernel: RIP: 0010:__domain_mapping+0x30c/0x510
Jun 04 03:14:30 clrz19-17 kernel: Code: 48 89 c2 4c 89 5d b0 48 c7 c7 80 38 44 95 e8 eb 31 6c ff 8b 05 89 ad 9c 01 4c 8b 5d b0 85 c0 74 09 83 e8 01 89 05 78 ad 9c 01 <0f> 0b e9 f3 fe ff ff 41 80 e2 7f e9 d1 fe ff ff 41 83 c8 01 49 63
Jun 04 03:14:30 clrz19-17 kernel: RSP: 0018:ffffd27489297028 EFLAGS: 00010202
Jun 04 03:14:30 clrz19-17 kernel: RAX: 0000000000000004 RBX: 0000000000000001 RCX: 0000000000000000
Jun 04 03:14:30 clrz19-17 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jun 04 03:14:30 clrz19-17 kernel: RBP: ffffd274892970c0 R08: 0000000000000000 R09: 0000000000000000
Jun 04 03:14:30 clrz19-17 kernel: R10: 0000000000000000 R11: 0000000000000002 R12: ffff8d2a92f95700
Jun 04 03:14:30 clrz19-17 kernel: R13: ffff8d2a9253c800 R14: 0000000396de8001 R15: ffff8d2a9253c800
Jun 04 03:14:30 clrz19-17 kernel: FS: 0000000000000000(0000) GS:ffff8ea6bf880000(0000) knlGS:0000000000000000
Jun 04 03:14:30 clrz19-17 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 04 03:14:30 clrz19-17 kernel: CR2: 000073c9c865e0a0 CR3: 0000001557836002 CR4: 00000000007726f0
Jun 04 03:14:30 clrz19-17 kernel: PKRU: 55555554
Jun 04 03:14:30 clrz19-17 kernel: Call Trace:
Jun 04 03:14:30 clrz19-17 kernel: <TASK>
Jun 04 03:14:30 clrz19-17 kernel: intel_iommu_map_pages+0xe1/0x140
Jun 04 03:14:30 clrz19-17 kernel: __iommu_map+0x11e/0x280
Jun 04 03:14:30 clrz19-17 kernel: iommu_map_sg+0xbf/0x1f0
Jun 04 03:14:30 clrz19-17 kernel: iommu_dma_map_sg+0x45f/0x4f0
Jun 04 03:14:30 clrz19-17 kernel: __dma_map_sg_attrs+0x32/0xd0
Jun 04 03:14:30 clrz19-17 kernel: dma_map_sg_attrs+0xe/0x30
Jun 04 03:14:30 clrz19-17 kernel: scsi_dma_map+0x47/0x70
Jun 04 03:14:30 clrz19-17 kernel: megasas_build_and_issue_cmd_fusion+0x209/0x1890 [megaraid_sas]
Jun 04 03:14:30 clrz19-17 kernel: megasas_queue_command+0x11d/0x1b0 [megaraid_sas]
Jun 04 03:14:30 clrz19-17 kernel: scsi_queue_rq+0x3fe/0xcc0
Jun 04 03:14:30 clrz19-17 kernel: blk_mq_dispatch_rq_list+0x137/0x810
Jun 04 03:14:30 clrz19-17 kernel: ? sbitmap_get+0x73/0x180
Jun 04 03:14:30 clrz19-17 kernel: __blk_mq_sched_dispatch_requests+0x41f/0x5d0
Jun 04 03:14:30 clrz19-17 kernel: ? dd_insert_requests+0x13e/0x450
Jun 04 03:14:30 clrz19-17 kernel: ? sbitmap_get_shallow+0x68/0x140
Jun 04 03:14:30 clrz19-17 kernel: blk_mq_sched_dispatch_requests+0x2f/0x80
Jun 04 03:14:30 clrz19-17 kernel: blk_mq_run_hw_queue+0x259/0x350
Jun 04 03:14:30 clrz19-17 kernel: blk_mq_flush_plug_list.part.0+0x187/0x5c0
Jun 04 03:14:30 clrz19-17 kernel: blk_add_rq_to_plug+0x14d/0x1b0
Jun 04 03:14:30 clrz19-17 kernel: blk_mq_submit_bio+0x596/0x690
Jun 04 03:14:30 clrz19-17 kernel: __submit_bio+0xb3/0x1c0
Jun 04 03:14:30 clrz19-17 kernel: submit_bio_noacct_nocheck+0x17a/0x390
Jun 04 03:14:30 clrz19-17 kernel: submit_bio_noacct+0x1ca/0x660
Jun 04 03:14:30 clrz19-17 kernel: ? bio_add_page+0xa3/0xd0
Jun 04 03:14:30 clrz19-17 kernel: submit_bio+0xb2/0x110
Jun 04 03:14:30 clrz19-17 kernel: ext4_bio_write_folio+0x1b7/0x6f0
Jun 04 03:14:30 clrz19-17 kernel: mpage_submit_folio+0x94/0xc0
Jun 04 03:14:30 clrz19-17 kernel: mpage_map_and_submit_buffers+0x1ae/0x340
Jun 04 03:14:30 clrz19-17 kernel: ext4_do_writepages+0x770/0xe10
Jun 04 03:14:30 clrz19-17 kernel: ext4_writepages+0xb5/0x190
Jun 04 03:14:30 clrz19-17 kernel: do_writepages+0xcd/0x1f0
Jun 04 03:14:30 clrz19-17 kernel: ? wakeup_preempt+0x6b/0x80
Jun 04 03:14:30 clrz19-17 kernel: ? ttwu_do_activate+0x75/0x250
Jun 04 03:14:30 clrz19-17 kernel: __writeback_single_inode+0x44/0x370
Jun 04 03:14:30 clrz19-17 kernel: writeback_sb_inodes+0x211/0x510
Jun 04 03:14:30 clrz19-17 kernel: __writeback_inodes_wb+0x54/0x100
Jun 04 03:14:30 clrz19-17 kernel: ? queue_io+0x115/0x120
Jun 04 03:14:30 clrz19-17 kernel: wb_writeback+0x2df/0x350
Jun 04 03:14:30 clrz19-17 kernel: wb_workfn+0x368/0x4d0
Jun 04 03:14:30 clrz19-17 kernel: ? __schedule+0x433/0x1500
Jun 04 03:14:30 clrz19-17 kernel: ? add_timer+0x20/0x40
Jun 04 03:14:30 clrz19-17 kernel: process_one_work+0x182/0x3a0
Jun 04 03:14:30 clrz19-17 kernel: worker_thread+0x18b/0x330
Jun 04 03:14:30 clrz19-17 kernel: ? __pfx_worker_thread+0x10/0x10
Jun 04 03:14:30 clrz19-17 kernel: kthread+0xf2/0x120
Jun 04 03:14:30 clrz19-17 kernel: ? __pfx_kthread+0x10/0x10
Jun 04 03:14:30 clrz19-17 kernel: ret_from_fork+0x47/0x70
Jun 04 03:14:30 clrz19-17 kernel: ? __pfx_kthread+0x10/0x10
Jun 04 03:14:30 clrz19-17 kernel: ret_from_fork_asm+0x1b/0x30
Jun 04 03:14:30 clrz19-17 kernel: </TASK>
Jun 04 03:14:30 clrz19-17 kernel: ---[ end trace 0000000000000000 ]---
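
For anyone who wants to read the two hex values in the first line of that log, here is a small sketch that decodes them. It assumes the message format shown above, 4 KiB pages, and that bits 0 and 1 of a VT-d page table entry are the read and write permission bits. Upper attribute bits are ignored, so treat this as a simplification; the helper names are made up for illustration.

```python
import re

# Message format as it appears in the log above.
DMAR_RE = re.compile(
    r"DMAR: ERROR: DMA PTE for vPFN (0x[0-9a-fA-F]+) already set "
    r"\(to ([0-9a-fA-F]+) not ([0-9a-fA-F]+)\)"
)

PAGE_SHIFT = 12   # assumption: 4 KiB pages
PTE_READ = 0x1    # bit 0: device may read from memory
PTE_WRITE = 0x2   # bit 1: device may write to memory


def decode_pte(value):
    """Split a raw PTE value into page frame number and R/W bits."""
    return {
        "pfn": value >> PAGE_SHIFT,
        "read": bool(value & PTE_READ),
        "write": bool(value & PTE_WRITE),
    }


def parse_dmar_error(line):
    """Return the decoded fields of a DMAR PTE error line, or None."""
    match = DMAR_RE.search(line)
    if match is None:
        return None
    return {
        "vpfn": int(match.group(1), 16),
        "existing": decode_pte(int(match.group(2), 16)),
        "requested": decode_pte(int(match.group(3), 16)),
    }
```

Applied to the line from the log, the existing entry decodes to page frame 0x50700 with read and write set, which is the same number as the vPFN. The requested entry decodes to page frame 0x396de8 with only read set, which would fit a disk write coming from writeback. What this says about the root cause is left open here.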
 
Hi @krambrod (and others),
could you test kernels between 6.8.12-22-pve and 6.8.12-29-pve to narrow it down further?
 
Hi fiona, thanks for coming back to this.
What would you suggest I try after booting the `6.8.12-29-pve` kernel?
I mean installing other kernels and booting into them to see if they are affected. In fact, the two interesting ones would be
Code:
apt install proxmox-kernel-6.8.12-27-pve
apt install proxmox-kernel-6.8.12-28-pve
since those come together with updates to the kernel submodules where the regression could've come in. The other versions were for single security fixes that are rather unlikely to be related to the issue.
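
If someone wants to walk through every revision between a known-good and a known-bad kernel, the package names can be generated rather than typed. This is just a sketch: it assumes the `proxmox-kernel-<version>-<revision>-pve` naming used in the commands above, and the function name `candidate_packages` is made up. Not every revision number is guaranteed to exist as a package.

```python
import re

# Kernel version strings as used in this thread, e.g. "6.8.12-22-pve".
VERSION_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)-pve$")


def candidate_packages(good, bad):
    """List package names strictly between a good and a bad kernel."""
    g = VERSION_RE.match(good)
    b = VERSION_RE.match(bad)
    if g is None or b is None:
        raise ValueError("expected versions like 6.8.12-22-pve")
    if g.group(1) != b.group(1):
        raise ValueError("both kernels must share the same base version")
    low, high = int(g.group(2)), int(b.group(2))
    if low >= high:
        raise ValueError("the good kernel must be older than the bad one")
    base = g.group(1)
    # Revisions strictly between the two endpoints, oldest first.
    return [f"proxmox-kernel-{base}-{n}-pve" for n in range(low + 1, high)]


if __name__ == "__main__":
    for name in candidate_packages("6.8.12-22-pve", "6.8.12-29-pve"):
        print(f"apt install {name}")
```

For the range discussed here, the list includes the -27 and -28 packages named above, which are the ones worth trying first.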
 
@slammers67 could it be that you were just lucky with -25? Because the only changes between those kernels are two targeted security fixes that should not be related to the IOMMU. From `apt changelog proxmox-kernel-6.8`:
Code:
proxmox-kernel-6.8 (6.8.12-25) bookworm; urgency=medium

  * cherry-pick fix for "pintheft"
  * prevent autoloading of certain network protocols via modprobe.d

 -- Proxmox Support Team <support@proxmox.com>  Wed, 20 May 2026 12:12:59 +0200

proxmox-kernel-6.8 (6.8.12-24) bookworm; urgency=medium

  * cherry-pick "net: skbuff: propagate shared-frag marker through
    frag-transfer helpers" to harden some variants of "DirtyFrag".
  * cherry-pick fix for "ssh-keysign-pwn"

 -- Proxmox Support Team <support@proxmox.com>  Fri, 15 May 2026 10:11:43 +0200

proxmox-kernel-6.8 (6.8.12-23) bookworm; urgency=medium