Creating VM Blows Up Proxmox After Reboot

Speakerrob

Member
May 23, 2019
I've been struggling with this for months, and I'm unable to find a solution.

When I install Proxmox, everything is fine. I do a bunch of initial setup tasks (e.g. SSH config, setting up keys, users, etc.). However, whenever I create my first VM through the web GUI and then reboot Proxmox, the filesystem gets corrupted.

I don't know how to grab any logs because I have to completely wipe the machine to regain control. However, upon reboot it gets to the pve login screen and then starts eternally spamming the screen with:
Code:
systemd-journald(443): Failed to write entry
EXT4-fs error (device dm-2): __ext4_find_entry
Buffer I/O error on dev dm-2, logical block 0, lost sync page write
I create the VM with mostly default settings: a pfSense ISO, a VirtIO block disk and no networking. After creation I add two PCI devices (NICs) through the Hardware settings. The VM's disk is on a separate volume group on a different physical disk from the Proxmox install itself. I'm able to start and run this VM fine until Proxmox is rebooted.
 
Hi,
You could set up a remote syslog server to see what is happening the moment you create the VM.
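
A minimal sketch of the remote-logging idea, assuming the host runs rsyslog and that a log collector is reachable at the placeholder address 192.0.2.10 (neither detail comes from this thread). The helper forward_rule and the file name 90-remote.conf are hypothetical; the helper only builds the forwarding rule, and installing it is shown as commented steps.

```shell
#!/usr/bin/env bash
# Build an rsyslog forwarding rule: "@" forwards over UDP, "@@" over TCP.
# Host, port and protocol are placeholders chosen for illustration.
forward_rule() {
    local host=$1 port=${2:-514} proto=${3:-udp}
    local prefix='@'
    [ "$proto" = tcp ] && prefix='@@'
    printf '*.* %s%s:%s\n' "$prefix" "$host" "$port"
}

# Example (run as root on the Proxmox host; file name is an assumption):
#   forward_rule 192.0.2.10 514 tcp > /etc/rsyslog.d/90-remote.conf
#   systemctl restart rsyslog
```

Because the messages leave the machine as they are written, they should survive even if the host's own filesystem becomes unwritable.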
 
Can you show us the IOMMU groups with this command?
Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
Please also show the output of cat /proc/cmdline, and do all this without using pcie_acs_override.
It would also be useful to see the contents of the VM configuration file from the /etc/pve/qemu-server/ directory.
Is there any information in journalctl from starting the VM and just before the crash?
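
For readers who find the one-liner hard to follow, here is a more readable sketch of the same loop. The function name list_iommu_groups and its optional root argument are hypothetical additions for illustration. It also sorts the groups numerically, since the shell glob alone lists them in lexical order (0, 10, 11, ... before 1, 2).

```shell
#!/usr/bin/env bash
# Print "group address" pairs, one per line, sorted numerically by group.
# The optional argument overrides the sysfs directory (an assumption that
# makes the function easy to try against a test tree).
list_iommu_groups() {
    local root=${1:-/sys/kernel/iommu_groups}
    local d group
    for d in "$root"/*/devices/*; do
        [ -e "$d" ] || continue          # glob did not match: no IOMMU groups
        group=${d#"$root"/}              # strip the root prefix
        group=${group%%/*}               # keep only the group number
        printf '%s %s\n' "$group" "${d##*/}"
    done | sort -n
}

# On a real host, add the device description for each address:
#   list_iommu_groups | while read -r g addr; do
#       printf 'IOMMU group %s ' "$g"; lspci -nns "$addr"
#   done
```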
 
  • Like
Reactions: Speakerrob
Do you mean PCI passthrough? My guess is that the NIC shares an IOMMU group with the disk controller your host depends on.
I think so? After creating the VM from the web GUI, I click on the pve node > 100 (pfSense) > Hardware > Add > PCI Device and select several NICs from the hardware list.


I ran these after creating the VM. Also, I'm not sure what pcie_acs_override is, or if I'm using it.
It looks like Proxmox is fine so long as I don't mark the VM to start at boot. The VM is still toast though.

IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
IOMMU group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 10 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
IOMMU group 10 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU group 11 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU group 11 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU group 11 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU group 11 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
IOMMU group 11 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU group 11 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU group 11 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU group 11 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU group 12 03:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset USB 3.1 xHCI Controller [1022:43bb] (rev 02)
IOMMU group 12 03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset SATA Controller [1022:43b7] (rev 02)
IOMMU group 12 03:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b2] (rev 02)
IOMMU group 12 1d:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU group 12 1d:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU group 12 1d:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU group 12 1e:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)
IOMMU group 12 1f:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 11)
IOMMU group 12 22:00.0 PCI bridge [0604]: Microsemi / PMC / IDT PES12N3A 12-lane 3-Port PCI Express Switch [111d:8018] (rev 0e)
IOMMU group 12 23:02.0 PCI bridge [0604]: Microsemi / PMC / IDT PES12N3A 12-lane 3-Port PCI Express Switch [111d:8018] (rev 0e)
IOMMU group 12 23:04.0 PCI bridge [0604]: Microsemi / PMC / IDT PES12N3A 12-lane 3-Port PCI Express Switch [111d:8018] (rev 0e)
IOMMU group 12 24:00.0 Ethernet controller [0200]: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller (Copper) [8086:10bc] (rev 06)
IOMMU group 12 24:00.1 Ethernet controller [0200]: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller (Copper) [8086:10bc] (rev 06)
IOMMU group 12 25:00.0 Ethernet controller [0200]: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller (Copper) [8086:10bc] (rev 06)
IOMMU group 12 25:00.1 Ethernet controller [0200]: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller (Copper) [8086:10bc] (rev 06)
IOMMU group 13 26:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 710] [10de:128b] (rev a1)
IOMMU group 13 26:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
IOMMU group 14 27:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
IOMMU group 15 27:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
IOMMU group 16 27:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]
IOMMU group 17 28:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
IOMMU group 18 28:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 19 28:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
IOMMU group 1 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU group 2 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 3 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 4 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU group 5 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 6 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 7 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU group 8 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 9 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]

BOOT_IMAGE=/boot/vmlinuz-5.11.22-5-pve root=/dev/mapper/pve-root ro quiet

boot: order=virtio0;ide2
cores: 2
hostpci0: 0000:24:00
hostpci1: 0000:25:00
hostpci2: 0000:1f:00
ide2: local:iso/pfSense-CE-2.5.2-RELEASE-amd64.iso,media=cdrom
memory: 1024
name: pfSense
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=622c64a0-e053-4904-a784-7844429e06e9
sockets: 1
virtio0: vms:vm-100-disk-0,size=32G
vmgenid: 41c7e046-a39f-44ed-b143-6cdaaf1cf813

Don't really know what I'm looking for in journalctl, but there are a lot of entries in there.
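
One way to narrow down a large journal is to filter for the kinds of messages that tend to accompany passthrough problems. The sketch below is only a starting point: the function name filter_passthrough_events is hypothetical, and the pattern list is an assumption built from the errors quoted at the top of this thread plus common VFIO/IOMMU keywords.

```shell
#!/usr/bin/env bash
# Keep only log lines that commonly matter for passthrough problems.
# The pattern list is illustrative, not exhaustive.
filter_passthrough_events() {
    grep -iE 'vfio|iommu|AMD-Vi|EXT4-fs error|Buffer I/O error'
}

# Example: kernel messages from the current boot
#   journalctl -k -b --no-pager | filter_passthrough_events
```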
 
You are passing through three devices from IOMMU group 12:
IOMMU group 12 1f:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 11)
IOMMU group 12 24:00.0 Ethernet controller [0200]: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller (Copper) [8086:10bc] (rev 06)
IOMMU group 12 24:00.1 Ethernet controller [0200]: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller (Copper) [8086:10bc] (rev 06)
IOMMU group 12 25:00.0 Ethernet controller [0200]: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller (Copper) [8086:10bc] (rev 06)
IOMMU group 12 25:00.1 Ethernet controller [0200]: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller (Copper) [8086:10bc] (rev 06)

But there are other devices in group 12 (I left out the PCI bridges, as they are not important here):
IOMMU group 12 03:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset USB 3.1 xHCI Controller [1022:43bb] (rev 02)
IOMMU group 12 03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset SATA Controller [1022:43b7] (rev 02)
IOMMU group 12 1e:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)

An IOMMU group cannot be shared between VMs or between a VM and the host. As soon as you start the VM, the Proxmox host loses every other device in that group: the USB controller, the other Realtek Ethernet controller and a SATA controller. It looks like your Proxmox is installed on one or more of the drives connected to that SATA controller, which is the reason for the Buffer I/O error and Proxmox crashing.

Please try moving the three Ethernet controllers that you want to pass through to other PCI(e) slots, until they are in a separate IOMMU group.
Can you tell me exactly which AMD motherboard you are using? Then I can maybe look up the specification and advise a specific slot.
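
The group-sharing check described above can be sketched as a small filter over the listing produced by the IOMMU command. The function name collateral_devices is hypothetical, and the sketch assumes the listing has one device per line (no terminal wrapping). It reports the non-bridge devices the host would lose.

```shell
#!/usr/bin/env bash
# Read lines of the form "IOMMU group <n> <bus:dev.fn> <description>" on stdin.
# Arguments are the bus:dev prefixes being passed through (e.g. 24:00).
# Prints every non-bridge device that shares a group with a passed-through
# device but is not itself passed through.
collateral_devices() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " ") }
        $1 == "IOMMU" && $2 == "group" {
            group[NR] = $3; line[NR] = $0
            passed[NR] = 0
            for (i = 1; i <= n; i++)
                if (index($4, w[i] ".") == 1) { passed[NR] = 1; hit[$3] = 1 }
        }
        END {
            for (r = 1; r <= NR; r++)
                if ((r in line) && hit[group[r]] && !passed[r] && line[r] !~ /PCI bridge/)
                    print line[r]
        }'
}

# Example (iommu-groups.txt is a hypothetical file holding the saved listing):
#   collateral_devices 24:00 25:00 1f:00 < iommu-groups.txt
```

For the listing in this thread, the three hostpci addresses from the VM configuration should yield the USB controller, the chipset SATA controller and the second Realtek NIC, matching the list above.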
 
IOMMU is something I've never come across before. I did a little reading on it, and my problems are starting to make more sense.

I'm using an ASRock AB350M Pro4 motherboard. Unfortunately, the NVMe drive Proxmox is on seems to share a group with the motherboard's Ethernet controller, so that essentially removes those ports from the equation. Moving the Intel NIC down a few slots shouldn't be an issue, though.
 
Looks like everything shares group 12, so passthrough is not going to be an option with this board.
I suppose I can accomplish the same goal with VLANs in Proxmox.
 
The x16 slot closest to the CPU ought to be in a separate IOMMU group, although it might be wired as only x8 if you are using an APU.
The x1 slot will definitely be connected to the chipset (like the USB and SATA) and be part of that large group 12.
I'm not sure about the second x16 slot (which is only wired for x4). But if the M.2 slot is also connected to the chipset, as you are suggesting, maybe these 4 PCIe lanes are also connected to the CPU.
(Are you sure your M.2 drive is an NVMe drive? It looks like a SATA drive (in the same flat M.2 format), otherwise it should show up as a Non-Volatile memory controller.)
 
  • Like
Reactions: Speakerrob
Have you tried moving the Proxmox drive to the other M.2 slot (closer to the CPU)? Maybe you can pass the whole IOMMU group 12 to the VM, if you don't need the USB, SATA (which is probably the M.2 slot) and other ethernet controllers.
Or if you don't run untrusted software or allow untrusted users, you could consider using pcie_acs_override.
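
To answer the earlier question of whether pcie_acs_override is in use, the kernel command line can be checked directly. This is a minimal sketch; the function name uses_acs_override is hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds if the given kernel command line contains pcie_acs_override.
# With no argument, the running kernel's /proc/cmdline is inspected.
uses_acs_override() {
    local cmdline=${1:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" pcie_acs_override="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with the command line posted earlier in this thread:
#   uses_acs_override 'BOOT_IMAGE=/boot/vmlinuz-5.11.22-5-pve root=/dev/mapper/pve-root ro quiet' \
#       || echo 'pcie_acs_override is not set'
```

On a GRUB-booted host, the option is usually enabled by adding something like pcie_acs_override=downstream,multifunction to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub. Treat that as an assumption about a typical setup, and note that it weakens the isolation between devices, hence the caveat about untrusted software.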
 
You're correct, I misremembered my hardware setup. It is just a SATA SSD.

I was able to move the Intel NIC to the second PCI slot, and it's now split between IOMMU groups 14 and 15. It is no longer throwing the corrupted filesystem I/O errors!

Thank you very much for your time and sharing your knowledge.
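
For anyone repeating this fix, a quick way to confirm that a card is isolated after moving it is to list what else sits in its IOMMU group via sysfs. A minimal sketch follows; the function name iommu_group_peers and its optional sysfs-root argument are hypothetical, and the address in the example is a placeholder, since a device's address can change when it moves to another slot.

```shell
#!/usr/bin/env bash
# List every device in the same IOMMU group as the given PCI address.
# The second argument overrides the sysfs root (an assumption that makes
# the function easy to try against a test tree).
iommu_group_peers() {
    local addr=$1 sysroot=${2:-/sys}
    local dir="$sysroot/bus/pci/devices/$addr/iommu_group/devices"
    [ -d "$dir" ] || { echo "no IOMMU group for $addr" >&2; return 1; }
    ls "$dir"
}

# Example (full address including the 0000: domain prefix):
#   iommu_group_peers 0000:24:00.0
```

If the output contains only the functions you intend to pass through, starting the VM should not take anything away from the host.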
 
  • Like
Reactions: leesteken
