[SOLVED] Lost Proxmox connection after editing a VM hardware

Helio Mendonça

Apr 10, 2019
Hi
After running Proxmox 7.1 without problems for more than a year, today I did something that blocks every connection to it (web and SSH) each time I reboot it. On that Proxmox host I have several VMs and also a CT to which I pass a DVB-S2 board for use by Tvheadend.

Everything was working great, but today I decided to try passing that same board to a VM instead of the CT. I stopped the CT and added the PCI device associated with the board to the VM's hardware, and... TRAGEDY!! I lost the connection as soon as I tried to start that VM!

I rebooted Proxmox but I can't connect to it (by web or SSH). In fact, I can see the Proxmox web interface for a few seconds, but then I lose the connection again (maybe when that VM starts).

My idea for restoring the working Proxmox was to boot into recovery mode and stop that VM from starting by editing its .conf file, but in recovery mode the /etc/pve folder is empty. I was expecting to find the VM's .conf file in the /etc/pve/qemu-server folder.

Can anyone PLEASE help me get my Proxmox server back? :(

Best regards
 
Press e in the boot menu and replace intel_iommu=on with intel_iommu=off (or similar for amd_iommu) to disable PCI(e) passthrough temporarily.
Then boot and edit the VM configuration to not automatically start and/or remove the PCI(e) passthrough.
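As a rough sketch of that second step, the function below rewrites a VM config file so that the VM no longer starts at boot and has no hostpci entries left. The function name disable_passthrough is made up for illustration. The path in the usage comment assumes the /etc/pve/qemu-server location mentioned in the first post, and 151 is only a guess at the VM ID, taken from the disk name vm-151-disk-0 in the config posted further down.

```shell
#!/bin/sh
# Sketch only: disable autostart and remove PCI(e) passthrough entries
# from a Proxmox VM config file. "disable_passthrough" is a hypothetical
# helper name, not a Proxmox command.
disable_passthrough() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    # Set "onboot" to 0 and delete every "hostpciN:" line.
    # Caveat: this also touches snapshot sections if the file has any.
    sed -e 's/^onboot:.*/onboot: 0/' \
        -e '/^hostpci[0-9]*:/d' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write the result back in place instead of renaming the temp file,
    # so the original file and its permissions are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Usage (assumed VM ID and path):
#   disable_passthrough /etc/pve/qemu-server/151.conf
```

On a host that is up and reachable, `qm set 151 --onboot 0` followed by `qm set 151 --delete hostpci0` should have the same effect through the Proxmox tooling.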

Check your IOMMU groups with this command: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done. You cannot share an IOMMU group between VMs or between a VM and the Proxmox host. The device is probably in the same group as your host network controller (which is why Proxmox becomes unreachable) and your drive controllers (which probably causes the crash).
Try putting the device in another PCIe slot. The groups are determined by your motherboard internal structure and BIOS to prevent unwanted memory access between groups. If you can tell me the make and model of the motherboard, maybe I can advise on which PCIe slot to try.
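For readability, here is a multi-line variant of the same idea, sketched as a function. It prints one "group address" pair per line and sorts by group number, so group 2 comes before group 10. The function name list_iommu_groups and the optional directory argument are assumptions added for illustration, and unlike the one-liner it does not call lspci.

```shell
#!/bin/sh
# Sketch only: list "<group> <pci-address>" pairs, sorted numerically by group.
# "list_iommu_groups" is a hypothetical helper name; the optional argument
# lets you point it at another directory instead of the real sysfs tree.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for d in "$root"/*/devices/*; do
        [ -e "$d" ] || continue     # skip the unexpanded glob if nothing matches
        addr="${d##*/}"             # e.g. 0000:09:00.0
        group="${d%/devices/*}"     # e.g. /sys/kernel/iommu_groups/13
        group="${group##*/}"        # e.g. 13
        printf '%s %s\n' "$group" "$addr"
    done | sort -n
}

# Usage:
#   list_iommu_groups
# To get the device names as well, each address can be passed to
# "lspci -nns <address>", as in the one-liner.
```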
 
Dear @leesteken
Many thanks for your hints, which would probably have solved the problem. Meanwhile, although the VM was set to start at boot, that took a few seconds, so I was able to SSH into Proxmox right after the boot and check its .conf file:
Code:
boot: order=scsi0;ide2;net0
cores: 2
hostpci0: 0000:09:00.0
ide2: none,media=cdrom
memory: 4096
meta: creation-qemu=6.1.1,ctime=1645311212
name: dockertube
net0: virtio=0E:D9:XX:XX:XX:71,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-151-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=393a1a88-d036-4b75-xxxx-254fe1076ee4
sockets: 1
vmgenid: 5f476379-6ddb-4f45-xxxx-cd8270d17cd2
As you can see, the hostpci0: 0000:09:00.0 line was the cause of the problem!

After losing the connection again, I removed the DVB board. After a second reboot that line had vanished from the .conf file, maybe because the PCI device was no longer present, and the problem was solved.

Now, with the board connected to my server again, I can see that it is in the same IOMMU group as other devices, namely the network cards, so the cause of the problem now seems obvious:
1659287600760.png
Once again, many thanks for your help and for your hints, which could be useful in a similar situation in the future!
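For anyone who wants to run the same check from the shell, here is a small sketch that prints which other devices share an IOMMU group with a given PCI address. The function name iommu_group_mates and the optional directory argument are made up for illustration. On a real host the same information should also be visible under /sys/bus/pci/devices/<address>/iommu_group/devices.

```shell
#!/bin/sh
# Sketch only: print the PCI addresses that share an IOMMU group with "$1".
# "iommu_group_mates" is a hypothetical helper name; the second argument
# defaults to the real sysfs tree.
iommu_group_mates() {
    addr="$1"
    root="${2:-/sys/kernel/iommu_groups}"
    for d in "$root"/*/devices/"$addr"; do
        [ -e "$d" ] || continue            # device not found in any group
        for peer in "${d%/*}"/*; do        # every entry in the same group
            peer="${peer##*/}"
            [ "$peer" = "$addr" ] || printf '%s\n' "$peer"
        done
    done
}

# Usage (address of the DVB board from the .conf file above):
#   iommu_group_mates 0000:09:00.0
```

If this prints a network or storage controller that the host itself needs, passing the device to a VM will take those away from the host as well.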
 
If you can tell me the make and model of the motherboard, maybe I can advise on which PCIe slot to try.
Although the problem is solved, maybe, as you wrote, moving the DVB board to another PCI slot could allow the pass-through to the VM, so here are the results of the command you suggested:
Code:
root@pve:~# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
IOMMU group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 10 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
IOMMU group 10 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU group 11 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU group 11 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU group 11 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU group 11 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
IOMMU group 11 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU group 11 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU group 11 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU group 11 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU group 12 01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black 2018/SN750 / PC SN720 NVMe SSD [15b7:5002]
IOMMU group 13 03:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller [1022:43b9] (rev 02)
IOMMU group 13 03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset SATA Controller [1022:43b5] (rev 02)
IOMMU group 13 03:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset PCIe Upstream Port [1022:43b0] (rev 02)
IOMMU group 13 04:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU group 13 04:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU group 13 04:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU group 13 04:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU group 13 04:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU group 13 04:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU group 13 05:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1143 USB 3.1 Host Controller [1b21:1343]
IOMMU group 13 06:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
IOMMU group 13 07:00.0 Ethernet controller [0200]: Qualcomm Atheros Killer E2500 Gigabit Ethernet Controller [1969:e0b1] (rev 10)
IOMMU group 13 09:00.0 Multimedia controller [0480]: TBS Technologies DVB Tuner PCIe Card [544d:6178]
IOMMU group 14 0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 710] [10de:128b] (rev a1)
IOMMU group 14 0b:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
IOMMU group 15 0c:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P400] [10de:1cb3] (rev a1)
IOMMU group 15 0c:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
IOMMU group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU group 2 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU group 3 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 4 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 5 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU group 6 00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU group 7 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 8 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 8 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU group 8 11:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
IOMMU group 8 11:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
IOMMU group 8 11:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]
IOMMU group 9 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU group 9 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU group 9 12:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
IOMMU group 9 12:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 9 12:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]

And here is a photo of my motherboard, a GA-AX370-Gaming K7, and the slots currently occupied:
1659295853438.png
Where:
- GPU GT710 is a graphics card, since my AMD Ryzen 7 1700 doesn't have integrated graphics
- TBS6909 is the DVB-S2 board
- Quadro P400 is a GPU for hardware transcoding

As you can see I still have 2 PCIe x1 that I could try!
I wonder if, on one of them, the IOMMU would not interfere with Proxmox.

Regards
 
As you can see I still have 2 PCIe x1 that I could try!
I wonder if, on one of them, the IOMMU would not interfere with Proxmox.
No, all x1 slots and the bottom one are part of the big "chipset group". Only the x16 slots (working as PCIe x8) and the first M.2 slot (PCIe x4) are in separate IOMMU groups. If you want better groups you need a server CPU or a Ryzen motherboard with an X570 chipset. You only get separate groups for the PCIe lanes provided by the CPU, of which there are 20 at most.

In the manual of the MB (page 33) I have an option saying:

IOMMU
Enables or disables AMD IOMMU support. (Default: Auto)

I wonder if this can solve the problem without any slot change!
No, this will not help, but it needs to be Enabled for IOMMU to fully work on most systems. Search the forum and you'll find that everybody with Ryzen (except on X570) has this issue.

You can "break up" the IOMMU groups using the pcie_acs_override=downstream,multifunction kernel parameter if you don't want to (re)move the GT710. This ignores ACS and the group isolation, but it will allow the VM with the TV tuner to access all of the Proxmox host memory (and therefore other VMs) via PCIe DMA.
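A sketch of how that parameter could be set and then verified, under some assumptions: the host boots via GRUB (hosts using systemd-boot keep the kernel command line in /etc/kernel/cmdline and apply it with proxmox-boot-tool refresh instead), and the running kernel includes the ACS override patch, which to my knowledge the Proxmox kernels do. The helper name has_kernel_param is made up for illustration.

```shell
#!/bin/sh
# Setting the parameter (GRUB assumed): append it to the existing line in
# /etc/default/grub, for example
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream,multifunction"
# then run "update-grub" and reboot.

# Sketch only: check whether a parameter is present on the kernel command line.
# "has_kernel_param" is a hypothetical helper name; the second argument
# defaults to the command line of the running kernel.
has_kernel_param() {
    param="$1"
    file="${2:-/proc/cmdline}"
    # One word per line, then look for an exact, whole-line match.
    tr -s ' \t' '\n' < "$file" | grep -qxF -- "$param"
}

# Usage after the reboot:
#   has_kernel_param 'pcie_acs_override=downstream,multifunction' && echo active
```

After the reboot it is worth listing the IOMMU groups again to see whether the tuner really ended up in a group of its own.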
 
In the manual of the MB (page 33) I have an option saying:

IOMMU
Enables or disables AMD IOMMU support. (Default: Auto)

I wonder if this can solve the problem without any slot change!
Hi friend. I ran into the same problem last year with a Gigabyte X370 board and a Ryzen 1700X CPU.
My tuner is also a TBS6909. In the end, I could only pass the TBS tuner through to the VM from the first PCIe x16 slot.
I did a hardware upgrade this year and moved to X570 and a Ryzen 5700G.
Specifically, I have an ASUS ROG Strix X570-E Gaming WiFi II.
Passing the TBS tuner through now works from any PCIe slot.
My only problem is that the VM crashes when the tuner starts being used.
Sometimes it is enough to run a TP scan in the Astra software.
Other times, after a reboot, the VM won't start and tells me that the ZSTD compressed data is corrupted, System has stopped. If I remove the PCIe device from the VM, the VM boots properly. Sometimes the VM works for a day, but I still get the message "TBSECP driver 0000:00:10.0 i2c xfer timeout" in the console.
I've been struggling with this for over a week now and it looks like it will never be stable. The only option is to put the TBS tuner in other hardware without KVM, on plain Debian. Or maybe use it in LXC.
I followed the instructions on the Proxmox website: I have set "amd_iommu=on", added "vfio" to /etc/modules, etc...
I even switched the VM to the q35 machine type.
How did you solve it? Is it stable?
Thank you.
 
