RTX 3090 Ti GPU Passthrough Crashing Proxmox Server

c80129b

New Member
Jan 24, 2025
System Configuration:

• RTX 3090 (working successfully)

• RTX 3090 Ti (causing crashes)

Storage: (NSF)

Power Supply: (e.g., 1600W Platinum)


The Issue:

I am attempting GPU passthrough for an RTX 3090 Ti on Proxmox to a virtual machine (VM). While I have successfully set up GPU passthrough with an RTX 3090 on the same system, adding the 3090 Ti to the VM causes the following behavior:

1. The Proxmox host crashes entirely (kernel panic or black screen).

2. After the crash, the host becomes unresponsive and requires a full reinstallation of Proxmox to recover.


This behavior occurs consistently whenever I add the 3090 Ti to the VM and start it, even after following standard passthrough procedures.


What I’ve Tried:

1. IOMMU Configuration:

• Verified that IOMMU is enabled in BIOS.

• Checked and isolated IOMMU groups for the 3090 Ti.

• Enabled PCIe ACS Override (pcie_acs_override=downstream,multifunction) in GRUB.

2. Driver and Module Setup:

• Blacklisted nouveau and nvidia drivers.

• Bound the 3090 Ti to vfio-pci using its PCI device and audio IDs.

3. Resource Allocation:

• Verified sufficient CPU cores, memory, and PCIe resources for the VM.

4. Logs:

• After the crash, logs don’t indicate a specific issue as the system becomes completely unresponsive, making diagnosis difficult.
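
For anyone wanting to reproduce the check from step 2, the driver binding can be read back from sysfs. This is only a minimal sketch: the function name `bound_driver` and the PCI addresses are hypothetical placeholders (take the real addresses from `lspci -nn`), and it assumes the standard Linux sysfs layout under /sys/bus/pci/devices.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or the device does not exist
    # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Hypothetical addresses for the GPU function and its audio function.
    for addr in ("0000:01:00.0", "0000:01:00.1"):
        print(addr, bound_driver(addr) or "no driver bound")
```

Both functions of the card should report vfio-pci before the VM is started. If either shows nouveau, nvidia, or snd_hda_intel, the binding did not take effect.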


Questions for the Community:

1. Has anyone successfully passed through an RTX 3090 Ti on Proxmox?

2. Is the 3090 Ti known to have specific issues with GPU passthrough or the Proxmox kernel?

3. Are there any additional steps I should try to avoid host crashes when adding the 3090 Ti to the VM?


Additional Context:

• This is a mining rig/AI rig, and I need both GPUs (3090 and 3090 Ti) passed through to a single VM for computational tasks.

• The 3090 works perfectly on its own in passthrough. The issue only arises when adding the 3090 Ti.


Thank you in advance for any assistance or insights!
 
Updated Description of the Problem:

• I am using ZFS for my Proxmox host storage (rpool), and I suspect it may be contributing to the issue.
• When I attempt GPU passthrough with the RTX 3090 Ti, the server crashes completely, displaying critical errors related to ZFS (rpool has encountered an uncorrectable I/O failure and has been suspended).
• The crash leaves the host unresponsive, requiring a full reinstallation of Proxmox to recover.
 
Hi,

"• Checked and isolated IOMMU groups for the 3090 Ti.
• Enabled PCIe ACS Override (pcie_acs_override=downstream,multifunction) in GRUB."

together with

"When I attempt GPU passthrough with the RTX 3090 Ti, the server crashes completely, displaying critical errors related to ZFS (rpool has encountered an uncorrectable I/O failure and has been suspended)."

strongly indicate that the disk/HBA controller and the 3090 Ti actually share the same hardware IOMMU group.
Thus, when you start the VM with the 3090 Ti passed through, the host crashes because it loses access to the rpool.

pcie_acs_override=downstream,multifunction is dangerous in its own right because of its security implications, and it can easily cause this kind of breakage.
Essentially, it splits up IOMMU groups without regard to the actual hardware topology. Boot without that parameter and check the IOMMU groups: the 3090 Ti and the disk controller (or other vital PCIe devices) will then probably show up in the same IOMMU group.
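
One way to do that check is to walk /sys/kernel/iommu_groups and look for a group that holds both a GPU and a storage controller. The following is a rough sketch, not a definitive tool: the function names are made up for illustration, and it assumes the usual sysfs layout where every device directory exposes a `class` file with the PCI class code. Run it after booting without pcie_acs_override, because with the override active the reported groups are the artificially split ones.

```python
import os
from collections import defaultdict

# PCI base class codes (top byte of the value in the sysfs "class" file)
STORAGE = 0x01  # mass storage controllers (SATA, NVMe, SAS HBAs)
DISPLAY = 0x03  # display controllers (GPUs)


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to a list of (pci_address, base_class)."""
    groups = defaultdict(list)
    if not os.path.isdir(root):
        return {}  # IOMMU disabled or not a Linux host
    for group in os.listdir(root):
        dev_dir = os.path.join(root, group, "devices")
        if not os.path.isdir(dev_dir):
            continue
        for addr in sorted(os.listdir(dev_dir)):
            try:
                with open(os.path.join(dev_dir, addr, "class")) as f:
                    class_code = int(f.read().strip(), 16)  # e.g. 0x030000
            except FileNotFoundError:
                continue  # device without a PCI class file
            groups[int(group)].append((addr, class_code >> 16))
    return dict(groups)


def risky_groups(groups):
    """Return group numbers containing both a GPU and a storage controller."""
    risky = []
    for num, devices in sorted(groups.items()):
        classes = {base for _, base in devices}
        if DISPLAY in classes and STORAGE in classes:
            risky.append(num)
    return risky


def acs_override_active(cmdline_path="/proc/cmdline"):
    """True if the running kernel was booted with pcie_acs_override=..."""
    if not os.path.isfile(cmdline_path):
        return False
    with open(cmdline_path) as f:
        return any(a.startswith("pcie_acs_override=") for a in f.read().split())


if __name__ == "__main__":
    if acs_override_active():
        print("Warning: pcie_acs_override is active; groups are not the real ones.")
    groups = read_iommu_groups()
    for num in risky_groups(groups):
        print(f"IOMMU group {num} mixes GPU and storage:", groups[num])
```

Any group it prints cannot be safely passed through as long as the host depends on the storage controller in it. Note that this only flags storage controllers; other vital devices in the GPU's group (a NIC or USB controller, for example) would need the same manual look.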
 
