Hello Proxmox Community,
I have a small issue where I need your help/suggestions.
I have an MI210 graphics card that is passed through to a VM via a PCIe bridge. For the card to support ROCm 7.11 I had to update its firmware, so I had to run the update from the Proxmox host.
I made the following changes:
Disabled SR-IOV in the mainboard settings and set iomem=relaxed on the kernel command line. The drivers (amdgpu/radeon) are blacklisted in /etc/modprobe.d/blacklist.conf. Then I ran:
Code:
/opt/amdfwflash/sbin/amdfwflash --list-devices
This showed the card as IFWI updatable, so I ran:
Code:
/opt/amdfwflash/sbin/amdfwflash -u
This failed with:
Failed to flash the VBIOS.
Error 0x8000ffff : An unexpected error occured
Then I tried to recover by rebooting and running:
Code:
/opt/amdfwflash/sbin/amdfwflash -r
but this failed as well, apparently because it first tries to back up the bricked VBIOS:
Code:
Detecting AMD GPU/APU. Please wait...
FAILURE: IFWI of MI200 at 0000:03:00.0 with size 0x0 bytes could not be saved in "/tmp/amdfwflash/ifwi/backup"
Failed to save the VBIOS for 1 or more ASIC.
Error 0x80004005 : Generic failure
Has anyone had experience with this and knows what can be done to get the card working again with the current firmware?
Is something still blocking the firmware updater? Can I use the amdvbflash tool to recover from this situation?
Here is some more information:
Full kernel command line:
Code:
root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on iommu=pt iomem=relaxed
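To rule out that these options are only in the config and not in the running kernel, a quick check against /proc/cmdline can help. This is a minimal bash sketch; check_cmdline is just a hypothetical helper name of my own, and the list of options is simply the one from the command line above.

```shell
#!/usr/bin/env bash
# Sketch: check that the running kernel was booted with the options listed above.
# Reads /proc/cmdline by default; pass another file path to check a saved copy.
check_cmdline() {
    local file="${1:-/proc/cmdline}"
    local missing=0
    local opt
    for opt in amd_iommu=on iommu=pt iomem=relaxed; do
        # -w: match the option as a whole word, -F: treat it as a fixed string
        if grep -qwF -- "$opt" "$file"; then
            echo "present: $opt"
        else
            echo "MISSING: $opt"
            missing=1
        fi
    done
    # Non-zero exit status if any option is missing
    return "$missing"
}
```

Usage: source the file and run check_cmdline on the host; every line should read "present".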
This is what I get when running amdfwflash --list-devices:
Code:
AMD Firmware Flash Tool Version 2.0.701.460-Public. Copyright© 2022-2025 Advanced Micro Devices, Inc. All rights reserved.
Detecting AMD GPU/APU. Please wait...
IFWI RMFW
Update Update
SNo BDF DID ASIC SPIROM Size Test BIOS P/N Available Available
___ ____________ ____ _________ ___________ ________ ____ _________________ _________ _________
0 0000:03:00.0 740f MI200 0x0 Pass Unknown Unknown Unknown
lspci -v -s 03:00.0 gives:
Code:
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran/MI200 [Instinct MI210] (rev 02)
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0c34
Flags: fast devsel, IRQ 255, NUMA node 0, IOMMU group 34
Memory at 21000000000 (64-bit, prefetchable) [disabled] [size=64G]
Memory at 22000000000 (64-bit, prefetchable) [disabled] [size=2M]
I/O ports at 1000 [disabled] [size=256]
Expansion ROM at <ignored> [disabled]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, IntMsgNum 0
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable- Count=4 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [240] Power Budgeting <?>
Capabilities: [250] Dynamic Power Allocation <?>
Capabilities: [270] Secondary PCI Express
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
Capabilities: [330] Single Root I/O Virtualization (SR-IOV)
Capabilities: [410] Physical Layer 16.0 GT/s <?>
Capabilities: [450] Lane Margining at the Receiver
Capabilities: [580] Vendor Specific Information: ID=0002 Rev=5 Len=174 <?>
Kernel modules: amdgpu
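The lspci output has no "Kernel driver in use" line and all BARs are marked [disabled], which as far as I know means nothing has enabled the device's memory/I/O decoding. I am not sure whether amdfwflash needs that, but the state can be read from the standard sysfs attributes. A minimal bash sketch, assuming the BDF 0000:03:00.0 from above; pci_status is a hypothetical helper name, and the second argument only exists so it can be pointed at a fake sysfs tree.

```shell
#!/usr/bin/env bash
# Sketch: report driver binding and enable state of a PCI device from sysfs.
# Defaults to the MI210 at 0000:03:00.0; the sysfs root can be overridden.
pci_status() {
    local bdf="${1:-0000:03:00.0}"
    local root="${2:-/sys/bus/pci/devices}"
    local dev="$root/$bdf"

    if [[ ! -d "$dev" ]]; then
        echo "no such device: $bdf"
        return 1
    fi

    # The "driver" symlink only exists while a driver is bound to the device.
    if [[ -L "$dev/driver" ]]; then
        echo "driver: $(basename "$(readlink "$dev/driver")")"
    else
        echo "driver: none"
    fi

    # "enable" holds the enable count; 0 means nothing has enabled the device.
    echo "enable: $(cat "$dev/enable" 2>/dev/null || echo unknown)"

    # Vendor:device IDs (AMD is 0x1002; amdfwflash reports DID 740f for this card).
    echo "id: $(cat "$dev/vendor" 2>/dev/null):$(cat "$dev/device" 2>/dev/null)"
}
```

Running pci_status on the host should print driver: none while amdgpu is blacklisted and no vfio-pci binding is active.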