Can't Install Nvidia Driver

manjotsc

Member
Jul 2, 2020
Montreal, Quebec
I am trying to install the Nvidia driver (from the Nvidia UNIX drivers page) for a GTX 1050 on my Proxmox host, but I keep getting this error:

Code:
NVRM: The NVIDIA probe routine was not called for 1 device(s).                                                                                                                   
NVRM: This can occur when a driver such as:                                                                                                                                     
NVRM: nouveau, rivafb, nvidiafb or rivatv                                                                                                                                       
NVRM: was loaded and obtained ownership of the NVIDIA device(s).                                                                                                                 
NVRM: Try unloading the conflicting kernel module (and/or                                                                                                                       
NVRM: reconfigure your kernel without the conflicting                                                                                                                           
NVRM: driver(s)), then try loading the NVIDIA kernel module                                                                                                                     
NVRM: again.                                                                                                                                                                         
NVRM: No NVIDIA devices probed.
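(Side note for readers hitting the same message: a quick, generic way to see which kernel module currently owns the card is to query the PCI bus; 10de is Nvidia's vendor ID. This is not from the installer itself, just a standard check.)

Code:
# Shows a "Kernel driver in use:" line for each Nvidia device found
lspci -nnk -d 10de: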
 
Where did you install the driver from? From the Debian non-free repos, or downloaded from nvidia.com?

Also, did you blacklist the nouveau and nvidiafb modules through modprobe and reboot the host?
 
@t.lamprecht I downloaded it from nvidia.com, and I did blacklist nouveau and nvidiafb and rebooted. Still no luck.

Code:
root@vms:~# nano /etc/modprobe.d/blacklist.conf


blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist vga16fb
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
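
One thing worth noting (generic Debian behavior, not specific to this thread): a blacklist in /etc/modprobe.d/ only fully takes effect if the initramfs is regenerated afterwards, since nouveau can also be loaded from the initial ramdisk, which the installer warning further down explicitly mentions.

Code:
# Rebuild the initramfs so the blacklist also applies to early boot, then reboot
update-initramfs -u
reboot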
 
My Proxmox version:

Code:
pveversion

pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.39-1-pve)
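
For reference, building the Nvidia kernel module requires headers matching the running kernel; on Proxmox these come from the pve-headers packages. (The compile step in the log below does succeed, so headers were present in this case.)

Code:
# Install headers matching the currently running pve kernel
apt install pve-headers-$(uname -r)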

This is the command and the output:

Code:
./NVIDIA-Linux-x86_64-510.54.run --no-questions --ui=none --disable-nouveau

Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 510.54 ...

Welcome to the NVIDIA Software Installer for Unix/Linux

Detected 24 CPUs online; setting concurrency level to 24.
Installing NVIDIA driver version 510.54.

WARNING: One or more modprobe configuration files to disable Nouveau are already present at: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf.  Please be
         sure you have rebooted your system since these files were written.  If you have rebooted, then Nouveau may be enabled for other reasons, such as being included in the system initial ramdisk or in your X
         configuration file.  Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.

For some distributions, Nouveau can be disabled by adding a file in the modprobe configuration directory.  Would you like nvidia-installer to attempt to create this modprobe file for you? (Answer: Yes)

One or more modprobe configuration files to disable Nouveau have been written.  For some distributions, this may be sufficient to disable Nouveau; other distributions may require modification of the initial ramdisk.
Please reboot your system and attempt NVIDIA driver installation again.  Note if you later wish to re-enable Nouveau, you will need to delete these files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf,
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf

Performing CC sanity check with CC="/usr/bin/cc".
Performing CC check.
Kernel source path: '/lib/modules/5.15.39-1-pve/build'
Kernel output path: '/lib/modules/5.15.39-1-pve/build'
Performing Compiler check.
Performing Dom0 check.
Performing Xen check.
Performing PREEMPT_RT check.
Performing vgpu_kvm check.
Cleaning kernel module build directory.
Building kernel modules
  : [##############################] 100%
Kernel module compilation complete.
Unable to determine if Secure Boot is enabled: No such file or directory

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the
       one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this
       system is supported by this NVIDIA Linux graphics driver release.
      
       Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.

Kernel module load error: No such device
Kernel messages:
[ 3710.635753] br-mailcow: port 15(veth382f8ae) entered disabled state
[ 3710.678035] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3710.678067] IPv6: ADDRCONF(NETDEV_CHANGE): veth5512660: link becomes ready
[ 3710.678089] br-mailcow: port 16(veth5512660) entered blocking state
[ 3710.678091] br-mailcow: port 16(veth5512660) entered forwarding state
[ 3710.722155] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3710.722180] IPv6: ADDRCONF(NETDEV_CHANGE): vethaabc3dd: link becomes ready
[ 3710.722213] br-mailcow: port 17(vethaabc3dd) entered blocking state
[ 3710.722215] br-mailcow: port 17(vethaabc3dd) entered forwarding state
[ 3710.773755] eth0: renamed from veth84ecf9c
[ 3710.914224] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3710.914255] IPv6: ADDRCONF(NETDEV_CHANGE): vetha4742a0: link becomes ready
[ 3710.914290] br-mailcow: port 9(vetha4742a0) entered blocking state
[ 3710.914293] br-mailcow: port 9(vetha4742a0) entered forwarding state
[ 3809.987790] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[ 3809.987794] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 3809.988579] NVRM: This can occur when a driver such as:
NVRM: nouveau, rivafb, nvidiafb or rivatv
NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[ 3809.988580] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[ 3809.988581] NVRM: No NVIDIA devices probed.
[ 3809.988664] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508

ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at
       www.nvidia.com.
 
First, sorry about the wrong link; they also provide drivers for Linux: https://github.com/keylase/nvidia-patch. Second, the reason I asked whether the card has a power port is that some GPUs come in two variants: one with a dedicated power connector, and one that draws power through the PCIe slot. This exact error can occur when the installer tries to initialize the GPU but the card is limited by the wattage the PCIe slot can provide.
 
@manjotsc thanks for your reply! I already tried that but it didn't work; maybe I am missing something? I already posted the output!

the installer tries to initialize the GPU but the card is limited by the wattage the PCIe slot can provide.

All RTX 2080 Ti cards are large GPUs with dedicated power connectors, so I don't think that is the problem. I think it is related to a conflict between the Proxmox kernel headers and the driver, but I am not sure how to solve it. For example, when using the --dkms option I get this error:

Code:
./NVIDIA-Linux-x86_64-515.57.run --dkms
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 515.57 ...
/opt/nvidia# cat /var/log/nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Wed Jul 13 07:40:15 2022
installer version: 515.57

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

nvidia-installer command line:
    ./nvidia-installer
    --dkms

Using: nvidia-installer ncurses v6 user interface
-> Detected 24 CPUs online; setting concurrency level to 24.
-> Installing NVIDIA driver version 515.57.
-> There appears to already be a driver installed on your system (version: 515.57).  As part of installing this driver (version: 515.57), the existing driver will be uninstalled.  Are you sure you want to continue? (Answer: Continue installation)
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
WARNING: nvidia-installer was forced to guess the X library path '/usr/lib' and X module path '/usr/lib/xorg/modules'; these paths were not queryable from the system.  If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.
-> Install NVIDIA's 32-bit compatibility libraries? (Answer: Yes)
-> Uninstalling the previous installation with /usr/bin/nvidia-uninstall.
Looking for install checker script at ./libglvnd_install_checker/check-libglvnd-install.sh
   executing: '/bin/sh ./libglvnd_install_checker/check-libglvnd-install.sh'...
   Found libglvnd libraries: libGLESv2.so.2 libGLESv1_CM.so.1 libOpenGL.so.0 libEGL.so.1 libGLX.so.0 libGL.so.1
   Found non-libglvnd libraries:
   Missing libraries:
   libglvnd appears to be installed.
Will not install libglvnd libraries.
-> Skipping GLVND file: "libOpenGL.so.0"
-> Skipping GLVND file: "libOpenGL.so"
-> Skipping GLVND file: "libGLESv1_CM.so.1.2.0"
-> Skipping GLVND file: "libGLESv1_CM.so.1"
-> Skipping GLVND file: "libGLESv1_CM.so"
-> Skipping GLVND file: "libGLESv2.so.2.1.0"
-> Skipping GLVND file: "libGLESv2.so.2"
-> Skipping GLVND file: "libGLESv2.so"
-> Skipping GLVND file: "libGLdispatch.so.0"
-> Skipping GLVND file: "libGLX.so.0"
-> Skipping GLVND file: "libGLX.so"
-> Skipping GLVND file: "libGL.so.1.7.0"
-> Skipping GLVND file: "libGL.so.1"
-> Skipping GLVND file: "libGL.so"
-> Skipping GLVND file: "libEGL.so.1.1.0"
-> Skipping GLVND file: "libEGL.so.1"
-> Skipping GLVND file: "libEGL.so"
-> Skipping GLVND file: "./32/libOpenGL.so.0"
-> Skipping GLVND file: "libOpenGL.so"
-> Skipping GLVND file: "./32/libGLdispatch.so.0"
-> Skipping GLVND file: "./32/libGLESv2.so.2.1.0"
-> Skipping GLVND file: "libGLESv2.so.2"
-> Skipping GLVND file: "libGLESv2.so"
-> Skipping GLVND file: "./32/libGLESv1_CM.so.1.2.0"
-> Skipping GLVND file: "libGLESv1_CM.so.1"
-> Skipping GLVND file: "libGLESv1_CM.so"
-> Skipping GLVND file: "./32/libGL.so.1.7.0"
-> Skipping GLVND file: "libGL.so.1"
-> Skipping GLVND file: "libGL.so"
-> Skipping GLVND file: "./32/libGLX.so.0"
-> Skipping GLVND file: "libGLX.so"
-> Skipping GLVND file: "./32/libEGL.so.1.1.0"
-> Skipping GLVND file: "libEGL.so.1"
-> Skipping GLVND file: "libEGL.so"
Will install libEGL vendor library config file to /usr/share/glvnd/egl_vendor.d
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (515.57):
   executing: '/usr/sbin/ldconfig'...
   executing: '/usr/bin/systemctl daemon-reload'...
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
ERROR: Unable to load the 'nvidia-drm' kernel module.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
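
(For reference, generic DKMS usage rather than anything from the log above: with a --dkms install, the module can be rebuilt by hand, which sometimes surfaces the underlying build or load error more directly. The version string here matches the 515.57 installer used above.)

Code:
# Inspect registered modules, then force a rebuild against the running kernel
dkms status
dkms build -m nvidia -v 515.57 -k $(uname -r)
dkms install -m nvidia -v 515.57 -k $(uname -r)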
 
Has anyone succeeded in installing the Nvidia driver?

On the host machine, you mean?
I managed to install the driver by following this guide:

https://gitlab.com/polloloco/vgpu-proxmox
nvidia-smi, nvidia-smi vgpu, and mdevctl types all work as well.

My setup:
Proxmox VE 7.2
Asus sWRX80e + AMD Threadripper
2 x RTX 3090 Ti
Nvidia GRID driver 510.85.03, patched for vGPU

I managed to install the Nvidia GRID driver in a Windows 10 guest VM, but got error code 43.
 
Hi trio198,

Thank you for your reply.

Yes, I mean on the host.

I have followed the same guide.
When I try to install the custom (patched) driver, no matter which pve-kernel I use, I always end up with this message:
"ERROR: Unable to load the 'nvidia-vgpu-vfio' kernel module."

with:
# dmesg
[ 187.506523] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[ 187.506527] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[ 187.507442] NVRM: This can occur when a driver such as:
NVRM: nouveau, rivafb, nvidiafb or rivatv
NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[ 187.507443] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[ 187.507443] NVRM: No NVIDIA devices probed.

I am sure there is no nouveau, rivafb, nvidiafb, rivatv, etc. module loaded on the system.
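(For anyone wanting to verify the same thing, a quick generic check is:)

Code:
# Prints nothing if none of the conflicting modules are loaded
lsmod | grep -E 'nouveau|rivafb|nvidiafb|rivatv'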

I don't know if this makes a big difference or is important, but I noticed different DMAR/IOMMU output depending on the kernel.

Here is the output when I boot with pve-kernel 5.15.39-4-pve:
~# dmesg | grep -e DMAR -e IOMMU
[ 0.007409] ACPI: DMAR 0x000000001EF71630 0000B0 (v01 INTEL KBL 00000001 INTL 00000001)
[ 0.007430] ACPI: Reserving DMAR table memory at [mem 0x1ef71630-0x1ef716df]
[ 0.037687] DMAR: IOMMU enabled
[ 0.126116] DMAR: Host address width 39
[ 0.126189] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.126268] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[ 0.126363] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.126440] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[ 0.126533] DMAR: RMRR base: 0x0000001de84000 end: 0x0000001dea3fff
[ 0.126612] DMAR: RMRR base: 0x00000028400000 end: 0x0000002cbfffff
[ 0.126690] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
[ 0.126768] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[ 0.126845] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.128377] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.372261] DMAR: [Firmware Bug]: RMRR entry for device 6f:00.0 is broken - applying workaround
[ 0.372356] DMAR: No ATSR found
[ 0.372504] DMAR: No SATC found
[ 0.372578] DMAR: IOMMU feature fl1gp_support inconsistent
[ 0.372578] DMAR: IOMMU feature pgsel_inv inconsistent
[ 0.372655] DMAR: IOMMU feature nwfs inconsistent
[ 0.372730] DMAR: IOMMU feature pasid inconsistent
[ 0.372806] DMAR: IOMMU feature eafs inconsistent
[ 0.372881] DMAR: IOMMU feature prs inconsistent
[ 0.372956] DMAR: IOMMU feature nest inconsistent
[ 0.373031] DMAR: IOMMU feature mts inconsistent
[ 0.373106] DMAR: IOMMU feature sc_support inconsistent
[ 0.373181] DMAR: IOMMU feature dev_iotlb_support inconsistent
[ 0.373258] DMAR: dmar0: Using Queued invalidation
[ 0.373411] DMAR: dmar1: Using Queued invalidation
[ 0.375267] DMAR: Intel(R) Virtualization Technology for Directed I/O

And with pve-kernel 5.11.22-7-pve:
~# dmesg | grep -e DMAR -e IOMMU
[ 0.007096] ACPI: DMAR 0x000000001EF71630 0000B0 (v01 INTEL KBL 00000001 INTL 00000001)
[ 0.007117] ACPI: Reserving DMAR table memory at [mem 0x1ef71630-0x1ef716df]
[ 0.036884] DMAR: IOMMU enabled
[ 0.114071] DMAR: Host address width 39
[ 0.114145] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.114223] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[ 0.114317] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.114395] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[ 0.114488] DMAR: RMRR base: 0x0000001de84000 end: 0x0000001dea3fff
[ 0.114566] DMAR: RMRR base: 0x00000028400000 end: 0x0000002cbfffff
[ 0.114644] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
[ 0.114722] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[ 0.114799] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.116267] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.808898] DMAR: [Firmware Bug]: RMRR entry for device 6f:00.0 is broken - applying workaround
[ 0.808993] DMAR: No ATSR found
[ 0.809066] DMAR: dmar0: Using Queued invalidation
[ 0.809143] DMAR: dmar1: Using Queued invalidation
[ 0.810992] DMAR: Intel(R) Virtualization Technology for Directed I/O
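
(For completeness, this is how IOMMU is typically enabled on an Intel Proxmox host per the standard passthrough setup, assuming GRUB boot; the "DMAR: IOMMU enabled" lines above suggest this was already done here.)

Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# then apply and reboot:
update-grub
reboot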


Here is my system config:

Proxmox VE 7.2
Prime Z270-A with Intel i7-7700K
1 x RTX 3090 FE
Nvidia GRID driver 510.85.03, patched for vGPU

Any help would be appreciated.
 

Do you need GPU passthrough, or vGPU as well?
vGPU still doesn't support Ampere (the RTX 30 series), but GPU passthrough works.
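
For plain passthrough, the usual approach is to bind the card to vfio-pci so no host driver claims it. A minimal sketch with placeholder IDs (find yours with lspci -nn):

Code:
# Replace the IDs with your GPU's video and audio function IDs
echo "options vfio-pci ids=10de:xxxx,10de:yyyy" > /etc/modprobe.d/vfio.conf
update-initramfs -u
reboot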
 
I only need GPU slicing into vGPUs, then passthrough to the VMs.

I need to slice the RTX 3090 FE into vGPUs and then pass each one through to VMs as mdev PCI devices.
I previously succeeded in getting this recognized by the system with 2 x A100 40GB PCIe GPUs on PVE 6.3.6-1, and the A100 is an Ampere-architecture GPU.
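For context, creating one such mdev device on the host typically looks like this (the UUID, PCI address, and type name are placeholders; valid types for a given card come from mdevctl types):

Code:
# Hypothetical example: create one vGPU mdev instance, then persist it
mdevctl start -u 7c1c58c7-1234-5678-9abc-000000000001 -p 0000:01:00.0 -t nvidia-259
mdevctl define --auto -u 7c1c58c7-1234-5678-9abc-000000000001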

But I am facing the following issue:

nvidia-vgpu-mgr[17026]: cmd: 0x2080014b failed.
.
.
.
nvidia-vgpu-mgr[17026]: cmd: 0x20801322 failed.
.
.
.
nvidia-vgpu-mgr[17026]: error: vmiop_log: (0x0): Virtual Compute Server vGPUs not supported.
nvidia-vgpu-mgr[17026]: notice: vmiop_log: (0x0): Guest driver unloaded!


Any idea how to make this work?
 


This is exactly the same as what I'm trying to achieve, but in my case I've managed to install both the vGPU-patched host driver and the GRID (or regular Quadro) driver in the guest VM (Windows 10).

But I then get error code 43.
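
For what it's worth, with passthrough setups code 43 is often the guest driver detecting the hypervisor, and a commonly suggested Proxmox-side workaround is hiding KVM from the guest (sketch only, assuming VMID 100; note that vGPU setups can also hit code 43 for licensing or driver-mismatch reasons instead):

Code:
# /etc/pve/qemu-server/100.conf
cpu: host,hidden=1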
 
