Hello all,
Environment: pve v7.4.16
System: HP Z2 G4, Intel Xeon E-2278G, Intel iGPU
Dedicated GPU: GTX 1650
I am passing through the Intel iGPU 630 to a Windows VM. That works as expected.
I am now trying to use my dedicated GPU on the host for LXC containers. It is the only device in my PCIe slots.
I went into the BIOS and checked "Integrated Video" and set my Primary Video as my GTX 1650.
I made sure each device is in its own IOMMU group. I blacklisted nouveau and installed the latest NVIDIA drivers for my card.
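In case it matters, the blacklist is the standard modprobe.d approach (the filename is just what I picked):

```
# /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
```

followed by `update-initramfs -u` and a reboot before running the NVIDIA installer.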
IOMMU groups (GPU is group 1, intel iGPU is group 2):
Code:
Jul 19 13:25:37 pve kernel: [ 0.567365] pci 0000:00:00.0: Adding to iommu group 0
Jul 19 13:25:37 pve kernel: [ 0.567380] pci 0000:00:01.0: Adding to iommu group 1
Jul 19 13:25:37 pve kernel: [ 0.567389] pci 0000:00:02.0: Adding to iommu group 2
Jul 19 13:25:37 pve kernel: [ 0.567400] pci 0000:00:12.0: Adding to iommu group 3
Jul 19 13:25:37 pve kernel: [ 0.567417] pci 0000:00:14.0: Adding to iommu group 4
Jul 19 13:25:37 pve kernel: [ 0.567425] pci 0000:00:14.2: Adding to iommu group 4
Jul 19 13:25:37 pve kernel: [ 0.567436] pci 0000:00:16.0: Adding to iommu group 5
Jul 19 13:25:37 pve kernel: [ 0.567444] pci 0000:00:17.0: Adding to iommu group 6
Jul 19 13:25:37 pve kernel: [ 0.567461] pci 0000:00:1b.0: Adding to iommu group 7
Jul 19 13:25:37 pve kernel: [ 0.567478] pci 0000:00:1b.4: Adding to iommu group 8
Jul 19 13:25:37 pve kernel: [ 0.567489] pci 0000:00:1d.0: Adding to iommu group 9
Jul 19 13:25:37 pve kernel: [ 0.567512] pci 0000:00:1f.0: Adding to iommu group 10
Jul 19 13:25:37 pve kernel: [ 0.567520] pci 0000:00:1f.3: Adding to iommu group 10
Jul 19 13:25:37 pve kernel: [ 0.567529] pci 0000:00:1f.4: Adding to iommu group 10
Jul 19 13:25:37 pve kernel: [ 0.567538] pci 0000:00:1f.5: Adding to iommu group 10
Jul 19 13:25:37 pve kernel: [ 0.567546] pci 0000:00:1f.6: Adding to iommu group 10
Jul 19 13:25:37 pve kernel: [ 0.567551] pci 0000:01:00.0: Adding to iommu group 1
Jul 19 13:25:37 pve kernel: [ 0.567556] pci 0000:01:00.1: Adding to iommu group 1
Jul 19 13:25:37 pve kernel: [ 0.567571] pci 0000:02:00.0: Adding to iommu group 11
Jul 19 13:25:37 pve kernel: [ 0.567583] pci 0000:03:00.0: Adding to iommu group 12
Jul 19 13:25:37 pve kernel: [ 0.567594] pci 0000:04:00.0: Adding to iommu group 13
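The groups above are from dmesg; to double-check I also enumerated them from sysfs with a quick sketch like this (prints one line per PCI device, and exits silently if IOMMU is disabled):

```shell
#!/bin/sh
# List each PCI device with its IOMMU group (run on the PVE host).
for d in /sys/kernel/iommu_groups/*/devices/*; do
  [ -e "$d" ] || continue          # glob unmatched: IOMMU off or no devices
  g=${d%/devices/*}                # strip "/devices/<pci-address>"
  printf 'group %s: %s\n' "${g##*/}" "${d##*/}"
done
```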
The installer works as expected. What doesn't work is nvidia-smi: it either reports that no devices were found, or the kernel throws errors like these:
Code:
Jul 19 13:04:23 pve kernel: [ 5.414677] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 535.86.05 Fri Jul 14 20:20:58 UTC 2023
Jul 19 13:04:23 pve kernel: [ 5.431577] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jul 19 13:04:23 pve kernel: [ 5.431580] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
Jul 19 13:04:23 pve kernel: [ 5.463583] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
Jul 19 13:04:23 pve kernel: [ 5.468324] nvidia-uvm: Loaded the UVM driver, major device number 507.
Jul 19 13:04:23 pve kernel: [ 5.472258] Loading iSCSI transport class v2.0-870.
Jul 19 13:04:23 pve kernel: [ 5.475025] iscsi: registered transport (tcp)
Jul 19 13:04:23 pve kernel: [ 5.619771] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Fatal) error received: 0000:00:01.0
Jul 19 13:04:23 pve kernel: [ 5.667909] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
Jul 19 13:04:23 pve kernel: [ 5.667932] pcieport 0000:00:01.0: device [8086:1901] error status/mask=00004000/00000000
Jul 19 13:04:23 pve kernel: [ 5.667963] pcieport 0000:00:01.0: [14] CmpltTO (First)
Jul 19 13:04:23 pve kernel: [ 5.667977] nvidia 0000:01:00.0: AER: can't recover (no error_detected callback)
Jul 19 13:04:23 pve kernel: [ 5.667978] snd_hda_intel 0000:01:00.1: AER: can't recover (no error_detected callback)
Jul 19 13:04:23 pve kernel: [ 5.667985] NVRM: GPU at PCI:0000:01:00: GPU-a23264ce-e466-644b-6346-43552a404d5c
Jul 19 13:04:23 pve kernel: [ 5.667987] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Jul 19 13:04:23 pve kernel: [ 5.667989] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Jul 19 13:04:23 pve kernel: [ 5.668180] NVRM: A GPU crash dump has been created. If possible, please run
Jul 19 13:04:23 pve kernel: [ 5.668180] NVRM: nvidia-bug-report.sh as root to collect this data before
Jul 19 13:04:23 pve kernel: [ 5.668180] NVRM: the NVIDIA kernel module is unloaded.
Now, what does work is driver 470.199.02, and it works flawlessly: I am able to reboot, issue commands like nvidia-smi, and run other applications. The issue is that the application I am trying to run won't run correctly with this older driver. It does work correctly with the latest driver, which is v535.
I have tried installing multiple drivers newer than 470.199.02, and most of them do not work properly.