Hi,
I am trying to get a new Dell server with Sapphire Rapids CPUs and an NVIDIA A6000 to work in vGPU mode.
I have tried everything in terms of drivers and Proxmox versions (7.4 and 8.0), with both the newest and slightly older driver versions.
After installing the host and enabling SR-IOV, I can assign the newly created devices to the Proxmox VM.
The VM starts just fine, but once it is up (I also tried Windows Server 2022 and Windows 11) it crashes as soon as I install the newest GRID / KVM driver in the VM. There is no error message; the VM just reboots.
During VM startup the host log looks like this on Proxmox 8 with the newest vGPU drivers:
Code:
Aug 01 10:56:14 ber1proxencoder09 systemd[1]: Created slice qemu.slice - Slice /qemu.
Aug 01 10:56:14 ber1proxencoder09 systemd[1]: Started 101.scope.
Aug 01 10:56:14 ber1proxencoder09 kernel: device tap101i0 entered promiscuous mode
Aug 01 10:56:14 ber1proxencoder09 kernel: vmbr0: port 2(tap101i0) entered blocking state
Aug 01 10:56:14 ber1proxencoder09 kernel: vmbr0: port 2(tap101i0) entered disabled state
Aug 01 10:56:14 ber1proxencoder09 kernel: vmbr0: port 2(tap101i0) entered blocking state
Aug 01 10:56:14 ber1proxencoder09 kernel: vmbr0: port 2(tap101i0) entered forwarding state
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 00000000-0000-0000-0000-000000000101 GPU PCI id 00:0a:00.4 config params vgpu_type_id=530
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=530
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_env_log: Successfully updated env symbols!
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): detected a VF at 0:a:0.4
Aug 01 10:56:20 ber1proxencoder09 kernel: NVRM: Software scheduler timeslice set to 2083uS.
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): gpu-pci-id : 0xa00
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): vgpu_type : Quadro
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): Framebuffer: 0x2cc000000
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): Virtual Device Id: 0x2230:0x1502
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): FRL Value: 60 FPS
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: ######## vGPU Manager Information: ########
Aug 01 10:56:20 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: Driver Version: 535.54.06
Aug 01 10:56:21 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): vGPU supported range: (0x70001, 0x120001)
Aug 01 10:56:21 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
Aug 01 10:56:21 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): vGPU migration enabled
Aug 01 10:56:21 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: (0x0): vGPU manager is running in SRIOV mode.
Aug 01 10:56:21 ber1proxencoder09 nvidia-vgpu-mgr[3205]: notice: vmiop_log: display_init inst: 0 successful
Aug 01 10:56:21 ber1proxencoder09 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000101: vGPU migration enabled with upstream V2 migration protocol
I am wondering whether the IOMMU is the issue:
Code:
[ 0.008403] ACPI: DMAR 0x0000000077501000 000438 (v01 DELL PE_SC3 00000001 INTL 20091013)
[ 0.008425] ACPI: Reserving DMAR table memory at [mem 0x77501000-0x77501437]
[ 0.316192] DMAR: IOMMU enabled
[ 0.640952] DMAR: Host address width 46
[ 0.640953] DMAR: DRHD base: 0x000000d97fc000 flags: 0x0
[ 0.640960] DMAR: dmar0: reg_base_addr d97fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.640963] DMAR: DRHD base: 0x000000e17fc000 flags: 0x0
[ 0.640966] DMAR: dmar1: reg_base_addr e17fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.640968] DMAR: DRHD base: 0x000000e97fc000 flags: 0x0
[ 0.640971] DMAR: dmar2: reg_base_addr e97fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.640973] DMAR: DRHD base: 0x000000f17fc000 flags: 0x0
[ 0.640977] DMAR: dmar3: reg_base_addr f17fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.640979] DMAR: DRHD base: 0x000000f97fc000 flags: 0x0
[ 0.640988] DMAR: dmar4: reg_base_addr f97fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.640989] DMAR: DRHD base: 0x000000d13fc000 flags: 0x0
[ 0.640993] DMAR: dmar5: reg_base_addr d13fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.640994] DMAR: DRHD base: 0x000000f9ffc000 flags: 0x0
[ 0.640998] DMAR: dmar6: reg_base_addr f9ffc000 ver 6:0 cap 9ed008c40780466 ecap 3ef9ea6f050df
[ 0.640999] DMAR: DRHD base: 0x000000fa7fc000 flags: 0x0
[ 0.641003] DMAR: dmar7: reg_base_addr fa7fc000 ver 6:0 cap 9ed008c40780466 ecap 3ef9ea6f050df
[ 0.641004] DMAR: DRHD base: 0x000000faffc000 flags: 0x0
[ 0.641007] DMAR: dmar8: reg_base_addr faffc000 ver 6:0 cap 9ed008c40780466 ecap 3ef9ea6f050df
[ 0.641009] DMAR: DRHD base: 0x000000fb7fc000 flags: 0x0
[ 0.641012] DMAR: dmar9: reg_base_addr fb7fc000 ver 6:0 cap 9ed008c40780466 ecap 3ef9ea6f050df
[ 0.641013] DMAR: DRHD base: 0x0000009fbfc000 flags: 0x0
[ 0.641017] DMAR: dmar10: reg_base_addr 9fbfc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.641018] DMAR: DRHD base: 0x000000a97fc000 flags: 0x0
[ 0.641022] DMAR: dmar11: reg_base_addr a97fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.641024] DMAR: DRHD base: 0x000000b33fc000 flags: 0x0
[ 0.641027] DMAR: dmar12: reg_base_addr b33fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.641028] DMAR: DRHD base: 0x000000bcffc000 flags: 0x0
[ 0.641031] DMAR: dmar13: reg_base_addr bcffc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.641032] DMAR: DRHD base: 0x000000c67fc000 flags: 0x0
[ 0.641035] DMAR: dmar14: reg_base_addr c67fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.641037] DMAR: DRHD base: 0x000000c6ffc000 flags: 0x0
[ 0.641040] DMAR: dmar15: reg_base_addr c6ffc000 ver 6:0 cap 9ed008c40780466 ecap 3ef9ea6f050df
[ 0.641041] DMAR: DRHD base: 0x000000c77fc000 flags: 0x0
[ 0.641044] DMAR: dmar16: reg_base_addr c77fc000 ver 6:0 cap 9ed008c40780466 ecap 3ef9ea6f050df
[ 0.641045] DMAR: DRHD base: 0x000000c7ffc000 flags: 0x0
[ 0.641048] DMAR: dmar17: reg_base_addr c7ffc000 ver 6:0 cap 9ed008c40780466 ecap 3ef9ea6f050df
[ 0.641050] DMAR: DRHD base: 0x000000c87fc000 flags: 0x0
[ 0.641053] DMAR: dmar18: reg_base_addr c87fc000 ver 6:0 cap 9ed008c40780466 ecap 3ef9ea6f050df
[ 0.641054] DMAR: DRHD base: 0x00000095ffc000 flags: 0x1
[ 0.641057] DMAR: dmar19: reg_base_addr 95ffc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.641059] DMAR: RMRR base: 0x0000006e675000 end: 0x0000006e677fff
[ 0.641061] DMAR: ATSR flags: 0x0
[ 0.641062] DMAR: RHSA base: 0x00000095ffc000 proximity domain: 0x0
[ 0.641063] DMAR: RHSA base: 0x0000009fbfc000 proximity domain: 0x0
[ 0.641064] DMAR: RHSA base: 0x000000a97fc000 proximity domain: 0x0
[ 0.641065] DMAR: RHSA base: 0x000000b33fc000 proximity domain: 0x0
[ 0.641065] DMAR: RHSA base: 0x000000bcffc000 proximity domain: 0x0
[ 0.641066] DMAR: RHSA base: 0x000000c67fc000 proximity domain: 0x0
[ 0.641067] DMAR: RHSA base: 0x000000c6ffc000 proximity domain: 0x0
[ 0.641067] DMAR: RHSA base: 0x000000c77fc000 proximity domain: 0x0
[ 0.641068] DMAR: RHSA base: 0x000000c7ffc000 proximity domain: 0x0
[ 0.641069] DMAR: RHSA base: 0x000000c87fc000 proximity domain: 0x0
[ 0.641070] DMAR: RHSA base: 0x000000d97fc000 proximity domain: 0x1
[ 0.641070] DMAR: RHSA base: 0x000000e17fc000 proximity domain: 0x1
[ 0.641071] DMAR: RHSA base: 0x000000e97fc000 proximity domain: 0x1
[ 0.641072] DMAR: RHSA base: 0x000000f17fc000 proximity domain: 0x1
[ 0.641072] DMAR: RHSA base: 0x000000f97fc000 proximity domain: 0x1
[ 0.641073] DMAR: RHSA base: 0x000000d13fc000 proximity domain: 0x1
[ 0.641074] DMAR: RHSA base: 0x000000f9ffc000 proximity domain: 0x1
[ 0.641075] DMAR: RHSA base: 0x000000fa7fc000 proximity domain: 0x1
[ 0.641075] DMAR: RHSA base: 0x000000faffc000 proximity domain: 0x1
[ 0.641076] DMAR: RHSA base: 0x000000fb7fc000 proximity domain: 0x1
[ 0.641077] DMAR: SATC flags: 0x0
[ 0.641079] DMAR-IR: IOAPIC id 8 under DRHD base 0x95ffc000 IOMMU 19
[ 0.641081] DMAR-IR: HPET id 0 under DRHD base 0x95ffc000
[ 0.641082] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.647743] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 14.112934] DMAR: IOMMU feature pasid inconsistent
[ 14.112936] DMAR: IOMMU feature prs inconsistent
[ 14.112937] DMAR: IOMMU feature pasid inconsistent
[ 14.112938] DMAR: IOMMU feature prs inconsistent
[ 14.112939] DMAR: IOMMU feature pasid inconsistent
[ 14.112940] DMAR: IOMMU feature prs inconsistent
[ 14.112940] DMAR: IOMMU feature pasid inconsistent
[ 14.112941] DMAR: IOMMU feature prs inconsistent
[ 14.112942] DMAR: IOMMU feature pasid inconsistent
[ 14.112942] DMAR: IOMMU feature prs inconsistent
[ 14.112944] DMAR: IOMMU feature pasid inconsistent
[ 14.112944] DMAR: IOMMU feature prs inconsistent
[ 14.112945] DMAR: IOMMU feature pasid inconsistent
[ 14.112946] DMAR: IOMMU feature prs inconsistent
[ 14.112947] DMAR: IOMMU feature pasid inconsistent
[ 14.112947] DMAR: IOMMU feature prs inconsistent
[ 14.112948] DMAR: IOMMU feature pasid inconsistent
[ 14.112948] DMAR: IOMMU feature prs inconsistent
[ 14.112949] DMAR: IOMMU feature pasid inconsistent
[ 14.112950] DMAR: IOMMU feature prs inconsistent
[ 14.112951] DMAR: IOMMU feature pasid inconsistent
[ 14.112951] DMAR: IOMMU feature prs inconsistent
[ 14.112952] DMAR: IOMMU feature pasid inconsistent
[ 14.112952] DMAR: IOMMU feature prs inconsistent
[ 14.112953] DMAR: dmar18: Using Queued invalidation
[ 14.112961] DMAR: Translation was enabled for dmar18 but we are not in kdump mode
[ 14.112963] DMAR: dmar17: Using Queued invalidation
[ 14.112965] DMAR: Translation was enabled for dmar17 but we are not in kdump mode
[ 14.112966] DMAR: dmar16: Using Queued invalidation
[ 14.112968] DMAR: Translation was enabled for dmar16 but we are not in kdump mode
[ 14.112969] DMAR: dmar15: Using Queued invalidation
[ 14.112971] DMAR: Translation was enabled for dmar15 but we are not in kdump mode
[ 14.112972] DMAR: dmar14: Using Queued invalidation
[ 14.112981] DMAR: Translation was enabled for dmar14 but we are not in kdump mode
[ 14.112982] DMAR: dmar13: Using Queued invalidation
[ 14.112984] DMAR: Translation was enabled for dmar13 but we are not in kdump mode
[ 14.112985] DMAR: dmar12: Using Queued invalidation
[ 14.112987] DMAR: Translation was enabled for dmar12 but we are not in kdump mode
[ 14.112988] DMAR: dmar11: Using Queued invalidation
[ 14.112990] DMAR: Translation was enabled for dmar11 but we are not in kdump mode
[ 14.112991] DMAR: dmar10: Using Queued invalidation
[ 14.112998] DMAR: Translation was enabled for dmar10 but we are not in kdump mode
[ 14.112999] DMAR: dmar9: Using Queued invalidation
[ 14.113001] DMAR: Translation was enabled for dmar9 but we are not in kdump mode
[ 14.113003] DMAR: dmar8: Using Queued invalidation
[ 14.113005] DMAR: Translation was enabled for dmar8 but we are not in kdump mode
[ 14.113006] DMAR: dmar7: Using Queued invalidation
[ 14.113009] DMAR: Translation was enabled for dmar7 but we are not in kdump mode
[ 14.113010] DMAR: dmar6: Using Queued invalidation
[ 14.113018] DMAR: Translation was enabled for dmar6 but we are not in kdump mode
[ 14.113019] DMAR: dmar5: Using Queued invalidation
[ 14.113022] DMAR: Translation was enabled for dmar5 but we are not in kdump mode
[ 14.113023] DMAR: dmar4: Using Queued invalidation
[ 14.113025] DMAR: Translation was enabled for dmar4 but we are not in kdump mode
[ 14.113026] DMAR: dmar3: Using Queued invalidation
[ 14.113029] DMAR: Translation was enabled for dmar3 but we are not in kdump mode
[ 14.113030] DMAR: dmar2: Using Queued invalidation
[ 14.113036] DMAR: Translation was enabled for dmar2 but we are not in kdump mode
[ 14.113038] DMAR: dmar1: Using Queued invalidation
[ 14.113040] DMAR: Translation was enabled for dmar1 but we are not in kdump mode
[ 14.113041] DMAR: dmar0: Using Queued invalidation
[ 14.113043] DMAR: Translation was enabled for dmar0 but we are not in kdump mode
[ 14.113046] DMAR: dmar19: Using Queued invalidation
[ 14.113049] DMAR: Translation was enabled for dmar19 but we are not in kdump mode
[ 14.131672] DMAR: Intel(R) Virtualization Technology for Directed I/O
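To make the DMAR output above easier to read, here is a minimal Python sketch that groups the DMAR units by their `ecap` value and counts the "inconsistent" warnings. The function names `summarize_dmar` and `differing_bits` and the `SAMPLE` string are made up for illustration; the sample lines are copied from the log above. It only summarizes the log and does not prove that the IOMMU causes the guest crash.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   DMAR: dmar6: reg_base_addr f9ffc000 ver 6:0 cap 9ed... ecap 3ef...
UNIT_RE = re.compile(
    r"DMAR: (dmar\d+): reg_base_addr [0-9a-f]+ ver \S+ "
    r"cap [0-9a-f]+ ecap ([0-9a-f]+)"
)
# Matches lines such as: DMAR: IOMMU feature pasid inconsistent
WARN_RE = re.compile(r"DMAR: IOMMU feature (\w+) inconsistent")


def summarize_dmar(log_text):
    """Return (units grouped by ecap value, warning count per feature)."""
    units_by_ecap = defaultdict(list)
    warnings = defaultdict(int)
    for line in log_text.splitlines():
        unit = UNIT_RE.search(line)
        if unit:
            units_by_ecap[unit.group(2)].append(unit.group(1))
            continue
        warn = WARN_RE.search(line)
        if warn:
            warnings[warn.group(1)] += 1
    return dict(units_by_ecap), dict(warnings)


def differing_bits(hex_a, hex_b):
    """Return the bit positions at which two hex register values differ."""
    diff = int(hex_a, 16) ^ int(hex_b, 16)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Hypothetical sample: four lines taken from the dmesg output above.
SAMPLE = """\
[ 0.640960] DMAR: dmar0: reg_base_addr d97fc000 ver 6:0 cap 9ed008c40780466 ecap 3ee9e86f050df
[ 0.640998] DMAR: dmar6: reg_base_addr f9ffc000 ver 6:0 cap 9ed008c40780466 ecap 3ef9ea6f050df
[ 14.112934] DMAR: IOMMU feature pasid inconsistent
[ 14.112936] DMAR: IOMMU feature prs inconsistent
"""

if __name__ == "__main__":
    units, warns = summarize_dmar(SAMPLE)
    print("units per ecap value:", units)
    print("warnings per feature:", warns)
    if len(units) == 2:
        first, second = list(units)
        print("differing ecap bits:", differing_bits(first, second))
```

In the log above there are two distinct `ecap` values (`3ee9e86f050df` and `3ef9ea6f050df`), and they differ in bits 29 and 40. As far as I can tell from the Linux kernel's Intel IOMMU header, these are the PRS and PASID capability bits, which would match the `pasid` / `prs` "inconsistent" messages. Please treat that mapping as an assumption and not as a confirmed diagnosis.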
Any ideas?