Problem passing Nvidia GTX 670 GPU to macOS or Windows

kokkorollo

Nov 9, 2022
Hi,
I'm having trouble passing an Nvidia GPU through to a VM.

On the same server I have an RTX 2070 passed through to Windows 10 that works fine, and an AMD RX 6600 passed through to macOS without any problems, but I'm struggling to pass through the Nvidia GTX 670.
In Windows the GPU is recognized but reports error 43; in macOS the card is recognized but not loaded.
Only once, and I don't know why, I was able to pass the GPU through to macOS correctly, but only with the virtual display set to VMware and only for a few minutes, after which macOS crashed. While it was working, the GPU was correctly recognized.
After that I duplicated the VM to run some more tests and left the original VM as it was, but then the GPU stopped working again on both VMs.


This is the journalctl -f output for the macOS VM with the working GPU:
Code:
Nov 09 10:22:03 pcg pvedaemon[846833]: start VM 201: UPID:pcg:000CEBF1:005B3627:636B713B:qmstart:201:root@pam:
Nov 09 10:22:03 pcg pvedaemon[821492]: <root@pam> starting task UPID:pcg:000CEBF1:005B3627:636B713B:qmstart:201:root@pam:
Nov 09 10:22:03 pcg kernel: vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Nov 09 10:22:03 pcg kernel: vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=io+mem:owns=none
Nov 09 10:22:03 pcg kernel: vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Nov 09 10:22:03 pcg systemd[1]: Started 201.scope.
Nov 09 10:22:03 pcg systemd-udevd[846836]: Using default interface naming scheme 'v247'.
Nov 09 10:22:03 pcg systemd-udevd[846836]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 09 10:22:03 pcg kernel: device tap201i0 entered promiscuous mode
Nov 09 10:22:03 pcg kernel: vmbr0: port 3(tap201i0) entered blocking state
Nov 09 10:22:03 pcg kernel: vmbr0: port 3(tap201i0) entered disabled state
Nov 09 10:22:03 pcg kernel: vmbr0: port 3(tap201i0) entered blocking state
Nov 09 10:22:03 pcg kernel: vmbr0: port 3(tap201i0) entered forwarding state
Nov 09 10:22:04 pcg kernel: vfio-pci 0000:03:00.0: enabling device (0002 -> 0003)
Nov 09 10:22:04 pcg kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
Nov 09 10:22:04 pcg kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
Nov 09 10:22:04 pcg kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
Nov 09 10:22:04 pcg kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
Nov 09 10:22:04 pcg kernel: vfio-pci 0000:03:00.1: enabling device (0000 -> 0002)
Nov 09 10:22:05 pcg pvedaemon[821492]: <root@pam> end task UPID:pcg:000CEBF1:005B3627:636B713B:qmstart:201:root@pam: OK

And this is the journalctl -f output for the macOS VM with the GTX 670:
Code:
Nov 09 10:19:26 pcg pvedaemon[821701]: <root@pam> starting task UPID:pcg:000CEA3B:005AF8F7:636B709E:qmstart:204:root@pam:
Nov 09 10:19:26 pcg pvedaemon[846395]: start VM 204: UPID:pcg:000CEA3B:005AF8F7:636B709E:qmstart:204:root@pam:
Nov 09 10:19:26 pcg systemd[1]: Started 204.scope.
Nov 09 10:19:26 pcg systemd-udevd[846408]: Using default interface naming scheme 'v247'.
Nov 09 10:19:26 pcg systemd-udevd[846408]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 09 10:19:27 pcg kernel: device tap204i0 entered promiscuous mode
Nov 09 10:19:27 pcg kernel: vmbr0: port 2(tap204i0) entered blocking state
Nov 09 10:19:27 pcg kernel: vmbr0: port 2(tap204i0) entered disabled state
Nov 09 10:19:27 pcg kernel: vmbr0: port 2(tap204i0) entered blocking state
Nov 09 10:19:27 pcg kernel: vmbr0: port 2(tap204i0) entered forwarding state
Nov 09 10:19:27 pcg kernel: vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Nov 09 10:19:29 pcg pvedaemon[821701]: <root@pam> end task UPID:pcg:000CEA3B:005AF8F7:636B709E:qmstart:204:root@pam: OK

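As far as I can tell, the GPU-related part of the log is different: the working GPU shows vgaarb and "enabling device" messages, while the GTX 670 only shows a single vfio_ecap_init line.
Can anyone tell me what is happening?
Thanks.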
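For anyone comparing logs like the two above, here is a minimal Python sketch that groups the vfio-pci kernel messages by PCI address, so the difference between a working and a failing device is easier to see. The function names and the shortened log excerpts are only illustrative; the regular expression assumes the journalctl line layout shown in this post.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   "... pcg kernel: vfio-pci 0000:03:00.0: enabling device (0002 -> 0003)"
VFIO_LINE = re.compile(
    r"kernel: vfio-pci "
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)$"
)

def vfio_messages(journal_text):
    """Return {pci_address: [message, ...]} for every vfio-pci kernel line."""
    grouped = defaultdict(list)
    for line in journal_text.splitlines():
        match = VFIO_LINE.search(line)
        if match:
            grouped[match.group("addr")].append(match.group("msg").strip())
    return dict(grouped)

def message_kinds(messages):
    """Reduce each message to its leading keyword, e.g. 'vgaarb'."""
    return {msg.split(":")[0].split(" (")[0] for msg in messages}

# Shortened excerpts from the two logs in this post.
working = """\
Nov 09 10:22:03 pcg kernel: vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Nov 09 10:22:04 pcg kernel: vfio-pci 0000:03:00.0: enabling device (0002 -> 0003)
Nov 09 10:22:04 pcg kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
"""
failing = """\
Nov 09 10:19:27 pcg kernel: vfio-pci 0000:08:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
"""

if __name__ == "__main__":
    ok = message_kinds(vfio_messages(working)["0000:03:00.0"])
    bad = message_kinds(vfio_messages(failing)["0000:08:00.0"])
    print("only seen for the working GPU:", sorted(ok - bad))
```

This only highlights which kinds of messages are missing; it does not explain why they are missing.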
 
Hello,

In general, error 43 is covered in our wiki guide on PCI passthrough [0].

In my personal experience, I had the same issue with an AMD card, and moving the GPU to a different slot fixed it.

[0] https://pve.proxmox.com/wiki/Pci_passthrough#BIOS_options
 
I've found that the GTX 670 had non-UEFI firmware.
I've flashed a UEFI firmware and now the GPU is passed through correctly, and I can see the output from the GPU on both Windows and macOS.
But after a few seconds of use the VM stops working, and an exclamation mark appears on the VM in the Proxmox web interface.
On Windows the GPU seems to stop working right after the login screen; on macOS it seems to happen at a random time, after just a few seconds or at most a couple of minutes.

As soon as the VM stops, I get a continuous stream of errors from journalctl -f:

Code:
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: can't find device of ID00e4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Multiple Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: can't find device of ID00e4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: can't find device of ID00e4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
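A flood like the one above is easier to read when summarized. The following Python sketch counts the reported correctable errors per root port and decodes the status and mask values using the bit positions of the PCIe AER Correctable Error Status register. The function names are made up for illustration, and the regular expression assumes the kernel message layout shown in this log.

```python
import re
from collections import Counter

# Matches lines such as:
#   "pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000"
STATUS_LINE = re.compile(
    r"pcieport (?P<port>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"device \[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\] "
    r"error status/mask=(?P<status>[0-9a-f]{8})/(?P<mask>[0-9a-f]{8})"
)

# Bit positions in the AER Correctable Error Status / Mask registers.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

def decode_bits(value):
    """Return the names of all known bits set in a status or mask value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]

def summarize_aer(journal_text):
    """Count decoded correctable errors per (port, error name)."""
    counts = Counter()
    for line in journal_text.splitlines():
        match = STATUS_LINE.search(line)
        if match:
            status = int(match.group("status"), 16)
            for name in decode_bits(status):
                counts[(match.group("port"), name)] += 1
    return counts

# Two status lines copied from the log above, plus one unrelated line.
sample = """\
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:    [ 0] RxErr
Nov 09 12:37:02 pcg kernel: pcieport 0000:00:1c.4:   device [8086:43bc] error status/mask=00000001/00002000
"""

if __name__ == "__main__":
    for (port, name), count in summarize_aer(sample).items():
        print(f"{port}: {name} x{count}")
    print("masked:", decode_bits(0x00002000))
```

In this log every status value is 00000001, which matches the "RxErr" lines: receiver errors at the physical layer on port 0000:00:1c.4.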
 
In my experience, it is best to use only newer cards. I also struggled with an older one for days; once I swapped it for a newer one, the VM just worked without any fuss.
 
