Something that is driving me a little crazy with NVIDIA Tesla cards and the vGPU passthrough I am using on my Proxmox nodes: one server shows the mdev types of a P40 card (GRID P40) rather than those of a P4 (GRID P4). The other four nodes all show the correct types for the P4. 'nvidia-smi' on both servers reports 'Tesla P4'.
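For anyone who wants to check the same thing on their own nodes, here is a rough sketch of how I'd dump what the vGPU driver is actually exposing, straight from sysfs rather than the Proxmox UI. The sysfs layout (mdev_supported_types/<type-id>/name) is the standard VFIO mediated-device layout; the classify helper is just my own convenience for flagging whether a profile name reads as P4 or P40, since the names differ by a single character:

```shell
#!/bin/sh
# List every mdev type the driver exposes under each PCI device, and
# label whether the human-readable name looks like a P4 or P40 profile.
classify() {
  case "$1" in
    *"GRID P40-"*) echo "P40" ;;
    *"GRID P4-"*)  echo "P4"  ;;
    *)             echo "?"   ;;
  esac
}

for t in /sys/bus/pci/devices/*/mdev_supported_types/*; do
  [ -e "$t/name" ] || continue
  name=$(cat "$t/name")
  printf '%s: %s [%s]\n' "$(basename "$t")" "$name" "$(classify "$name")"
done
```

On a node with no mdev-capable devices the loop simply prints nothing; on an affected node it makes the wrong profile family obvious at a glance.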
Using nvflash, I have tried verifying that the firmware on the cards matches a particular firmware file that I have. I have also swapped the Tesla P4 cards between servers, and the problem stays with the problem server itself.
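For the firmware comparison, the simplest check is a byte-for-byte diff of ROM dumps rather than eyeballing version strings. This is a sketch: it assumes you have already saved each card's vBIOS to a file (e.g. with nvflash's save option; the exact flag spelling can vary between nvflash builds), and the file names are placeholders:

```shell
# roms_match: byte-for-byte comparison of two vBIOS dump files.
roms_match() {
  if cmp -s "$1" "$2"; then
    echo "match"
  else
    echo "differ"
  fi
}

# Typical use, with hypothetical dump file names:
#   roms_match card-node1.rom card-node2.rom
```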
That server was the first one I put a Tesla P4 card in and the first I got working with the vGPU drivers etc. I have uninstalled and reinstalled the driver, using the exact same file as on the other four servers.
I have even rsync'd /usr and /boot from a server that sees the card correctly onto the one that is having issues. Still, the issue persists.
Has anyone else had this happen to them? If so, did you fix it and how?
I would rather not just hose the problem server and redo it, but I'm wondering if I need to. I wouldn't worry about it otherwise, but it makes it confusing to know which entry to select: the P40 has more memory, so if you get it wrong, certain combinations of mdev entries will refuse to work properly.
Edit: On a previous search I didn't see it, but I'm reading this thread now: https://forum.proxmox.com/threads/vgpu-tesla-p4-wrong-mdevctl-gpu.143247/
The very strange thing is that it's almost as if only a 'clean install' (a system that never had NVIDIA drivers on it) yields the correct P4 entries, i.e. the other four servers I recently put P4 cards in. Only the original server, the one I used to patch the drivers (to get them working with kernel 6.8) and to build the custom installer, persists in being wrong. Weird.