Tesla P4 mistaken for P40?

dooferorg

Member
Apr 12, 2024
Something that is driving me a little crazy regarding NVIDIA Tesla cards and the vGPU passthrough that I am using on my Proxmox nodes: one server shows the mdev types of a P40 card (GRID P40) instead of a P4 (GRID P4). The other 4 nodes all show the correct ones for the P4. 'nvidia-smi' shows 'Tesla P4' on the problem server as well as on the working ones.
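For anyone who wants to compare nodes, here is a minimal sketch of how the mdev type names could be dumped straight from sysfs. It assumes the kernel's standard mdev layout (`/sys/class/mdev_bus/<device>/mdev_supported_types/<type>/name`) and that the vGPU host driver is loaded. The function name `list_mdev_types` is just something made up for illustration.

```python
from pathlib import Path


def list_mdev_types(root="/sys/class/mdev_bus"):
    """Return {device: {type_id: type_name}} for every mdev-capable device.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a real host the default sysfs path is assumed.
    """
    result = {}
    base = Path(root)
    if not base.is_dir():
        # No mdev-capable devices, or the vGPU host driver is not loaded.
        return result
    for dev in sorted(base.iterdir()):
        types = {}
        # Each supported type is a directory containing a 'name' file.
        for name_file in sorted(dev.glob("mdev_supported_types/*/name")):
            types[name_file.parent.name] = name_file.read_text().strip()
        result[dev.name] = types
    return result


if __name__ == "__main__":
    for device, types in list_mdev_types().items():
        print(device)
        for type_id, name in types.items():
            print(f"  {type_id}: {name}")
```

Running this on each node and diffing the output should make it obvious which server reports GRID P40 names instead of GRID P4.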

I have verified with nvflash that the firmware on the cards matches a particular firmware file that I have. I have also swapped the Tesla P4 cards between servers, and the problem stays with the problem server rather than following the card.

That server was the first one I used a Tesla P4 card on and got working with the vGPU drivers. I have uninstalled and reinstalled the driver, using the exact same file as on the other 4 servers.

I have even rsync'd /usr and /boot from a server that sees the card correctly over the one that is having issues. Still, the issue persists.


Has anyone else had this happen to them? If so, did you fix it and how?

I would rather not just hose the problem server and redo it, but I'm wondering if I need to. I wouldn't worry about it, except that it makes it confusing to know which entry to select: the P40 has more memory than the P4, so if you pick the wrong combination of mdev entries, things refuse to work properly.
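As a rough sanity check before picking an entry, something like the sketch below could flag mdev type names that do not belong to the card that is actually installed. It assumes the names follow a "GRID <model>-<profile>" pattern (for example "GRID P4-1Q"), which is an assumption based on the GRID P4 / GRID P40 labels mentioned above. `model_of` and `mismatched` are hypothetical helper names.

```python
import re


def model_of(type_name):
    """Extract the board model from an mdev type name.

    Assumes names look like 'GRID <model>-<profile>'; returns None
    when the name does not fit that pattern.
    """
    match = re.match(r"GRID\s+(\S+)-\w+$", type_name.strip())
    return match.group(1) if match else None


def mismatched(type_names, expected="P4"):
    """Return the type names whose model is not the expected one."""
    return [name for name in type_names if model_of(name) != expected]


if __name__ == "__main__":
    # Example names only; substitute the names reported on the node.
    names = ["GRID P4-1Q", "GRID P40-1Q"]
    print(mismatched(names, expected="P4"))
```

Any name this returns on a P4 node is a sign that the host driver is advertising the wrong profile set.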

Edit: On a previous search, I didn't see it but I'm reading this thread now: https://forum.proxmox.com/threads/vgpu-tesla-p4-wrong-mdevctl-gpu.143247/

The very strange thing is that a 'clean install', on systems that never had any NVIDIA drivers, yielded the correct P4 entries (i.e. the other 4 servers that I recently put P4 cards in). Only the original server, the one I used to patch the drivers (to get them to work with kernel 6.8) and build the custom installer, persists in being wrong. Weird.
 
Using the vgpu installer from here: https://wvthoog.nl/proxmox-vgpu-v3/

and doing 'upgrade' on the install (which removed the previously installed entries), then re-installing with the NVIDIA 535-161.05 driver from whatever download link the installer script had, actually worked. I noticed the installer downgraded my system's kernel to 6.5 though, which is a bit weird since I never asked it to do that.
 
What I ended up doing was uninstalling the NVIDIA drivers using the above-mentioned installer script.

I then removed kernel 6.5 and updated GRUB to get back on 6.8.

I then reinstalled my previously patched, 6.8-compatible version of 535-161.05 and lo, I now have the correct mdev types displayed.

So, I'll leave this here for posterity, but that was weird. Maybe the uninstaller did a better job cleaning up?