Has anyone had luck actually installing the "proper" NVIDIA drivers on the host?
The biggest problem with luck is that it's not reliable.
Proxmox and Nvidia software are constantly evolving, and even if you should be so lucky as to get things to work, any package update might undo them; without mainline support for the combination, your luck can run out at any time.
I've fallen to the same temptation often enough, e.g. I tried to force CUDA to run on OpenVZ/Virtuozzo and found myself rewriting kernel code to work around what I'd still label 'CUDA runtime stupidities'... I ran out of time before I got it to work, but at least I learned something.
Containers are designed to share better what was already designed to be shared.
Computers weren't originally designed to be shared, not even memory holding data and code before John von Neumann's memo spread the idea. But storage, CPUs, memory and even networks (which I'd argue have never been 'natural' in Unix) became much easier to share once (slow) human operators were replaced by (fast) software operating systems, and once design retrofits arrived: mapping for storage, resumable traps for CPUs, memory mapping and virtualization for RAM, and 'better' networking.
But GPUs have changed their functionality and mission so rapidly over the last decades that proper architecture, hardware abstractions, and software/hardware co-design haven't quite caught up, and I'm afraid they only ever will once GPUs slow down their evolution and become boring.
And one such aspect, IMHO, is the fact that many people want to virtualize the GPU just for transcoding... a VPU function that, until streaming and remote gaming became a thing, wound up on GPUs more or less by accident, or by marketing pushing ideas ahead of thought.
VPU engines are fixed-function blocks, and evidently GPUs carry them in varying numbers, largely unrelated to the 'size' of the GPU. Sharing them should be pretty much like sharing a set of printers (time-limited exclusive access, round-robin), and even if Microsoft still struggles with it, that hasn't been rocket science for a long time.
Except that on x86, VPUs are tied to the GPUs because of said accident and the lack of proper abstractions, both in hardware and in software.
But it's much worse with the GPU proper (graphics, not video), which simply isn't designed to be shared: at best it has partition support.
So putting it under the exclusive control of a dedicated VM isn't just far easier now that PCIe pass-through mechanics have been devised and established; it's also the better way of dealing with a resource that wasn't designed for sharing.
IMHO keeping it out of Proxmox's hair is simply a more efficient use of everyone's resources until xPUs, chipsets and operating systems have caught up.
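
For anyone going the pass-through route, here's a minimal sketch of the host-side setup. Treat it as an illustration, not a recipe: the Intel IOMMU flag, the VM ID 100 and the PCI address 01:00 are placeholders you'd swap for your own board and VM.

# /etc/default/grub -- enable the IOMMU (Intel example; AMD boards usually have it on by default)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# then run: update-grub   (and reboot the host)

# /etc/modules -- make sure the vfio modules load at boot
vfio
vfio_iommu_type1
vfio_pci

# hand the whole GPU to the VM (VM ID 100, PCI address 01:00 -- placeholders)
qm set 100 --hostpci0 01:00,pcie=1

The pcie=1 flag expects a q35 machine type on the VM (typically with OVMF firmware), and you'll generally want to blacklist the host's GPU driver (nouveau/nvidia) in /etc/modprobe.d/ so Proxmox itself never grabs the card.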