To answer your initial question directly: in current versions of QEMU/Proxmox, emulating an NVMe device is unlikely to deliver a dramatic performance improvement.
The reason comes down to what "virtio" actually is. Virtio is a paravirtualized I/O framework that enables efficient communication between the guest and QEMU via shared memory queues, called virtqueues. In many ways, a virtqueue behaves like an NVMe queue pair: it provides an optimized path for I/O requests and completions.
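To make the queue-pair comparison concrete, here is a minimal toy model in Python. It is only a sketch: the class and method names (ToyVirtqueue, submit, process, reap) are made up for illustration, and a real virtqueue uses descriptor tables and rings in guest memory rather than Python deques.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyVirtqueue:
    """Toy stand-in for a virtqueue: one ring per direction (hypothetical names)."""
    size: int = 8
    avail: deque = field(default_factory=deque)  # guest -> host requests
    used: deque = field(default_factory=deque)   # host -> guest completions

    def submit(self, req_id, op, sector):
        """Guest side: post a request if the queue still has room."""
        if len(self.avail) + len(self.used) >= self.size:
            return False  # queue full, the guest must wait for completions
        self.avail.append((req_id, op, sector))
        return True

    def process(self):
        """Host side: drain pending requests and post a completion for each."""
        handled = 0
        while self.avail:
            req_id, _op, _sector = self.avail.popleft()
            self.used.append((req_id, "ok"))  # pretend the I/O succeeded
            handled += 1
        return handled

    def reap(self):
        """Guest side: collect and clear posted completions."""
        done = list(self.used)
        self.used.clear()
        return done


vq = ToyVirtqueue(size=4)
for i in range(3):
    vq.submit(i, "read", sector=i * 8)
vq.process()
completions = vq.reap()
print(completions)
```

The shape is the same one an NVMe submission/completion queue pair has: one side posts requests, the other posts completions, and neither needs a trap into the hypervisor for every field of every request.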
The virtio-scsi controller layers a SCSI-compatible interface on top of virtio queues, which carry the storage requests and completions. Because it presents itself as a SCSI device to the guest OS, there is additional overhead for managing SCSI command buffers, responses, and asynchronous event notifications.
This overhead can slightly impact latency and CPU efficiency inside the guest.
If your goal is to minimize overhead for performance-critical virtual machines, virtio-blk can sometimes be a better choice. Like virtio-scsi, it uses virtio message queues but avoids the SCSI protocol layer, resulting in lower latency and slightly higher throughput in some workloads. That said, the latency differences are modest, typically just a few microseconds.
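As a rough illustration of the extra protocol layer, the Python sketch below packs the per-request framing for a single read in both formats. The field layouts follow my reading of the virtio specification with the default virtio-scsi sizes (32-byte CDB, 96-byte sense buffer); the helper names are hypothetical. Byte counts are only a proxy here, since most of the real cost is the work of building and parsing the SCSI command, not the bytes themselves.

```python
import struct

# virtio-blk request header: type (le32), reserved (le32), sector (le64)
BLK_HDR = struct.Struct("<IIQ")
BLK_STATUS = struct.Struct("<B")  # single status byte written by the device

# virtio-scsi command request (assuming default cdb_size=32):
# lun[8], tag (le64), task_attr, prio, crn (u8 each), cdb[32]
SCSI_REQ = struct.Struct("<8sQBBB32s")
# virtio-scsi command response (assuming default sense_size=96):
# sense_len (le32), residual (le32), status_qualifier (le16),
# status (u8), response (u8), sense[96]
SCSI_RESP = struct.Struct("<IIHBB96s")

VIRTIO_BLK_T_IN = 0  # read request type


def blk_read_request(sector):
    """Framing for a virtio-blk read: just a type and a start sector."""
    return BLK_HDR.pack(VIRTIO_BLK_T_IN, 0, sector)


def scsi_read10_request(target, lun, lba, nblocks, tag=1):
    """Framing for the same read over virtio-scsi, carrying a READ(10) CDB."""
    # LUN field: byte 0 is 1, byte 1 is the target, bytes 2-3 hold the LUN
    lun_field = bytes([1, target, 0x40 | (lun >> 8), lun & 0xFF, 0, 0, 0, 0])
    # READ(10): opcode 0x28, flags, LBA (be32), group, length (be16), control
    cdb = struct.pack(">BBIBHB", 0x28, 0, lba, 0, nblocks, 0)
    return SCSI_REQ.pack(lun_field, tag, 0, 0, 0, cdb)  # "32s" zero-pads the CDB


blk_req = blk_read_request(sector=2048)
scsi_req = scsi_read10_request(target=0, lun=0, lba=2048, nblocks=8)

print("virtio-blk  request/response bytes:", len(blk_req), BLK_STATUS.size)
print("virtio-scsi request/response bytes:", len(scsi_req), SCSI_RESP.size)
```

virtio-blk only needs to say "read from this sector", while virtio-scsi wraps the same intent in a LUN address, a tag, and a CDB, and gets back a response with room for sense data. That is the layer virtio-blk skips.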
As of QEMU 9, virtio-blk supports multiqueue operation in which different queues of a single disk can be processed by different I/O threads, leveling the playing field with virtio-scsi. We generally recommend virtio-scsi unless your workload has unusual requirements, mostly because we like the well-defined semantics that SCSI provides.
As Bob pointed out, NVMe emulation in QEMU is primarily intended for development and testing. Even if fully optimized, it is unlikely to significantly outperform virtio-blk, since the backend work (shared-memory message passing) is nearly identical. Some minor gains could come from bypassing the traditional Linux block layer, but multiplexing host queue pairs introduces additional overhead in QEMU.
The main potential advantage of NVMe emulation is supporting one I/O thread per host queue pair. However, this will increase CPU utilization, which inevitably draws complaints.
I hope this helps!
Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox