It can be done using commodity GPUs and VFIO. Also, that article claims >80% of native speed, but I've seen >95% claimed for VFIO and KVM.
Thanks for the reply, though.
Since my original post, I've found that small groups within the Arch Linux and Ubuntu communities have already done this, so...