VM with GPU passthrough slower when rendering

XiaoQi

New Member
Feb 6, 2024
Hello, I have a render farm that uses Proxmox as the server, with VMs installed on it for the render manager and the render workers. The render farm has RTX A5000 GPUs that are already passed through and assigned to the render worker VMs. However, when I test-render the same Blender file on a worker VM and on a render PC (not a VM) with an RTX 3070 GPU, the difference in render time is large: the render PC finishes in 45 minutes, but the VM worker takes 1 hour 23 minutes. That makes me wonder where the bottleneck is: the CPU, the GPU, or some other configuration in Proxmox?
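
To put the gap in numbers: 1 hour 23 minutes is 83 minutes, so the VM worker needs roughly 1.84 times as long as the render PC. A minimal shell sketch of that arithmetic (the function name `slowdown` is only illustrative):

```shell
# slowdown VM_MINUTES PC_MINUTES -> ratio with two decimals
slowdown() {
    awk -v vm="$1" -v pc="$2" 'BEGIN { printf "%.2f\n", vm / pc }'
}

# VM worker: 1 h 23 min = 83 min, render PC: 45 min
slowdown 83 45   # prints 1.84
```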

GPU passthrough reference I used:
https://pve.proxmox.com/wiki/PCI(e)_Passthrough
https://forum.proxmox.com/threads/p...x-ve-8-installation-and-configuration.130218/


Here is some information about the Proxmox server and the VMs:

One server with 2 render worker VMs
Lenovo Thinksystem SRV665 V3
  • CPU: 2x AMD EPYC 9254 24 cores
  • RAM: 128 GB DDR5
  • GPU: 2x RTX A5000
  • Disk: SSD 3840GB
PC-Render specifications:
  • OS : Windows
  • CPU : Intel core i7-11700 @ 2.50GHz
  • RAM : Kingston DDR4 4 x 8GB 1600MHz
  • GPU : NVIDIA GeForce RTX 3070 8GB
  • Storage : - SSD Samsung 500GB
    - HDD Seagate 2TB

Code:
pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.4-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.0-1
proxmox-backup-file-restore: 3.2.0-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.1
pve-cluster: 8.0.6
pve-container: 5.0.10
pve-docs: 8.2.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.5
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2

Code:
qm config 110
balloon: 0
bios: ovmf
boot: order=scsi0;ide0;net0
cores: 32
cpu: x86-64-v2-AES
efidisk0: NAS-CD02:110/vm-110-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:81:00,pcie=1,x-vga=1
machine: pc-q35-8.1
memory: 40960
meta: creation-qemu=8.1.5,ctime=1716871656
name: SRV-RENDER-WORKER02
net0: virtio=BC:24:11:5D:AD:32,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: NAS-CD02:110/vm-110-disk-1.qcow2,iothread=1,size=1000G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=55bf8114-c1c6-4dff-b54f-deb347c4700a
sockets: 1
tags: renderfarm;win
vga: std
vmgenid: aea02d35-9c85-4f21-827f-1244bd99a88d
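
Two values in this config may be worth a second look when hunting for a CPU-side bottleneck: `cpu: x86-64-v2-AES` is a generic CPU model that does not expose AVX/AVX2 to the guest (those belong to the x86-64-v3 level), and `numa: 0` means the guest sees no NUMA topology although the host has two nodes. Whether either one explains the slower render is an assumption, not something measured here. A small sketch that flags both from `qm config` output (the function name `check_vm_config` is made up for illustration):

```shell
# check_vm_config: read `qm config <vmid>` output on stdin and print notes
# about settings that are often revisited for passthrough workloads.
check_vm_config() {
    awk -F': ' '
        # any CPU type other than "host" hides some host CPU features
        $1 == "cpu"  && $2 !~ /^host/ { print "cpu type " $2 " is a generic model, not host" }
        # numa: 0 means the guest is not given a NUMA topology
        $1 == "numa" && $2 == "0"     { print "numa is disabled for this VM" }
    '
}
```

On the Proxmox host this could be run as `qm config 110 | check_vm_config`. To see which host NUMA node the GPU is attached to, `cat /sys/bus/pci/devices/0000:81:00.0/numa_node` should work, assuming function `.0` is the GPU itself.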

VM worker specs (both VMs are the same):
  • OS : Windows
  • CPU : 32 cores
  • RAM : 40 GB
  • GPU : 2 x NVIDIA RTX A5000
  • Storage : 1 TB

Code:
 lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          52 bits physical, 57 bits virtual
  Byte Order:             Little Endian
CPU(s):                   96
  On-line CPU(s) list:    0-95
Vendor ID:                AuthenticAMD
  BIOS Vendor ID:         Advanced Micro Devices, Inc.
  Model name:             AMD EPYC 9254 24-Core Processor
    BIOS Model name:      AMD EPYC 9254 24-Core Processor                 Unknown CPU @ 2.9GHz
    BIOS CPU family:      107
    CPU family:           25
    Model:                17
    Thread(s) per core:   2
    Core(s) per socket:   24
    Socket(s):            2
    Stepping:             1
    Frequency boost:      enabled
    CPU(s) scaling MHz:   63%
    CPU max MHz:          4151.7568
    CPU min MHz:          1500.0000
    BogoMIPS:             5791.70
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d debug_swap
Virtualization features:
  Virtualization:         AMD-V
Caches (sum of all):
  L1d:                    1.5 MiB (48 instances)
  L1i:                    1.5 MiB (48 instances)
  L2:                     48 MiB (48 instances)
  L3:                     256 MiB (8 instances)
NUMA:
  NUMA node(s):           2
  NUMA node0 CPU(s):      0-23,48-71
  NUMA node1 CPU(s):      24-47,72-95
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Mitigation; Safe RET
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected
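
Reading the NUMA lines of this output: each node lists two CPU ranges (cores plus their SMT siblings), which adds up to 48 logical CPUs per node. A 32-vCPU VM therefore fits inside one node by thread count, but it is larger than the 24 physical cores of a single socket. A short sketch that counts the CPUs in such a list (`count_cpus` is a hypothetical helper name):

```shell
# count_cpus LIST: count logical CPUs in an lscpu-style list such as "0-23,48-71"
count_cpus() {
    echo "$1" | tr ',' '\n' |
        awk -F- '{ n += (NF == 2 ? $2 - $1 + 1 : 1) } END { print n + 0 }'
}

# NUMA node0 CPU list taken from the lscpu output
count_cpus "0-23,48-71"   # prints 48
```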

GRUB config:
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt nomodeset"
GRUB_CMDLINE_LINUX=""
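
Edits to /etc/default/grub only reach the running kernel after `update-grub` and a reboot, so it can be worth confirming that `iommu=pt` really is on the live command line. A minimal sketch, assuming a GRUB-booted host; `has_kernel_param` is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a file other than /proc/cmdline:

```shell
# has_kernel_param PARAM [FILE]: succeed if PARAM appears as a whole word
# on the kernel command line (default source: /proc/cmdline)
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

Usage on the host could look like `has_kernel_param iommu=pt && echo "iommu=pt is active"`.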

dmesg | grep -e IOMMU
Code:
dmesg | grep -e IOMMU
[    1.332256] pci 0000:60:00.2: AMD-Vi: IOMMU performance counters supported
[    1.340979] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[    1.348393] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    1.359385] pci 0000:20:00.2: AMD-Vi: IOMMU performance counters supported
[    1.369606] pci 0000:e0:00.2: AMD-Vi: IOMMU performance counters supported
[    1.377982] pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
[    1.387868] pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
[    1.395506] pci 0000:a0:00.2: AMD-Vi: IOMMU performance counters supported
[    1.406557] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    1.406572] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[    1.406586] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[    1.406600] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).
[    1.406614] perf/amd_iommu: Detected AMD IOMMU #4 (2 banks, 4 counters/bank).
[    1.406628] perf/amd_iommu: Detected AMD IOMMU #5 (2 banks, 4 counters/bank).
[    1.406647] perf/amd_iommu: Detected AMD IOMMU #6 (2 banks, 4 counters/bank).
[    1.406661] perf/amd_iommu: Detected AMD IOMMU #7 (2 banks, 4 counters/bank).
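
The lines above only show that the IOMMU units were detected. The passthrough guide linked earlier also asks that the GPU sit in its own IOMMU group, which can be listed from sysfs. A sketch under that assumption (`list_iommu_groups` is an illustrative name; the optional argument only allows pointing it at a different directory for testing):

```shell
# list_iommu_groups [ROOT]: print "group <n>: <pci address>" for every device
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: no IOMMU groups present
        group="${dev#"$root"/}"        # strip the root prefix
        group="${group%%/*}"           # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}
```

Running `list_iommu_groups | grep 0000:81:00` on the host would show which group the passed-through GPU and its functions belong to.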

/etc/modules
Code:
vfio
vfio_iommu_type1
vfio_pci
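
Since a single misspelled name in this file means that module is not loaded at boot, a quick check of the entries can help. A minimal sketch that reports which of the three VFIO modules named in the passthrough guide are not listed (`missing_vfio_modules` is a hypothetical helper name):

```shell
# missing_vfio_modules FILE: print each expected VFIO module that does not
# appear as a full line in FILE (for example /etc/modules)
missing_vfio_modules() {
    for mod in vfio vfio_iommu_type1 vfio_pci; do
        grep -qx "$mod" "$1" || echo "$mod"
    done
}
```

On the host, `missing_vfio_modules /etc/modules` should print nothing when all three are listed, and `lsmod | grep vfio` shows what is actually loaded.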

I also found that the GPU r/w throughput of the VM worker does not look normal compared to PC-Render, and the VM worker's Novabench result is lower than PC-Render's. Is the VM worker's storage r/w speed normal? I think it is too low.

I'm new to Proxmox and have just been following the tutorials, so I have no idea why this issue happens. Thanks for your attention.
 

Attachments

  • Novabench PC-Render 2024-07-25 150508.png
  • Novabench Render-Worker02_result 2024-07-25 105918.png
  • PC-Render Throughput.jpeg
  • Render-Worker02 Throughput.jpeg
