vGPU virtual machines on PVE + A16 run for a while, but fail to start after the virtual machine is restarted

yiyang5188

New Member
May 19, 2025
I'm running PVE with an NVIDIA A16 and vGPU driver version 18.0, installed following https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#cite_note-driver-versions-10. I allocate 2 GB of video memory to each virtual machine. At first all virtual machines start normally, but after a few days a restarted virtual machine fails to start, and the log shows:
[nvidia-vgpu-vfio] no vGPU device found for VF 0000:8b:01.7
[228095.276939] [nvidia-vgpu-vfio] current_vgpu_type of VF not configured
1747630698992.png
1747630725941.png
1747630746332.png
Please help me figure out what is going on.
 
hi,

can you post the complete task log output from a failed start and maybe from a restart too?

also the output of e.g.

Code:
lspci
nvidia-smi vgpu 
nvidia-smi vgpu -s
nvidia-smi vgpu -c

would be helpful
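
The requested outputs can be gathered as plain text in one go. The following is a minimal sketch, not an official tool: the function name `collect_vgpu_info` and the default file name `vgpu-info.txt` are made up for illustration, and only the four commands come from the post above.

```shell
#!/usr/bin/env bash
# Sketch: save the output of the requested commands into one text file.
# collect_vgpu_info and vgpu-info.txt are hypothetical names.

collect_vgpu_info() {
    local out="${1:-vgpu-info.txt}"
    local cmd
    : > "$out"   # start with an empty file
    for cmd in "lspci" "nvidia-smi vgpu" "nvidia-smi vgpu -s" "nvidia-smi vgpu -c"; do
        printf '===== %s =====\n' "$cmd" >> "$out"
        # Only run the command if its binary exists on this host.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # $cmd is left unquoted on purpose so it splits into command + arguments.
            $cmd >> "$out" 2>&1 || printf '(exit status %s)\n' "$?" >> "$out"
        else
            printf '(command not found on this host)\n' >> "$out"
        fi
        printf '\n' >> "$out"
    done
    printf '%s\n' "$out"   # print the file name that was written
}
```

Calling `collect_vgpu_info` without arguments on the PVE host writes `vgpu-info.txt` in the current directory, which can then be pasted into a post as text.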
 
1747701754234.png
1747701778991.png
1747701809465.png

1747701826375.png
1747701850025.png
Once the host has been up for a few hours, restarting a virtual machine always gets stuck. If I reboot the server, the virtual machines can be restarted again without problems for a while, which is really strange.
 
first, please in the future, post text as text and not as images (it becomes much harder to see, especially since the images are so large ;) )

also can you post your resource mappings + vm configs and the full output of `lspci -nnk` (as text please)

as far as i can see it tries to use the 8a:00 card, but can not find a free slot there (which makes sense since that card is fully used with 2 8Q profiles), it then seemingly has trouble using the 8b:00 card, but i need more info to see what's going on there
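
The "fully used" remark follows from simple framebuffer arithmetic. A rough sketch, assuming each physical GPU on the A16 has 16 GB of framebuffer, that an 8Q profile reserves 8 GB, and that framebuffer is the only limit (the helper name `max_vgpus_per_gpu` is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: how many vGPUs of one profile size fit on a single physical GPU,
# counting framebuffer only. Both sizes are given in whole GB.
# Assumption: 16 GB per physical GPU on the A16, 8 GB for an 8Q profile.

max_vgpus_per_gpu() {
    local gpu_fb_gb="$1"
    local profile_fb_gb="$2"
    echo $(( gpu_fb_gb / profile_fb_gb ))
}

# Example calls:
#   max_vgpus_per_gpu 16 8   -> 2
#   max_vgpus_per_gpu 16 2   -> 8
```

This only covers the memory side; other limits that the driver may enforce are not modelled here.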

also I'd need the full task log of such a failed start.


EDIT:

i just noticed that you use the v17 of the vgpu drivers, which is not really supported on Proxmox VE, just fyi ;) (the driver 550.144.02 of the nvidia-smi output is for v17, v18 would be 570)
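
A quick way to double-check which vGPU release a host driver belongs to is to look at the driver's major version. The sketch below encodes only the two branches named in this post (550 for v17, 570 for v18); anything else is reported as unknown rather than guessed, and the function name `vgpu_release_for_driver` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: map a host driver version string to the vGPU release branch.
# Only the two mappings mentioned in the thread are included.

vgpu_release_for_driver() {
    local version="$1"
    local branch="${version%%.*}"   # keep everything before the first dot
    case "$branch" in
        550) echo "v17" ;;
        570) echo "v18" ;;
        *)   echo "unknown" ;;
    esac
}

# Possible usage on the host (the query reads the installed driver version):
#   vgpu_release_for_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```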
 
The same problem occurred with v18, which I used before. Yesterday I switched to v17, and the problem still exists. The virtual machine should be using the 8b card directly. I also see that message when a virtual machine starts normally; the memory on 8a is probably used up, which is why the message appears. The 8b card still has free memory, and I don't know why it can't be used.
 

Attachments

  • nvidia.txt (76.3 KB)
  • 1b58b5c568ae9f72a6bcfaa6f70c9c06.png (241.2 KB)
  • e9c16b6bb7dda1dd81c2968f268daccb.png (140.4 KB)
can you post the full vm config and resource config in text format?
Code:
qm config ID
cat /etc/pve/mapping/pci.cfg

also I need the full output of a failed task log please

maybe also a journal output since boot, otherwise it's hard to diagnose what actually happens
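
These three pieces of information can be saved as text files in one step. This is only a sketch under assumptions: the function names, the output file names, and the VM ID `100` in the usage note are hypothetical, and `journalctl -b` is used here as one way to get the journal since the current boot.

```shell
#!/usr/bin/env bash
# Sketch: save VM config, PCI resource mapping and the journal since boot as text.
# All function and file names here are illustrative, not part of Proxmox VE.

save_output() {
    # Usage: save_output OUTFILE COMMAND [ARGS...]
    local outfile="$1"
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$outfile" 2>&1 || echo "(exit status $?)" >> "$outfile"
    else
        echo "($1 not found on this host)" > "$outfile"
    fi
}

collect_vm_diagnostics() {
    local vmid="$1"
    local outdir="${2:-.}"
    mkdir -p "$outdir"
    save_output "$outdir/qm-config-$vmid.txt" qm config "$vmid"
    save_output "$outdir/pci-mapping.txt" cat /etc/pve/mapping/pci.cfg
    save_output "$outdir/journal-since-boot.txt" journalctl -b --no-pager
}
```

For example, `collect_vm_diagnostics 100 ./diag` would write three text files into `./diag` for a VM with the (hypothetical) ID 100. The task log of the failed start still has to be copied separately.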
 
I found that only four virtual machines with a vGPU can run at the same time; the fifth one gets stuck when starting. If I shut down one of the running virtual machines, the one that could not start will then start. Why can only four virtual machines with a vGPU run at once? The video memory is not all used up, is it?
 

Attachments

  • 4444.png (39.2 KB)
Can you help me figure out how to solve this problem?
 
please write your posts in english, otherwise not many here will be able to help you.

i'm still waiting for the full task log output of a failed start.
Also the full journal output during such a failed start would be good
