Hi,
I wonder if someone can help me with troubleshooting steps for a GPU VM that won't start. I've got a server with 8 NVIDIA Tesla M40 GPU cards. I've had the GPUs configured and working fine for over a year. Recently I had a request to create a new VM with a GPU, and I couldn't get it to start. After a reboot of the server, the VM started OK, and it's been working fine since. However, I now have another request for a GPU VM, and I can't keep restarting the server in the hope that it will sort out the issue. There is not much in the logs other than a timeout message when I try to start the VM:
Code:
Mar 03 10:37:48 proxgpu pvestatd[9079]: VM 145 qmp command failed - VM 145 qmp command 'query-proxmox-support' failed - unable to connect to VM 145 qmp socket - timeout after 31 retries
Mar 03 10:37:48 proxgpu pvestatd[9079]: status update time (6.316 seconds)
Mar 03 10:37:57 proxgpu pvedaemon[2024118]: start failed: command '/usr/bin/kvm -id 145 -name ub2204-gpu-template -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/145.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/145.pid -daemonize -smbios 'type=1,uuid=039fc4b6-a44c-4e47-a699-5b56296b7343' -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/145.vnc,password -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt -m 8192 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=525ecadd-5fd9-4653-a214-06ed788dc92c' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:83:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device 'VGA,id=vga,bus=pcie.0,addr=0x1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:26c85950ee8b' -drive 'file=/var/lib/vz/template/iso/ubuntu-22.04-live-server-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/localdata-zfs/vm-145-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap145i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=A2:1F:8C:47:53:EA,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' -machine 'type=q35+pve0'' failed: got timeout
Mar 03 10:37:57 proxgpu pvedaemon[2776800]: <root@pam> end task UPID:proxgpu:001EE2B6:0CBA7DBC:6401CDE6:qmstart:145:root@pam: start failed: command '/usr/bin/kvm -id 145 -name ub2204-gpu-template -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/145.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/145.pid -daemonize -smbios 'type=1,uuid=039fc4b6-a44c-4e47-a699-5b56296b7343' -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/145.vnc,password -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt -m 8192 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=525ecadd-5fd9-4653-a214-06ed788dc92c' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:83:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device 'VGA,id=vga,bus=pcie.0,addr=0x1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:26c85950ee8b' -drive 'file=/var/lib/vz/template/iso/ubuntu-22.04-live-server-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/localdata-zfs/vm-145-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap145i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=A2:1F:8C:47:53:EA,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' -machine 'type=q35+pve0'' failed: got timeout
Mar 03 10:37:58 proxgpu pvestatd[9079]: VM 145 qmp command failed - VM 145 qmp command 'query-proxmox-support' failed - unable to connect to VM 145 qmp socket - timeout after 31 retries
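For anyone reading the log above: the only passed-through device in the kvm command line is `vfio-pci,host=0000:83:00.0`. Below is a minimal Python sketch, not a definitive diagnostic, that pulls the vfio-pci host addresses out of such a line and reports which kernel driver sysfs shows as bound to each one. The function names `vfio_hosts` and `bound_driver` are made up for illustration, and the sketch assumes the standard Linux sysfs layout under /sys/bus/pci/devices.

```python
import os
import re

# Matches the host address in arguments such as
# -device 'vfio-pci,host=0000:83:00.0,id=hostpci0,...'
VFIO_HOST_RE = re.compile(r"vfio-pci,host=([0-9a-fA-F:.]+)")


def vfio_hosts(log_line):
    """Return the PCI addresses handed to vfio-pci in a kvm command line."""
    return VFIO_HOST_RE.findall(log_line)


def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to pci_addr, or None.

    None means the device directory is missing or no driver is bound.
    Assumption: sysfs exposes the binding as a 'driver' symlink.
    """
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Fragment copied from the failing start command in the log.
    sample = "-device 'vfio-pci,host=0000:83:00.0,id=hostpci0,bus=pci.0,addr=0x10'"
    for addr in vfio_hosts(sample):
        print(addr, bound_driver(addr))
```

Run on the host, this prints the address followed by the bound driver name, or `None` if nothing is bound. It only reports the current binding and does not explain the timeout by itself.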
Any help is much appreciated.
Thanks