[SOLVED] Running commands on guest using hookscript fails (Error msg: QEMU guest agent is not running)

ahriman

Apr 26, 2022
I'm having trouble using the hookscript to run commands on the guest. The error I'm getting is "QEMU guest agent is not running". However, I know the agent is installed and working, since I can run the same commands just fine once the VM has booted. I thought the QEMU agent might need extra time to start up, so I added a 10-second delay and verified that the delay itself works, but that didn't solve the issue either.

Perl:
elsif ($phase eq 'post-start') {

    # Second phase 'post-start' will be executed after the guest
    # successfully started.

    print scalar localtime();
    print "\n";
    sleep(10);
    print scalar localtime();
    print "\n";
    system("qm guest exec $vmid -- mkdir ./test");
    print "$vmid started successfully.\n";

}

And here's the output from the VM's startup log:

Code:
generating cloud-init ISO
GUEST HOOK: 802 pre-start
802 is starting, doing preparations.
iothread is only valid with virtio disk or virtio-scsi-single controller, ignoring
GUEST HOOK: 802 post-start
Tue Sep 27 13:56:31 2022
Tue Sep 27 13:56:41 2022
QEMU guest agent is not running
802 started successfully.
TASK OK

And just so no one tells me to double check that the guest agent is running (it is), terminal output from running a guest exec command at the command line:

Code:
qm guest exec 802 -- echo Worked!
{
   "exitcode" : 0,
   "exited" : 1,
   "out-data" : "Worked!\n"
}

Anyone have any ideas what I'm doing wrong?
 
Have you tried increasing the sleep? Try something that really waits, 60-120 seconds. If that works, reconsider using sleep as the way to decide that the system is fully booted. It's a guaranteed occasional (or, in your case, constant) race condition, where something takes a millisecond longer and just misses your 10s cutoff.
A better approach would be something like:
# $vm holds the VM ID, e.g. vm=802
agent_up=0
for ((count = 0; count < 10; count++)); do
    if qm agent "$vm" ping; then
        agent_up=1
        break
    fi
    echo "agent not responding on retry $count"
    sleep 5
done
((agent_up)) || echo "agent never responded"

This is only a sketch; adjust the retry count and delay for your guest.


Increased sleep to 30 seconds, and that worked. Thank you!
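For anyone finding this later: the fixed 30-second sleep works on this VM, but the polling idea from the reply above can also live in the hookscript itself. Below is a minimal sketch of a shell hookscript, not a tested drop-in. It assumes Proxmox calls the script with the VM ID and the phase as its first two arguments (as the Perl example does), and that `qm agent <vmid> ping` exits non-zero while the agent is unreachable. `AGENT_RETRIES` and `AGENT_DELAY` are hypothetical variables made up for this sketch, not Proxmox settings. The defaults give roughly two minutes of waiting.

```shell
#!/bin/sh
# Sketch of a shell hookscript. Assumed invocation: <script> <vmid> <phase>.

# Poll the guest agent until it answers or the retries run out.
# AGENT_RETRIES and AGENT_DELAY are hypothetical tunables for this sketch.
wait_for_agent() {
    wfa_retries=${AGENT_RETRIES:-24}
    wfa_delay=${AGENT_DELAY:-5}
    wfa_count=0
    while [ "$wfa_count" -lt "$wfa_retries" ]; do
        # Assumption: ping fails (non-zero exit) until the agent is up.
        if qm agent "$1" ping >/dev/null 2>&1; then
            return 0
        fi
        echo "agent not responding on retry $wfa_count"
        sleep "$wfa_delay"
        wfa_count=$((wfa_count + 1))
    done
    return 1
}

hook_main() {
    # $1 is the VM ID, $2 is the phase name.
    if [ "$2" = "post-start" ]; then
        if wait_for_agent "$1"; then
            # Same guest command as in the original post.
            qm guest exec "$1" -- mkdir ./test
        else
            echo "agent never responded, skipping guest commands" >&2
        fi
    fi
}

hook_main "$@"
```

The loop returns as soon as the agent answers, so a fast-booting guest is not held up for the full timeout, and a slow one gets a clear message instead of a failed guest exec.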
 
