can't lock file '/var/lock/qemu-server/lock-104.conf' - got timeout

Akubra
I'm using Veeam 12.3 to back up VMs on Proxmox; to do that, Veeam uses a worker VM. Every couple of days the backup fails because the worker can't be found:
Failed to prepare the worker <WORKERNAME>: Failed to synchronize configuration settings of the worker <WORKERNAME> can't lock file '/var/lock/qemu-server/lock-104.conf' - got timeout;

If I look at Proxmox, I can see the worker VM still has a status of starting. The only way to fix that is to run 'systemctl restart pvedaemon' from the Proxmox console.
Once the stuck process is killed I can go into Veeam and initiate a test for the worker; that completes successfully, and for the next couple of days backups work fine.

Reached out to Veeam but they think it is a Proxmox issue.
Have also created a new Veeam worker VM, but the issue comes back.

Tried:
lsof /var/lock/qemu-server/lock-104.conf
ps -f -p <PID>

but that did not give me a lot of info:
task UPID:<PROXMOXSERVER>:003A0684:16B

Not sure what next steps to take.
 
hi,

is there still a running task for that vm?

if the lock can't be acquired, then it's most likely conflicting with another action that's currently running (such as starting, backup, etc.)

Reached out to Veeam but they think it is a Proxmox issue.
if they think it's a proxmox ve issue, they should reach out (e.g. on our developer list/bug tracker/etc)
they have a much easier way to debug proxmox ve (since it's open source) than we have to debug veeam...
 
Thanks Dominik, this worker VM is not running 24/7. As soon as Veeam starts the backup job, it starts up the worker VM, does the backup job and shuts down the VM again. This works for a couple of days (1 backup job per day) but then it throws the error.
Regardless of Veeam responding, how can I check what else might be locking the file, preventing the VM from starting?

As mentioned, I tried (which was recommended by Veeam):
lsof /var/lock/qemu-server/lock-104.conf
which returns a <PID>

and then run:
ps -f -p <PID from previous CMD>

but that did not give me a lot of info:
task UPID:<PROXMOXSERVER>:003A0684:16B

Maybe it does but I'm not interpreting it correctly.
 
task UPID:<PROXMOXSERVER>:003A0684:16B
this corresponds to a worker task, and if it still is running, this should be visible on the ui (on the bottom) as a running task
if it's not running anymore, it should still be there, you can get e.g. the status from the 'upid' with

Code:
pvenode task status <upid>

or show all running tasks on the commandline with:

Code:
pvenode task list --source active
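fwiw, the upid string itself encodes a few fields (node, pid and timestamps in hex, task type, vm id, user), so you can also pull it apart in plain bash. this is just a sketch of how I read the format, and the upid below is a made-up example:

```shell
# A UPID has the form UPID:node:pid(hex):pstart(hex):starttime(hex):type:vmid:user@realm:
# Split it on ':' with read; the example UPID here is invented for illustration.
upid='UPID:pve1:003A0684:016B2F3A:65A1B2C3:qmstart:104:root@pam:'
IFS=':' read -r _ node pid pstart starttime type vmid user _ <<< "$upid"

# Print the decoded fields; the PID is stored in hex, so convert it.
echo "node=$node type=$type vmid=$vmid pid=$((16#$pid))"
```

with the decoded pid you can then look at the worker process directly (e.g. `ps -f -p <pid>`).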
 
I'm using Veeam 12.3 to back up VMs on Proxmox; to do that, Veeam uses a worker VM. Every couple of days the backup fails because the worker can't be found:
Failed to prepare the worker <WORKERNAME>: Failed to synchronize configuration settings of the worker <WORKERNAME> can't lock file '/var/lock/qemu-server/lock-104.conf' - got timeout;

If I look at Proxmox, I can see the worker VM still has a status of starting. The only way to fix that is to run 'systemctl restart pvedaemon' from the Proxmox console.
Once the stuck process is killed I can go into Veeam and initiate a test for the worker; that completes successfully, and for the next couple of days backups work fine.

I have exactly the same issue.

Code:
systemctl restart pvedaemon

That got the worker VM out of its stuck "starting" state; after the service restart Veeam is able to start the VM, but not to configure it.
I still have to restart the PVE host, then it works again for 1 to 3 days.

Were you able to solve the problem?

Regards,
 
I have exactly the same issue.

Code:
systemctl restart pvedaemon

That got the worker VM out of its stuck "starting" state; after the service restart Veeam is able to start the VM, but not to configure it.
I still have to restart the PVE host, then it works again for 1 to 3 days.

Were you able to solve the problem?

Regards,
The only thing that fixed it for me was to have this worker VM run 24/7 instead of start/stop being triggered by the backup job.
So far so good
 
I'm trying to use a cron job:
0 18 * * * systemctl restart pvedaemon

It runs every day at 18:00, which is about 2 hours before backups are set to start.

Not sure what the ramifications of it are.
 
Hello,

I have the same problem.

Can you share how you managed to keep the worker on permanently?

As a workaround I do not let Veeam control the starting and stopping of the Proxmox worker VM.
Instead I leave the VM running at all times.


  1. On the Veeam Backup & Replication (VBR) server, open a command prompt as admin
  2. Open the Proxmox PVE plug-in configuration file:
    notepad 'C:\Program Files\Veeam\Plugins\PVE\Service\appsettings.json'
  3. Find the Workers section
  4. Set the KeepTurnedOn value to true
  5. Save and close the file
  6. Close the VBR console
  7. Open services.msc and restart the Veeam PVE Service
  8. Re-open the VBR console and test the backup behaviour
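For reference, after step 4 the relevant part of appsettings.json looks roughly like this. This is a minimal sketch: only the Workers section and the KeepTurnedOn key come from the steps above, and the surrounding structure may differ between plug-in versions.

```json
{
  "Workers": {
    "KeepTurnedOn": true
  }
}
```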
 
I'm trying to use a cron job:
0 18 * * * systemctl restart pvedaemon

It runs every day at 18:00, which is about 2 hours before backups are set to start.

Not sure what the ramifications of it are.
Running the cron jobs worked for me. I have made it so they do not run at the same time on each node, though. The proxies start and stop on their own as needed by Veeam. I did not need to keep them running, and I do not have to keep checking whether they are still running; they behave as intended by Veeam: turn on, run the backups, and turn off.
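For example, staggered like this (the times are just an illustration; put one line in each node's root crontab):

```shell
# node 1 crontab -- restart pvedaemon at 17:45, well before the backup window
45 17 * * * systemctl restart pvedaemon

# node 2 crontab -- same job 15 minutes later, so the nodes never restart together
0 18 * * * systemctl restart pvedaemon
```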
 
As a workaround I do not let Veeam control the starting and stopping of the Proxmox worker VM.
Instead I leave the VM running at all times.


  1. On the Veeam Backup & Replication (VBR) server, open a command prompt as admin
  2. Open the Proxmox PVE plug-in configuration file:
    notepad 'C:\Program Files\Veeam\Plugins\PVE\Service\appsettings.json'
  3. Find the Workers section
  4. Set the KeepTurnedOn value to true
  5. Save and close the file
  6. Close the VBR console
  7. Open services.msc and restart the Veeam PVE Service
  8. Re-open the VBR console and test the backup behaviour
Thanks for these steps. I'm running into the same issue as everyone here too; the worker seems to get stuck at `generating cloud-init ISO` when Veeam tries to start it.
 
Hi,

This issue happens when the Veeam worker VM gets stuck in the "starting" state, and Proxmox locks the VM config file. That’s why backups fail.

Since restarting pvedaemon fixes it, it might be a stuck task or process. You can try:
  • Run journalctl -u pvedaemon -n 100 to check logs.
  • Check if the VM is stuck with ps aux | grep qemu | grep 104.
  • If safe, unlock the VM with qm unlock 104.
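If you want to be careful with the last step, a small guard helps: only unlock when no QEMU process for the VM is left. This is just a sketch; the `safe_unlock` helper name is made up, while `qm unlock` is the stock PVE command.

```shell
# Refuse to unlock a VM whose QEMU process is still alive.
# The [k]vm bracket trick keeps pgrep -f from matching this script's own
# command line if it was passed via bash -c.
safe_unlock() {
    local vmid=$1
    if pgrep -f "[k]vm.*-id ${vmid}" >/dev/null 2>&1; then
        echo "VM ${vmid}: QEMU still running, not unlocking"
        return 1
    fi
    echo "VM ${vmid}: no QEMU process, qm unlock ${vmid} should be safe"
    # qm unlock "$vmid"   # uncomment when running on the PVE host
}

safe_unlock 104
```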
 
Another way to get this error, if anyone is looking, is when the VM's zvol doesn't exist. I copied a config from another PVE box and typo'd the disk name... it takes 300s to give up though, and meanwhile you're freaking out wondering what happened. Finally you get a proper error:

timeout: no zvol device link for 'vm-150-disk-0' found after 300 sec.
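A quick sanity check saves the 300s wait: build the expected device link path and see whether it exists. The pool path is an assumption here (mine is rpool/data; `zfs list -t volume` shows the real volume names on your box).

```shell
# Check for the zvol device link that qemu-server waits on before starting.
# pool path is an assumption -- adjust to match your storage configuration.
vmid=150
pool="rpool/data"
dev="/dev/zvol/${pool}/vm-${vmid}-disk-0"

if [ -e "$dev" ]; then
    echo "ok: $dev exists"
else
    echo "missing: $dev -- qm start will hang for 300s and then fail"
fi
```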

 
Hey everyone, I wanted to share a solution I've been running for a few weeks that solved a persistent headache with Veeam worker VMs on Proxmox.


My setup is Veeam Backup & Replication 13 (build 13.01.1071) running against Proxmox VE 9.1.4 with daily scheduled backup jobs. Veeam deploys worker VMs on the Proxmox host to handle backup processing and shuts them down when each job finishes. The problem is they don't always come back up cleanly for the next job. Sometimes a stale lock file gets left behind, or a start task gets stuck in pvedaemon, and the workers just sit offline until someone manually intervenes.

Note: I have deployed two Veeam Proxmox workers, as I like to have a little redundancy.


The error you'll see in the Proxmox task log is either:

Error: timeout waiting on systemd​

or​

can't lock file '/var/lock/qemu-server/lock-104.conf'​

What's actually happening is Proxmox starts the QEMU process fine, but then waits for the guest agent inside the VM to respond and times out. The stuck task holds a lock, every subsequent start attempt fails for the same reason, and without manual intervention the workers stay offline indefinitely.


My fix was a simple bash watchdog script running as a cron job every hour. It checks if the worker VMs are running, clears any stale locks, kills stuck start tasks, and brings the VMs back up automatically. I've been running it for a couple of weeks now and haven't had to SSH in manually once. Sharing it here in case it saves another admin the same frustration.


The Solution​


A watchdog script that runs every hour via cron. It checks whether each worker VM is running, handles stale locks, kills zombie qmstart tasks, and starts the VM if needed. It also uses a startup timeout so it can never hang indefinitely like the default qm start behavior.

Setup​


Step 1: Create the script

Code:
nano /usr/local/bin/vm-watchdog.sh


Paste the following, then update the start_vm lines at the bottom to match your VM IDs:

Code:
#!/bin/bash

RETRY_FILE="/tmp/vm-watchdog-retries"
LOG_PREFIX="vm-watchdog"
QM_START_TIMEOUT=60  # seconds before we consider qm start hung

log() {
    echo "$(date): $1"
    logger -t "$LOG_PREFIX" "$1"
}

kill_stuck_qmstart() {
    local vm_id=$1
    local stuck_pids
    stuck_pids=$(ps aux | grep "task UPID.*qmstart:${vm_id}:" | grep -v grep | awk '{print $2}')
    if [ -n "$stuck_pids" ]; then
        log "VM $vm_id - Found stuck qmstart task(s): $stuck_pids — killing"
        kill -9 $stuck_pids 2>/dev/null
        sleep 2
        return 0
    fi
    return 1
}

is_qemu_running() {
    local vm_id=$1
    pgrep -f "kvm.*-id ${vm_id} " > /dev/null 2>&1
}

start_vm() {
    local vm_id=$1
    local lock_file="/var/lock/qemu-server/lock-${vm_id}.conf"
    local retry_count=0

    # --- Step 1: Check if QEMU is actually running at the process level ---
    if is_qemu_running $vm_id; then
        local qm_status
        qm_status=$(/usr/sbin/qm status $vm_id 2>/dev/null)
        if echo "$qm_status" | grep -q "running"; then
            sed -i "/^vm${vm_id}=/d" $RETRY_FILE 2>/dev/null
            return 0
        else
            log "VM $vm_id - QEMU process exists but Proxmox shows '$qm_status' — checking for stuck tasks"
            kill_stuck_qmstart $vm_id
            sleep 5
            if /usr/sbin/qm status $vm_id 2>/dev/null | grep -q "running"; then
                log "VM $vm_id - Proxmox now shows running after cleanup"
                sed -i "/^vm${vm_id}=/d" $RETRY_FILE 2>/dev/null
                return 0
            fi
        fi
    fi

    # --- Step 2: Check if VM is stopped per Proxmox ---
    if ! /usr/sbin/qm status $vm_id 2>/dev/null | grep -q "stopped"; then
        sed -i "/^vm${vm_id}=/d" $RETRY_FILE 2>/dev/null
        return 0
    fi

    # --- Step 3: Kill any stuck qmstart tasks before doing anything else ---
    if kill_stuck_qmstart $vm_id; then
        log "VM $vm_id - Killed stuck qmstart task(s), clearing state"
        rm -f $lock_file
        sed -i "/^vm${vm_id}=/d" $RETRY_FILE 2>/dev/null
        sleep 3
    fi

    # --- Step 4: Handle lock file ---
    if [ -f "$lock_file" ]; then
        if fuser "$lock_file" > /dev/null 2>&1; then
            retry_count=$(grep "^vm${vm_id}=" $RETRY_FILE 2>/dev/null | cut -d'=' -f2)
            retry_count=${retry_count:-0}
            retry_count=$((retry_count + 1))

            sed -i "/^vm${vm_id}=/d" $RETRY_FILE 2>/dev/null
            echo "vm${vm_id}=${retry_count}" >> $RETRY_FILE

            log "VM $vm_id - Lock held by active process (attempt $retry_count of 2)"

            if [ $retry_count -ge 2 ]; then
                log "VM $vm_id - 2 failed attempts, forcing lock removal"
                rm -f $lock_file
                sed -i "/^vm${vm_id}=/d" $RETRY_FILE 2>/dev/null
            else
                return 1
            fi
        else
            log "VM $vm_id - Removing stale lock"
            rm -f $lock_file
            sed -i "/^vm${vm_id}=/d" $RETRY_FILE 2>/dev/null
        fi
    fi

    # --- Step 5: Start the VM with a timeout so we never hang ---
    log "VM $vm_id - Starting"
    local start_output
    start_output=$(timeout $QM_START_TIMEOUT /usr/sbin/qm start $vm_id 2>&1)
    local exit_code=$?

    if [ $exit_code -eq 124 ]; then
        log "VM $vm_id - qm start timed out after ${QM_START_TIMEOUT}s — will retry next cycle"
        kill_stuck_qmstart $vm_id
        return 1
    elif echo "$start_output" | grep -qi "already running"; then
        log "VM $vm_id - Already running (caught by qm start output)"
        sed -i "/^vm${vm_id}=/d" $RETRY_FILE 2>/dev/null
        return 0
    elif [ $exit_code -ne 0 ]; then
        log "VM $vm_id - Start failed (exit $exit_code): $start_output"
        return 1
    else
        log "VM $vm_id - Started successfully"
        sed -i "/^vm${vm_id}=/d" $RETRY_FILE 2>/dev/null
        return 0
    fi
}

# Create retry file if it doesn't exist
touch $RETRY_FILE

# Use a lock so only one instance of this script runs at a time
exec 9>/tmp/vm-watchdog.lock
if ! flock -n 9; then
    log "Another instance of vm-watchdog is already running — exiting"
    exit 1
fi

# --- Update these VM IDs to match your Veeam worker VMs ---
start_vm 100
start_vm 101


Step 2: Set permissions

Code:
chmod +x /usr/local/bin/vm-watchdog.sh

Step 3: Verify syntax

Code:
bash -n /usr/local/bin/vm-watchdog.sh && echo "Syntax OK"

Step 4: Test run

Code:
bash /usr/local/bin/vm-watchdog.sh

Step 5: Set up the cron job

Code:
crontab -e

Add this line. In my case I have it run every hour; you can run it more or less frequently, but for me an hourly run gives the backup enough time to finish.
Code:
# Hourly watchdog - checks and restarts VMs if stopped, cleans stale locks
0 * * * * /usr/local/bin/vm-watchdog.sh


Step 6: Monitor the logs

Code:
journalctl -t vm-watchdog --since "today"



What the script handles​


  • Stale lock files — detects and removes them automatically
  • Zombie qmstart tasks — finds and kills stuck pvedaemon workers that hold locks
  • QEMU/Proxmox state mismatch — detects when QEMU is running but Proxmox doesn't know about it
  • Hung qm start — enforces a 60 second timeout so the script never blocks indefinitely
  • Overlapping runs — uses flock to prevent multiple instances running simultaneously
  • Logging — writes to both stdout and syslog for easy monitoring

Hope this saves someone else the headache.
 