[SOLVED] Cannot suspend VM to disk due to passed-through PCI device(s)

ukro

May 16, 2021
Yello!
What are my options?
One VM has an NVIDIA GPU passed through.
Another VM has a Google Coral passed through.

ERROR:
Code:
cannot suspend VM to disk due to passed-through PCI device(s), which lack the possibility to save/restore their internal state
 
You gave me an idea!
I will try to take a snapshot and just stop the VM.
A shutdown is not possible, because unsaved files could be lost.
 
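Works like a charm:
Bash:
# Snapshot the VM including its RAM state (--vmstate), so unsaved work is kept
qm snapshot [vmid] beforeclose --vmstate true
# Hard-stop the VM; no guest shutdown is needed because the state is in the snapshot
qm stop [vmid]
# Later: roll back to the snapshot to get the saved state back
qm rollback [vmid] beforeclose
#qm listsnapshot [vmid]   # optional: list existing snapshots
qm delsnapshot [vmid] beforeclose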
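For anyone who wants to script this, the same sequence can be wrapped in Python 3. This is a minimal sketch: the helper names `park_commands`, `restore_commands` and `run_all` are made up for illustration, and by default it only prints the `qm` commands (dry run) instead of executing them.

```python
import subprocess

SNAPNAME = "beforeclose"  # snapshot name used in the commands above


def park_commands(vmid, snapname=SNAPNAME):
    """Commands that save the VM's RAM state in a snapshot, then hard-stop it."""
    vmid = str(vmid)
    return [
        ["qm", "snapshot", vmid, snapname, "--vmstate", "true"],
        ["qm", "stop", vmid],
    ]


def restore_commands(vmid, snapname=SNAPNAME):
    """Commands that roll the VM back to the snapshot and then delete it."""
    vmid = str(vmid)
    return [
        ["qm", "rollback", vmid, snapname],
        ["qm", "delsnapshot", vmid, snapname],
    ]


def run_all(commands, dry_run=True):
    """Run each command in order, stopping at the first failure.

    With dry_run=True (the default) the commands are only printed, so the
    sketch can be tried on a machine without Proxmox VE installed.
    """
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(park_commands(100))
    run_all(restore_commands(100))
```

Keeping "park" and "restore" as separate lists mirrors the thread: the VM stays stopped between the two halves, for as long as needed.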
 
Hi @ukro

I want to implement your idea as a hookscript (in the "pre-stop" and "pre-start" phases), so that instead of shutting down I can always keep unsaved files and the session.
However, when the snapshot runs in "pre-stop", I get the following message and the snapshot is never taken:
Code:
trying to acquire lock...

TASK ERROR: can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout
Any suggestions? How are you executing these commands? Manually?

Thank you,
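For reference, a Proxmox VE hookscript is attached with `qm set <vmid> --hookscript <storage>:snippets/<file>` and is invoked with the VM ID and the phase name (`pre-start`, `post-start`, `pre-stop`, `post-stop`) as its two arguments. A minimal Python skeleton of that dispatch might look like the following; the function `handle` is hypothetical, and the sketch deliberately does not call `qm` from `pre-stop`, since that is where the lock timeout above occurs.

```python
import sys

# Phases Proxmox VE passes to a guest hookscript
PHASES = ("pre-start", "post-start", "pre-stop", "post-stop")


def handle(vmid, phase):
    """Return the message this sketch logs for a given VM ID and phase."""
    if phase not in PHASES:
        raise ValueError("unknown phase: %s" % phase)
    if phase == "pre-stop":
        # Per this thread, running `qm snapshot` from here times out on
        # /var/lock/qemu-server/lock-<vmid>.conf, so this sketch only logs.
        return "VM %s is about to stop" % vmid
    return "VM %s: %s" % (vmid, phase)


if __name__ == "__main__":
    # Invoked by Proxmox VE as: <script> <vmid> <phase>
    if len(sys.argv) == 3:
        print(handle(sys.argv[1], sys.argv[2]))
```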
 
Hi, these are my notes. The code might not work as is; I am planning to rewrite it so that it handles the CT properly and waits for Samba to be available before resuming the VMs.
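The planned "wait for Samba before resuming" step could be approached by polling the SMB port. This is a minimal Python 3 sketch, assuming the share is served on the standard SMB port 445 and that a successful TCP connect is a good enough signal (it does not check that the share is actually mounted or usable); `port_open` and `wait_for_port` are hypothetical helpers, not part of the script.

```python
import socket
import time


def port_open(host, port=445, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_for_port(host, port=445, deadline_s=300.0, interval_s=5.0):
    """Poll host:port until it accepts connections or deadline_s seconds pass."""
    end = time.monotonic() + deadline_s
    while True:
        if port_open(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

In the script, a call like this would sit right before the `qm resume` loop, with the resume skipped (and retried on the next poll) when it returns False.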
Code:
# Start via crontab: @reboot python2.7 /root/main.py &
import os
import sys
import time
import logging
from subprocess import Popen, PIPE
from logging.handlers import RotatingFileHandler
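try:
        import commands  # Python 2.7: provides getoutput()
except ImportError:
        import subprocess as commands  # Python 3: getoutput() lives in subprocess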
# VM/CT IDs. Stopping/hibernating follows the list order;
# starting/resuming runs in reverse order.
HOSTS_TO_CHECK = ["192.168.6.153"]
#CT_TO_CHECK = ["300","100"]#order to stop
#VMS_TO_CHECK = ["200","110","105","104","103","102","101"]#order to hibernate
CTS_TO_CHECK = ["100"]#order to stop
VMS_TO_CHECK = ["102","101"]#order to hibernate


logfile_path = "/root/main.log"
REQUIRED_OFFLINE_SECONDS = 300
#600
REQUIRED_ONLINE_SECONDS = 150
#300
POLLING_INTERVAL_SECONDS = 30
WASHIBERNATED="FIRSTSTART"
assert REQUIRED_OFFLINE_SECONDS > 2*POLLING_INTERVAL_SECONDS


def main():
        global WASHIBERNATED
        if exit_if_any_host_up()==True:
                log.info("Router is reachable. Poll again, every %s s.",POLLING_INTERVAL_SECONDS)
                deadline = time.time() + REQUIRED_ONLINE_SECONDS
                deadline_str = time.strftime("%H:%M:%S", time.localtime(deadline))
                while time.time() < deadline:
                        log.info('Invoke ONLINE START if host is up until %s.', deadline_str)
                        time.sleep(POLLING_INTERVAL_SECONDS)
                        if exit_if_any_host_up()==False:
                                break
                if exit_if_any_host_up()==False:
                        log.info('Router is UNREACHABLE breaking from loop')
                        log.info('Router is UNREACHABLE breaking from loop')
                        log.info('Router is UNREACHABLE breaking from loop')
                elif WASHIBERNATED=="RUNNING":
                        log.debug('Not starting because Was not hibernated or already started')
                else:#HOST IS UP
                        if time.time() >= deadline:
                                log.info('Invoking ONLINE START')
                                if AreVolumesAvailable():#IS UNENCRYPTED
                                        #Y
                                        log.info('Volumes are Available')
                                        # Start CTs, then resume VMs, both in reverse list order
                                        for CTS in reversed(CTS_TO_CHECK):
                                                if exit_if_any_host_up()==True:
                                                        log.info("'START CT %s' returncode: %s" % (CTS,run_subprocess(['/usr/sbin/pct', 'start',CTS])))
                                                else:
                                                        log.error('ERROR: CT %s NOT STARTED, lost ping.' % CTS)
                                        time.sleep(5)
                                        for VMS in reversed(VMS_TO_CHECK):
                                                if exit_if_any_host_up()==True:
                                                        log.info("'resume VM %s' returncode: %s" % (VMS,run_subprocess(['/usr/sbin/qm', 'resume',VMS])))
                                                        #time.sleep(POLLING_INTERVAL_SECONDS)
                                                else:
                                                        log.error('ERROR: VM %s NOT RESUMED, lost ping.' % VMS)
                                        WASHIBERNATED="RUNNING"
                                        log.info('ALL VMS should be resumed if no error above.')
                                else:
                                        #N
                                        log.info('Volumes are UNAVAILABLE, mounting volumes')

        else:#ROUTER IS NOT PINGING
                log.info("Router is UNREACHABLE. Poll again, every %s s.",POLLING_INTERVAL_SECONDS)
                deadline = time.time() + REQUIRED_OFFLINE_SECONDS
                deadline_str = time.strftime("%H:%M:%S", time.localtime(deadline))
                while time.time() < deadline:
                        log.info('Invoke shutdown if no host comes up until %s.', deadline_str)
                        time.sleep(POLLING_INTERVAL_SECONDS)
                        if exit_if_any_host_up()==True:
                                break
                if exit_if_any_host_up()==True:
                        log.info('Router is reachable breaking from loop')
                elif WASHIBERNATED=="FIRSTSTART":
                        log.debug('Not suspending because first start waiting for ping')
                else:
                        if time.time() >= deadline:

                                for CTS in CTS_TO_CHECK:
                                        if IsCtRunning(CTS):
                                                log.info("'Stopping CT %s' " % CTS)
                                                log.info("'stop %s' returncode: %s" % (CTS,run_subprocess(['/usr/sbin/pct', 'stop',CTS])))
                                        else:
                                                #N
                                                log.info("'Not running CT %s' " % CTS)

                                for VMS in VMS_TO_CHECK:
                                        if IsVmRunning(VMS):
                                                #Y
                                                log.info("'Suspending VM %s' " % VMS)
                                                log.info("'suspend %s' returncode: %s" % (VMS,run_subprocess(['/usr/sbin/qm', 'suspend',VMS,'--todisk','1'])))
                                        else:
                                                #N
                                                log.info("'Not running VM %s' " % VMS)

                                if WASHIBERNATED=="SUSPENDED":
                                        #IS SUSPENDED U CAN SHUTDOWN
                                        log.warning("Shutting down HOST............................")
                                        try:
                                                log.warning("Shutting down")
                                                #os.system("sudo shutdown now &")
                                        except:
                                                log.error("Some error during shutting down")

                                else:
                                        #NOT SUSPENDED YET,MAKE SUSPENDED

                                        WASHIBERNATED="SUSPENDED"############# NEWLY ADDED NEED TO PUT IT
                                        log.info("All VMS should be suspended or stopped")

def AreVolumesAvailable():
        # Placeholder: always reports the volumes as available
        return True



def IsVmRunning(vm):
        answer = commands.getoutput("/usr/sbin/qm status %s" % vm)
        time.sleep(4)
        if "running" in answer:
                log.info("'Running VM %s' " % vm)
                return True
        else:
                log.info("'Not running VM %s' " % vm)
                return False

def IsCtRunning(ct):
        answer = commands.getoutput("/usr/sbin/pct status %s" % ct)
        time.sleep(4)
        if "running" in answer:
                log.info("'Running CT %s' " % ct)
                return True
        else:
                log.info("'Not running CT %s' " % ct)
                return False


def exit_if_any_host_up():
        # True as soon as any host in HOSTS_TO_CHECK answers a ping;
        # False only after every host has been tried.
        log.info("Pinging router, break if one is up.")
        for host in HOSTS_TO_CHECK:
                if host_responding(host):
                        log.info("Exit checking host if it's up.")
                        return True
        return False


def host_responding(host):
    #log.info("Pinging host '%s'...", host)
    rc = run_subprocess(['ping', '-q','-c','1','-w', '2',  host])
    if not rc:
        log.info("Ping returned with code 0, host is up.")
        return True
    log.info("Ping returned with code %s, host is down.", rc)
    return False


def run_subprocess(cmdlist):
    #log.debug("Calling Popen(%s).", cmdlist)
    try:
        sp = Popen(cmdlist, stdout=PIPE, stderr=PIPE)
        out, err = sp.communicate()
    except OSError as e:
        log.error("OSError while executing subprocess. Error message:\n%s" % e)
        sys.exit(1)
    #~ if out:
        #~ log.debug("Subprocess stdout:\n%s", out)
    if err:
        log.debug("Subprocess stderr:\n%s", err)
    return sp.returncode



if __name__ == "__main__":
        log = logging.getLogger()
        log.setLevel(logging.DEBUG)
        ch = logging.StreamHandler()
        fh = RotatingFileHandler(
        logfile_path,
        mode='a',
        maxBytes=500*1024,
        backupCount=20,
        encoding='utf-8')
        formatter = logging.Formatter('%(asctime)s - %(levelname)s %(funcName)s(%(lineno)d) - %(message)s')
        ch.setFormatter(formatter)
        fh.setFormatter(formatter)
        log.addHandler(ch)
        log.addHandler(fh)
        log.info('------------------------------------')
        log.info('---------------NEW START------------')
        log.info('------------------------------------')
        while True:
                time.sleep(POLLING_INTERVAL_SECONDS)
                main()
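The decision logic in `main()` is easier to follow as a small state machine over `WASHIBERNATED`. The following Python 3 sketch condenses one polling decision into a pure function; `next_step` is hypothetical and not part of the script. It assumes `AreVolumesAvailable()` returns True (as the stub does) and ignores the per-VM ping rechecks; also note that in the posted script the actual host shutdown is commented out, so that step only logs.

```python
def next_step(state, router_up):
    """Return (new_state, action) for one sustained online/offline decision.

    state is "FIRSTSTART", "RUNNING" or "SUSPENDED"; router_up is whether the
    router stayed reachable (True) or unreachable (False) for the whole
    REQUIRED_ONLINE_SECONDS / REQUIRED_OFFLINE_SECONDS window.
    """
    if router_up:
        if state == "RUNNING":
            return state, None  # already started, nothing to do
        return "RUNNING", "start CTs, resume VMs"  # from FIRSTSTART or SUSPENDED
    if state == "FIRSTSTART":
        return state, None  # never suspend before the router was seen once
    if state == "SUSPENDED":
        return state, "shut down host"  # second offline window in a row
    return "SUSPENDED", "stop CTs, hibernate VMs"


if __name__ == "__main__":
    state = "FIRSTSTART"
    for up in (False, True, False, False, True):
        state, action = next_step(state, up)
        print(up, state, action)
```

Modelling it this way makes the two-stage shutdown visible: the first offline window only hibernates guests, and only a second consecutive one reaches the host shutdown.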
 
Wow, this is way beyond what I expected, hehe. Thanks for sharing.
However, I would prefer something simpler. It looks like hookscripts will never work for this, because the VM is locked while the hookscript runs, and the simple but effective commands you shared require the VM not to be locked.
Anyway, it is not a big deal; I will keep searching in case I find something else.
Thank you!
 
