Problem restoring a VM

m4rek11

Member
Jan 3, 2020
Hello Guys!

I have a problem restoring my nearly 4.5 TB VM. I would like to restore it to a new VM ID, so I go to my NFS store (FreeNAS - usage 65.59% (18.09 TiB of 27.58 TiB)) where the backup is located. From Content I click Restore and select the target storage and VM ID. The storage I restore to is another NFS store (FreeNAS - usage 65.59% (18.09 TiB of 27.58 TiB)). After a few moments I get this error:

restore vma archive: lzop -d -c /mnt/pve/Store-FreeNas2-VolExternal01/dump/vzdump-qemu-103-2020_01_09-23_30_02.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp92872.fifo - /var/tmp/vzdumptmp92872
CFG: size: 1058 name: qemu-server.conf
DEV: dev_id=1 size: 1793148846080 devname: drive-virtio0
DEV: dev_id=2 size: 1073741824000 devname: drive-virtio1
DEV: dev_id=3 size: 1073741824000 devname: drive-virtio2
DEV: dev_id=4 size: 1073741824000 devname: drive-virtio3
CTIME: Thu Jan 9 23:36:45 2020
no lock found trying to remove 'create' lock
TASK ERROR: command 'set -o pipefail && lzop -d -c /mnt/pve/Store-FreeNas2-VolExternal01/dump/vzdump-qemu-103-2020_01_09-23_30_02.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp92872.fifo - /var/tmp/vzdumptmp92872' failed: error with cfs lock 'storage-Store-FreeNas2-VolExternal01': unable to create image: got lock timeout - aborting command


Can I count on some hints from you? I will be grateful for any advice. If you need any other information, please let me know.

Yours faithfully,
Marek.
 
hi,

try on the CLI: qm unlock VMID and try again
 
Thank you for the suggestion, but unfortunately unlocking the VM doesn't work - still the same error.

Yours faithfully,
Marek.
 
i've misread, my bad.

it seems like the cluster file system (cfs) is unable to lock the file.

* is your cluster quorate? (check with pvecm status)

* anything in journalctl while this is going on? (do journalctl -f while letting it run again)
 
Thank you for the reply.

I attached the journalctl output from the last 24 hours, but I found nothing interesting.

Command: pvecm status

Quorum information
------------------
Date: Tue Feb 4 12:12:09 2020
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1/332
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.16.101 (local)
0x00000002 1 192.168.16.104
 

Attachments

  • journal.zip (130.3 KB)
Hi,
can you restore any other VM on this storage? Can you restore the VM to any other storage (or other VMs to other storages)?

In the journal I found
Code:
Feb 04 06:59:54 ITsky-prox-01 sudo[16553]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 06:59:55 ITsky-prox-01 sudo[16553]: pam_unix(sudo:session): session closed for user root
Feb 04 06:59:55 ITsky-prox-01 sudo[16557]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pvesm list Store-FreeNas1
Feb 04 06:59:55 ITsky-prox-01 sudo[16557]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 06:59:56 ITsky-prox-01 sudo[16557]: pam_unix(sudo:session): session closed for user root
Feb 04 07:00:00 ITsky-prox-01 systemd[1]: Starting Proxmox VE replication runner...
Feb 04 07:00:01 ITsky-prox-01 systemd[1]: Started Proxmox VE replication runner.
Feb 04 07:00:01 ITsky-prox-01 sshd[16667]: Connection closed by 172.16.102.3 port 55282 [preauth]
Feb 04 07:00:02 ITsky-prox-01 pvedaemon[44681]: <root@pam> successful auth for user 'monitoring@pve'
Feb 04 07:00:05 ITsky-prox-01 pvedaemon[44681]: <root@pam> successful auth for user 'monitoring@pve'
Feb 04 07:00:09 ITsky-prox-01 pvedaemon[44681]: <root@pam> starting task UPID:ITsky-prox-01:00004136:30A9C21D:5E390869:qmrestore:144:root@pam:
Feb 04 07:00:28 ITsky-prox-01 sudo[16774]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/qm list
Feb 04 07:00:28 ITsky-prox-01 sudo[16774]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 07:00:29 ITsky-prox-01 sudo[16774]: pam_unix(sudo:session): session closed for user root
Feb 04 07:00:29 ITsky-prox-01 sudo[16783]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pvesm list Store-FreeNas1
Feb 04 07:00:29 ITsky-prox-01 sudo[16783]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 07:00:29 ITsky-prox-01 sudo[16783]: pam_unix(sudo:session): session closed for user root
Feb 04 07:00:37 ITsky-prox-01 pvedaemon[49195]: <root@pam> successful auth for user 'monitoring@pve'
Feb 04 07:00:40 ITsky-prox-01 sudo[16842]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/qm list
Feb 04 07:00:40 ITsky-prox-01 sudo[16842]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 07:00:40 ITsky-prox-01 sudo[16851]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/qm list
Feb 04 07:00:40 ITsky-prox-01 sudo[16851]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 07:00:41 ITsky-prox-01 sudo[16842]: pam_unix(sudo:session): session closed for user root
Feb 04 07:00:41 ITsky-prox-01 sudo[16855]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pvesm list Store-FreeNas2-VolExternal01
Feb 04 07:00:41 ITsky-prox-01 sudo[16855]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 07:00:41 ITsky-prox-01 sudo[16851]: pam_unix(sudo:session): session closed for user root
Feb 04 07:00:41 ITsky-prox-01 sudo[16858]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pvesm list Store-FreeNas2-VolExternal01
Feb 04 07:00:41 ITsky-prox-01 sudo[16858]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 07:00:41 ITsky-prox-01 sudo[16855]: pam_unix(sudo:session): session closed for user root
Feb 04 07:00:41 ITsky-prox-01 sudo[16858]: pam_unix(sudo:session): session closed for user root
Feb 04 07:00:43 ITsky-prox-01 pvedaemon[49195]: <root@pam> successful auth for user 'monitoring@pve'
Feb 04 07:01:00 ITsky-prox-01 systemd[1]: Starting Proxmox VE replication runner...
Feb 04 07:01:01 ITsky-prox-01 systemd[1]: Started Proxmox VE replication runner.
Feb 04 07:01:08 ITsky-prox-01 sudo[17010]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/qm list
Feb 04 07:01:08 ITsky-prox-01 sudo[17010]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 07:01:08 ITsky-prox-01 pvedaemon[41660]: <root@pam> successful auth for user 'monitoring@pve'
Feb 04 07:01:09 ITsky-prox-01 sudo[17010]: pam_unix(sudo:session): session closed for user root
Feb 04 07:01:09 ITsky-prox-01 sudo[17019]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/pvesm list Store-FreeNas1
Feb 04 07:01:09 ITsky-prox-01 sudo[17019]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 07:01:09 ITsky-prox-01 pvedaemon[16694]: no lock found trying to remove 'create'  lock
Feb 04 07:01:09 ITsky-prox-01 pvedaemon[16694]: command 'set -o pipefail && lzop -d -c /mnt/pve/Store-FreeNas2-VolExternal01/dump/vzdump-qemu-103-2020_01_09-23_30_02.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp16694.fifo - /var/tmp/vzdumptmp16694' failed: error with cfs lock 'storage-Store-FreeNas2-Vol1': unable to create image: got lock timeout - aborting command
Feb 04 07:01:09 ITsky-prox-01 pvedaemon[44681]: <root@pam> end task UPID:ITsky-prox-01:00004136:30A9C21D:5E390869:qmrestore:144:root@pam: command 'set -o pipefail && lzop -d -c /mnt/pve/Store-FreeNas2-VolExternal01/dump/vzdump-qemu-103-2020_01_09-23_30_02.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp16694.fifo - /var/tmp/vzdumptmp16694' failed: error with cfs lock 'storage-Store-FreeNas2-Vol1': unable to create image: got lock timeout - aborting command
Feb 04 07:01:09 ITsky-prox-01 sudo[17019]: pam_unix(sudo:session): session closed for user root
Feb 04 07:01:18 ITsky-prox-01 pvedaemon[49195]: <root@pam> successful auth for user 'monitoring@pve'
Feb 04 07:01:39 ITsky-prox-01 sudo[17135]:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/qm list
Feb 04 07:01:39 ITsky-prox-01 sudo[17135]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 04 07:01:40 ITsky-prox-01 sudo[17135]: pam_unix(sudo:session): session closed for user root
Maybe the monitoring is the reason that the restore process cannot acquire the lock?
 
Thank you for the reply. All other VMs restore correctly on any of my storages. I don't think the monitoring is causing the problem. I only have the problem with that one VM - it is the biggest VM, ~4.5 TB.

Yours faithfully,
Marek.
 
I think I found the reason for the timeout. When locking the storage for the allocate operation with the CFS lock, the operation has 60 seconds to complete. And since you have such a huge disk, it likely doesn't finish in time.

Proceed at your own risk!
As a workaround, you could temporarily set the storage as not shared to fool PVE into thinking that it doesn't need a cluster-wide lock. Please make sure you're not doing any other alloc/move/clone disk operations during that time on this storage. And don't forget to set the storage as shared again afterwards (can be done as soon as all the disks for the VM are present on the storage).
 
Thank you for the reply. What do you mean by "set the storage as not shared to fool PVE into thinking that it doesn't need a cluster-wide lock"? Could you explain that in more detail? I would be grateful.

Yours faithfully,
Marek.
 
In /etc/pve/storage.cfg you can change the option shared 1 to shared 0 below the relevant storage. Again, make sure not to perform any other storage operations with PVE in the meantime, and change it back afterwards.
 
On local storages I have that "shared" option, but this storage is NFS, where that option isn't listed - so should I add it? Does that matter? Or maybe I should set something up in FreeNAS, where that storage is kept?

Yours faithfully,
Marek.
 
On local storages I have that "shared" option, but this storage is NFS, where that option isn't listed - so should I add it? Does that matter? Or maybe I should set something up in FreeNAS, where that storage is kept?

Yours faithfully,
Marek.

Yes, you could try that. It might not work (maybe PVE has a check against declaring NFS storages as non-shared), but here's an alternative which might be preferable anyway (at the cost of creating a new storage):
  1. Create a new directory-based storage (they are non-shared by default) with a directory on the NFS share and restrict it to the node by adding
    Code:
    nodes <NODENAME>
    as an option (or via the GUI). Then it's really a non-shared directory from PVE's point of view, since other nodes won't touch it.
  2. Restore the VM.
  3. Now you can remove the restriction to the node and if you do, declare it as shared.
 
Hello again.

I have added the NFS storage as a new directory storage with "shared 0" as you said. My storage.cfg file is:

nfs: Store-FreeNas2-Vol1
    export /mnt/VOL1/ProxMox/Cluster01-Store02
    path /mnt/pve/Store-FreeNas2-Vol1
    server m.y.i.p
    content backup,images,iso,vztmpl,rootdir
    maxfiles 5
    options vers=4.1

dir: TestRecovery
    path /mnt/pve/Store-FreeNas2-Vol1
    content images,iso,backup,rootdir,vztmpl
    maxfiles 5
    nodes Prox-01
    shared 0


For a test I restored a few small VMs (< 50 GB, even one ~120 GB) on the "TestRecovery" storage and it's OK, but for larger VMs (> 400 GB) I get the same timeout error, though it shows up much later.

To be sure, I also added "shared 0" to the NFS storage:

nfs: Store-FreeNas2-Vol1
    export /mnt/VOL1/ProxMox/Cluster01-Store02
    path /mnt/pve/Store-FreeNas2-Vol1
    server m.y.i.p
    content backup,images,iso,vztmpl,rootdir
    maxfiles 5
    options vers=4.1
    shared 0

But it doesn't change anything, as you suspected.


I would be grateful for any other tips. As you can see, the problem is "strange" and the solution doesn't seem to be easy.

Yours faithfully,
Marek.
 
You shouldn't create the directory storage directly on the mount point; please use a subdirectory instead. Could you share the full log of the attempt to restore to the directory storage?
 
I tried restoring to a subdirectory, as you said, but it didn't work either.

storage.cfg:
nfs: Store-FreeNas2-Vol1
    export /mnt/VOL1/ProxMox/Cluster01-Store02
    path /mnt/pve/Store-FreeNas2-Vol1
    server m.y.i.p
    content vztmpl,rootdir,images,iso,backup
    maxfiles 5
    options vers=4.1

dir: Recovery
    path /mnt/pve/Store-FreeNas2-Vol1/mount
    content images,iso,backup,rootdir,vztmpl
    maxfiles 5
    shared 0

Log of task:
restore vma archive: lzop -d -c /mnt/pve/Store-FreeNas2-VolExternal01/dump/vzdump-qemu-103-2020_01_17-00_58_42.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp46637.fifo - /var/tmp/vzdumptmp46637
CFG: size: 1112 name: qemu-server.conf
DEV: dev_id=1 size: 1793148846080 devname: drive-virtio0
DEV: dev_id=2 size: 1073741824000 devname: drive-virtio1
DEV: dev_id=3 size: 1073741824000 devname: drive-virtio2
DEV: dev_id=4 size: 1073741824000 devname: drive-virtio3
CTIME: Fri Jan 17 00:58:57 2020
no lock found trying to remove 'create' lock
TASK ERROR: command 'set -o pipefail && lzop -d -c /mnt/pve/Store-FreeNas2-VolExternal01/dump/vzdump-qemu-103-2020_01_17-00_58_42.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp46637.fifo - /var/tmp/vzdumptmp46637' failed: command '/usr/bin/qemu-img create -o 'preallocation=metadata' -f qcow2 /mnt/pve/Store-FreeNas2-Vol1/mount/images/150/vm-150-disk-0.qcow2 1751121920K' failed: got timeout

Do you need any other log file?

I am so confused by this problem. Is there anything I can do to get rid of it?

Yours faithfully,
Marek.
 
For a test I restored a few small VMs (< 50 GB, even one ~120 GB) on the "TestRecovery" storage and it's OK, but for larger VMs (> 400 GB) I get the same timeout error, though it shows up much later.

How much time is "much later" approximately?

no lock found trying to remove 'create' lock
TASK ERROR: command 'set -o pipefail && lzop -d -c /mnt/pve/Store-FreeNas2-VolExternal01/dump/vzdump-qemu-103-2020_01_17-00_58_42.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp46637.fifo - /var/tmp/vzdumptmp46637' failed: command '/usr/bin/qemu-img create -o 'preallocation=metadata' -f qcow2 /mnt/pve/Store-FreeNas2-Vol1/mount/images/150/vm-150-disk-0.qcow2 1751121920K' failed: got timeout

Previously it complained about a CFS (cluster file system) lock; now it doesn't anymore. So the locking happens locally (you can check for the file /var/lock/pve-manager/pve-storage-Recovery), but normally there shouldn't be a timeout then.
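For what it's worth, that local lock is a plain lock file that PVE holds while allocating on the storage; the timeout behaviour can be sketched with flock(1) on a scratch file. This is only an illustration of the mechanism, not PVE's actual code - the file name and timings here are made up:

```shell
# Sketch only: emulate a local per-storage lock with flock(1) on a temp file.
lockfile=$(mktemp)
flock "$lockfile" sleep 2 &   # first holder keeps the lock for 2 seconds
sleep 0.2
# a second taker with a 1-second timeout gives up, like "got lock timeout"
if flock -w 1 "$lockfile" true; then
    result="acquired"
else
    result="lock timeout"
fi
echo "$result"
wait
rm -f "$lockfile"
```

In PVE's case the waiter is the restore task and the allocation simply takes longer than the configured timeout.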

Do you need any other log file?

What is the output of pveversion -v?
Could you also post the contents of the journal around the time of the attempted restore again?
 
How much time is "much later" approximately?

For now, on that test storage "Recovery", restoring a ~500 GB VM works OK - I don't get the timeout error. Small machines start restoring immediately; however, for the larger ~500 GB ones I have to wait nearly 10 minutes before they start. The largest, ~4.5 TB, gets the timeout.


Previously it complained about a CFS (cluster file system) lock; now it doesn't anymore. So the locking happens locally (you can check for the file /var/lock/pve-manager/pve-storage-Recovery), but normally there shouldn't be a timeout then.

If I go to /var/lock/pve-manager/, the file pve-storage-Recovery is empty. Should it be? The file has 644 permissions.

What is the output of pveversion -v?
Could you also post the contents of the journal around the time of the attempted restore again?

Command pveversion -v results:
root@prox-01:~# pveversion -v
proxmox-ve: 5.4-2 (running kernel: 4.15.18-20-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-8
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-54
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-6
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-40
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

I attached the output of the journalctl -f command captured while the VM was restoring.
 

Attachments

  • rec.log (17.1 KB)
It turns out that there is another timeout of 10 minutes for the disks to be allocated. Normally that shouldn't take long; for example, running the same command on my machine:
Code:
# /usr/bin/qemu-img create -o 'preallocation=metadata' -f qcow2 mytestdisk.qcow2 1751121920K                                                
Formatting 'mytestdisk.qcow2', fmt=qcow2 size=1793148846080 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
/usr/bin/qemu-img create -o 'preallocation=metadata' -f qcow2 mytestdisk.qcow  3.40s user 2.48s system 32% cpu 18.343 total
# qemu-img info mytestdisk.qcow2
image: mytestdisk.qcow2
file format: qcow2
virtual size: 1.63 TiB (1793148846080 bytes)
disk size: 261 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
Of course for you it's an NFS setup, so it's expected to take longer, but it shouldn't take that much longer. Do you have that much load on the NFS, or could there be some other problem with the setup in general?

That said, it is possible to restore the VM manually without using the PVE API. Do the following:
  1. Run the following:
    Code:
    lzop -d -c /mnt/pve/Store-FreeNas2-VolExternal01/dump/vzdump-qemu-103-2020_01_17-00_58_42.vma.lzo | vma extract -v - /mnt/pve/Store-FreeNas2-VolExternal01/tmp_restore
  2. The directory /mnt/pve/Store-FreeNas2-VolExternal01/tmp_restore should now contain a file qemu-server.conf and the disk files disk-drive-virtio<N>.raw.
  3. Run mkdir /mnt/pve/Store-FreeNas2-VolExternal01/images/<VMID> where VMID is an ID not used by any existing virtual machine.
  4. For each drive run
    Code:
    qemu-img convert -O qcow2 /mnt/pve/Store-FreeNas2-VolExternal01/tmp_restore/disk-drive-virtio<N>.raw /mnt/pve/Store-FreeNas2-VolExternal01/images/<VMID>/vm-<VMID>-disk-<M>.qcow2
    where M is the number so that the name matches the correct one in qemu-server.conf. For example, if you have a line
    Code:
    virtio1: TestRecovery:103/vm-103-disk-3.qcow2,size=4G
    in qemu-server.conf, then N is 1 and M is 3.
  5. Finally copy over the virtual machine configuration file with
    Code:
    cp /mnt/pve/Store-FreeNas2-VolExternal01/tmp_restore/qemu-server.conf /etc/pve/nodes/<NODE>/qemu-server/<VMID>.conf
  6. Remove the #qmdump# lines from the configuration file.
  7. Remove the tmp_restore directory.
Of course if you run out of space, you can remove the disks that were already converted. Or if you want to use raw instead of qcow2 you can just copy/move the disks normally.
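If it helps, the N-to-M mapping in step 4 can be extracted with a one-liner instead of reading the config by hand. This is just a sed sketch over made-up sample lines (in practice, point it at the real qemu-server.conf from tmp_restore):

```shell
# Hypothetical sample qemu-server.conf lines; replace with the real file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
virtio0: TestRecovery:150/vm-150-disk-0.qcow2,size=1670G
virtio1: TestRecovery:150/vm-150-disk-3.qcow2,size=1000G
EOF
# Print "N M" for each virtio<N> line, where M comes from vm-<id>-disk-<M>
mapping=$(sed -n 's/^virtio\([0-9]*\):.*-disk-\([0-9]*\)\.qcow2.*/\1 \2/p' "$conf")
echo "$mapping"
rm -f "$conf"
```

For the sample above this prints "0 0" and "1 3", i.e. disk-drive-virtio1.raw would be converted to vm-<VMID>-disk-3.qcow2.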
 
