ERROR: Backup of VM <> failed - Node 'drive-virtio0' is busy: block device is in use by block job: backup

MartinZie

Hi,

I need some help: we are not able to back up our VMs and containers. For some reason the backup process stalls at some point and never continues.

Here is the log. The backup stopped at 23% and I killed it manually after several hours...

INFO: starting new backup job: vzdump 107 --compress lzo --storage local --mode snapshot --node ariel3
INFO: Starting Backup of VM 107 (qemu)
INFO: status = running
INFO: update VM 107: -lock backup
INFO: VM Name: titan
INFO: include disk 'ide0' 'keeprbd:vm-107-disk-1' 50G
INFO: include disk 'virtio0' 'keeprbd:vm-107-disk-2' 400G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-107-2018_07_22-23_13_23.vma.lzo'
INFO: started backup task '67a28ad7-863a-45a2-97a0-2a34786bebfe'
INFO: status: 0% (778633216/483183820800), sparse 0% (476749824), duration 3, read/write 259/100 MB/s
INFO: status: 1% (5500239872/483183820800), sparse 0% (3599241216), duration 19, read/write 295/99 MB/s
INFO: status: 2% (9803268096/483183820800), sparse 1% (7525838848), duration 28, read/write 478/41 MB/s
INFO: status: 3% (14871953408/483183820800), sparse 2% (12471676928), duration 36, read/write 633/15 MB/s
INFO: status: 4% (19894697984/483183820800), sparse 3% (17262485504), duration 44, read/write 627/28 MB/s
INFO: status: 5% (24825954304/483183820800), sparse 4% (22168535040), duration 51, read/write 704/3 MB/s
INFO: status: 6% (29135536128/483183820800), sparse 4% (22929035264), duration 72, read/write 205/169 MB/s
INFO: status: 7% (33990377472/483183820800), sparse 5% (25684680704), duration 91, read/write 255/110 MB/s
INFO: status: 8% (38850920448/483183820800), sparse 5% (27206504448), duration 113, read/write 220/151 MB/s
INFO: status: 9% (43556339712/483183820800), sparse 6% (29629091840), duration 130, read/write 276/134 MB/s
INFO: status: 10% (48502276096/483183820800), sparse 6% (31348297728), duration 159, read/write 170/111 MB/s
INFO: status: 11% (54051471360/483183820800), sparse 7% (36365295616), duration 170, read/write 504/48 MB/s
INFO: status: 12% (58047332352/483183820800), sparse 7% (38584745984), duration 188, read/write 221/98 MB/s
INFO: status: 13% (62860492800/483183820800), sparse 8% (41522401280), duration 206, read/write 267/104 MB/s
INFO: status: 14% (67654975488/483183820800), sparse 8% (43381915648), duration 230, read/write 199/122 MB/s
INFO: status: 15% (72597176320/483183820800), sparse 9% (44818755584), duration 256, read/write 190/134 MB/s
INFO: status: 16% (77534593024/483183820800), sparse 9% (46654148608), duration 279, read/write 214/134 MB/s
INFO: status: 17% (82156716032/483183820800), sparse 9% (47831597056), duration 308, read/write 159/118 MB/s
INFO: status: 18% (87025909760/483183820800), sparse 10% (49060966400), duration 335, read/write 180/134 MB/s
INFO: status: 19% (91971584000/483183820800), sparse 10% (49986768896), duration 367, read/write 154/125 MB/s
INFO: status: 20% (96727400448/483183820800), sparse 10% (50781589504), duration 397, read/write 158/132 MB/s
INFO: status: 21% (101582766080/483183820800), sparse 11% (53600915456), duration 414, read/write 285/119 MB/s
INFO: status: 22% (106304962560/483183820800), sparse 11% (54811701248), duration 435, read/write 224/167 MB/s
INFO: status: 23% (111283732480/483183820800), sparse 11% (56046559232), duration 460, read/write 199/149 MB/s
ERROR: interrupted by signal
INFO: aborting backup job
ERROR: Backup of VM 107 failed - interrupted by signal
ERROR: Backup job failed - interrupted by signal

TASK ERROR: interrupted by signal
####
Trying to start the backup manually, I got this message:

INFO: starting new backup job: vzdump 107 --remove 0 --mode snapshot --compress lzo --storage keepbackup --node ariel3
INFO: Starting Backup of VM 107 (qemu)
INFO: status = running
INFO: update VM 107: -lock backup
INFO: VM Name: titan
INFO: include disk 'ide0' 'keeprbd:vm-107-disk-1' 50G
INFO: include disk 'virtio0' 'keeprbd:vm-107-disk-2' 400G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: pending configuration changes found (not included into backup)
INFO: creating archive '/srv/backup/dump/vzdump-qemu-107-2018_07_23-08_10_11.vma.lzo'
ERROR: Node 'drive-virtio0' is busy: block device is in use by block job: backup
INFO: aborting backup job
ERROR: Backup of VM 107 failed - Node 'drive-virtio0' is busy: block device is in use by block job: backup
INFO: Backup job finished with errors
TASK ERROR: job errors

-> Virtual Environment 5.2-2

Can anybody help? The backup destination is either 'local' or a separate storage.
Thanks,
Martin
 
I've got the same problem: a backup got stuck, I cancelled it, and now when I try to back up the VM again I get this error:

Code:
ERROR: Node 'drive-virtio0' is busy: block device is in use by block job: backup
INFO: aborting backup job
ERROR: Backup of VM 699 failed - Node 'drive-virtio0' is busy: block device is in use by block job: backup
INFO: Backup job finished with errors
TASK ERROR: job errors
-------------------------------
qm unlock 699 doesn't help.

pve-manager/5.2-6/bcd5f008 (running kernel: 4.15.18-1-pve)
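
If the vzdump worker is really gone but QEMU still has the backup block job attached to the drive, it may be possible to cancel the job from the QEMU monitor instead of rebooting. A minimal sketch, assuming the stock HMP commands are available in Proxmox's QEMU build; 'drive-virtio0' is the node name taken from the error message:

Code:
qm monitor 699
qm> info block-jobs
qm> block_job_cancel drive-virtio0
qm> quit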


Update:
I tried live migration to another node and it got stuck at the end... Aborting again:
Code:
2018-08-04 17:59:32 starting migration of VM 699 to node 'node-3' (x.x.x.x)
2018-08-04 17:59:32 copying disk images
2018-08-04 17:59:32 starting VM 699 on remote node 'node-3'
2018-08-04 17:59:35 start remote tunnel
2018-08-04 17:59:35 ssh tunnel ver 1
2018-08-04 17:59:35 starting online/live migration on unix:/run/qemu-server/699.migrate
2018-08-04 17:59:35 migrate_set_speed: 8589934592
2018-08-04 17:59:35 migrate_set_downtime: 0.1
2018-08-04 17:59:35 set migration_caps
2018-08-04 17:59:35 set cachesize: 1073741824
2018-08-04 17:59:35 start migrate command to unix:/run/qemu-server/699.migrate
2018-08-04 17:59:36 migration status: active (transferred 98670789, remaining 10594033664), total 10755055616)
2018-08-04 17:59:36 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
......
2018-08-04 18:01:02 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 154814 overflow 0
2018-08-04 18:01:02 migration status: active (transferred 10146249824, remaining 59158528), total 10755055616)
2018-08-04 18:01:02 migration xbzrle cachesize: 1073741824 transferred 580518 pages 3201 cachemiss 157556 overflow 24
2018-08-04 18:01:02 migration status: active (transferred 10158147745, remaining 26271744), total 10755055616)
2018-08-04 18:01:02 migration xbzrle cachesize: 1073741824 transferred 801508 pages 4695 cachemiss 160399 overflow 26
query migrate failed: interrupted by signal

2018-08-04 18:02:59 query migrate failed: interrupted by signal


Now KVM is stuck - I can't connect to the guest OS.
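
In principle a hung migration can be cancelled from the QEMU monitor before resorting to stop; a sketch using the standard HMP commands (whether the monitor still responds depends on how stuck the QEMU process is):

Code:
qm monitor 699
qm> info migrate
qm> migrate_cancel
qm> quit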

I tried to shut it down:
Code:
TASK ERROR: VM quit/powerdown failed

Then I tried to stop it:
Code:
VM quit/powerdown failed - terminating now with SIGTERM
VM still running - terminating now with SIGKILL
rbd: sysfs write failed
can't unmap rbd volume vm-699-disk-1: rbd: sysfs write failed
volume deactivation failed: ssd-ceph:vm-699-disk-1 at /usr/share/perl5/PVE/Storage.pm line 1003.
TASK OK

Now when I try to start it:
Code:
rbd: sysfs write failed
can't unmap rbd volume vm-699-disk-1: rbd: sysfs write failed
TASK ERROR: start failed: org.freedesktop.systemd1.UnitExists: Unit 699.scope already exists.
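
The stale rbd mapping can sometimes be removed by hand; a sketch, assuming a kernel recent enough to honour the force unmap option (take the exact device path from the showmapped output; /dev/rbd/ssd-pool/vm-699-disk-1 here is only a guess):

Code:
# list mapped rbd devices on this node, then force-unmap the stale one
rbd showmapped
rbd unmap -o force /dev/rbd/ssd-pool/vm-699-disk-1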

-------------------
Code:
root@node-4:~# find / -name 699.scope
/var/lib/lxcfs/cgroup/pids/qemu.slice/699.scope
/var/lib/lxcfs/cgroup/cpu,cpuacct/qemu.slice/699.scope
/var/lib/lxcfs/cgroup/name=systemd/qemu.slice/699.scope
find: ‘/proc/127393/task/127393/net’: Invalid argument
find: ‘/proc/127393/task/127450/net’: Invalid argument
find: ‘/proc/127393/net’: Invalid argument
/run/systemd/transient/699.scope
/sys/fs/cgroup/pids/qemu.slice/699.scope
/sys/fs/cgroup/cpu,cpuacct/qemu.slice/699.scope
/sys/fs/cgroup/systemd/qemu.slice/699.scope
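
Since 699.scope still exists as a transient systemd unit (/run/systemd/transient/699.scope above), it can in principle be cleared without a reboot; a sketch with plain systemctl:

Code:
# check, stop, and clear any failed state of the leftover scope
systemctl status 699.scope
systemctl stop 699.scope
systemctl reset-failed 699.scope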

The image had no lock:
Code:
rbd -p ssd-pool lock ls --image vm-699-disk-1
(shows nothing)

I renamed the image and edited the VM config file accordingly:
Code:
root@node-4:~# rbd -p ssd-pool mv vm-699-disk-1 vm-24699-disk-1

Now it starts and I can back it up.
I have this problem on a few VMs on one node. How can I "unlock" a VM without stopping it or moving the disk?
 
There are two ways to unlock a VM without moving the disk (both shown in the sketch below):
1. qm unlock <vmid>
2. Manually remove the file /var/lock/qemu-server/lock-<vmid>.conf
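
As a single sequence on the node that owns the VM (a sketch; replace <vmid> with the numeric VM ID):

Code:
qm unlock <vmid>
rm -f /var/lock/qemu-server/lock-<vmid>.conf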
 
Removing /var/lock/qemu-server/lock-<vmid>.conf doesn't help.
Some VMs still can't finish their backup, and I get the same error when trying to stop/start the VM.
I also tried migrating the disk (both live and offline) to another Ceph pool, and it doesn't even start.
 
After qm unlock <vmid>, the /var/lock/qemu-server/lock-<vmid>.conf file is recreated. Even after removing it, I have no control over backups of the machine whose backup stalled and which I interrupted manually:

INFO: starting new backup job:
INFO: Starting Backup of VM <vmid> (qemu)
INFO: status = running
INFO: update VM <vmid>: -lock backup
INFO: VM Name: MyServer
INFO: include disk 'virtio0' 'poolname:vm-<vmid>-disk-0' 300G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive 'file.vma.gz'
ERROR: Node 'drive-virtio0' is busy: block device is in use by block job: backup
INFO: aborting backup job
ERROR: Backup of VM <vmid> failed - Node 'drive-virtio0' is busy: block device is in use by block job: backup
INFO: Backup job finished with errors
TASK ERROR: job errors

I suppose this can happen if the VM is under heavy load during the backup, but how can I finally exit this backup without rebooting the Proxmox server?
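
Before rebooting, one thing worth checking is whether an old vzdump worker is still running and holding the job; a sketch with standard tools (<pid> stands for whatever the first command reports). If the worker is already gone, the dangling block job can be attacked from the QEMU monitor as sketched earlier in the thread:

Code:
# look for a leftover vzdump worker
ps aux | grep -v grep | grep vzdump
# if one shows up, terminate it and retry the backup
kill <pid>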

I have found a similar problem reported for containers, but I only have VMs:
https://bugzilla.proxmox.com/show_bug.cgi?id=1911

After the VM is stopped, I get another error:
rbd: sysfs write failed
can't unmap rbd device /dev/rbd/<poolname>/vm-<vmid>-disk-1: rbd: sysfs write failed
ERROR: Backup of VM <vmid> failed - start failed: org.freedesktop.systemd1.UnitExists: Unit <vmid>.scope already exists.

Of course, rebooting the Proxmox server helps, but that is not a solution.


 
Maybe it has stopped; it needs to be tested for some time, after which I will reply.
After several backups, though only with a few clients (I was cautious after so many failures), I have not had a single error.
 