Cannot Back Up while Disk Throttle is set

nanonettr

Hello,

I am using pve-manager/3.4-9/4b51d87a (running kernel: 3.10.0-11-pve) and testing Disk Throttle.
It works pretty well, but I have an issue with backups.

If a virtual hard disk has limits, the backup command starts and then stays at 0%, never making progress.
Code:
INFO: starting new backup job: vzdump 99999 --remove 0 --mode stop --storage QNAPNFS --node node02
INFO: Starting Backup of VM 99999 (qemu)
INFO: status = stopped
INFO: update VM 99999: -lock backup
INFO: backup mode: stop
INFO: bandwidth limit: 61500 KB/s
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/QNAPNFS/dump/vzdump-qemu-99999-2015_09_16-23_28_27.vma'
INFO: starting kvm to execute backup task
INFO: started backup task 'c9a6a287-37f7-417b-99c0-6cfb0ff56eba'
INFO: status: 0% (2162688/146028888064), sparse 0% (12288), duration 3, 0/0 MB/s


Is this a known bug? Are there any workarounds?

 
Hi

What is the output of qm config <your VM_ID>?
 


Code:
root@node02:~# qm config 99999
acpi: 1
boot: c
bootdisk: virtio0
cores: 4
cpuunits: 1000
description: Old Windows 2003
freeze: 0
hotplug: 0
ide2: none,media=cdrom
kvm: 1
memory: 6144
name: OldWindows2k3
net0: virtio=6A:6D:B1:4F:3E:7C,link_down=1
onboot: 0
ostype: wxp
sockets: 1
tablet: 0
virtio0: SANSSD:99999/vm-99999-disk-1.qcow2,format=qcow2,discard=on,iops_rd=70,iops_wr=70,iops_rd_max=350,iops_wr_max=350,mbps_rd=20,mbps_wr=20,mbps_rd_max=100,mbps_wr_max=100,size=34G
virtio1: SANSSD:99999/vm-99999-disk-2.qcow2,format=qcow2,discard=on,iops_rd=70,iops_wr=70,iops_rd_max=350,iops_wr_max=350,mbps_rd=20,mbps_wr=20,mbps_rd_max=100,mbps_wr_max=100,size=102G
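A side note on what those limits mean: QEMU's I/O throttling is a token bucket, so mbps_rd_max=100 allows a short burst while mbps_rd=20 is the sustained rate. A rough Python sketch of that behaviour (illustrative only, not QEMU's actual implementation; throttle is a made-up helper name):

```python
def throttle(rate, burst, demand, seconds):
    """Token-bucket sketch: sustained `rate` MB/s, bursts up to `burst` MB."""
    tokens = burst          # bucket starts full, so the first second can burst
    served = []
    for _ in range(seconds):
        tokens = min(burst, tokens + rate)  # refill at the sustained rate
        io = min(demand, tokens)            # serve only what the tokens allow
        tokens -= io
        served.append(io)
    return served

# mbps_rd=20 with mbps_rd_max=100, VM demanding 1000 MB/s:
print(throttle(20, 100, 1000, 5))  # [100, 20, 20, 20, 20]
```

So a throttled disk should still move data at 20 MB/s; the problem here is that the backup makes no progress at all.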
 
I tried to find the problem and strace'd the Proxmox backup task's PID (it was 26074). It was looping like this:
Code:
stat("/etc/pve/nodes/node02/qemu-server/99999.conf", {st_dev=makedev(0, 22), st_ino=1101, st_mode=S_IFREG|0640, st_nlink=1, st_uid=0, st_gid=33, st_blksize=4096, st_blocks=1, st_size=697, st_atime=2015/09/17-18:08:54, st_mtime=2015/09/17-18:08:54, st_ctime=2015/09/17-18:08:54}) = 0
open("/var/run/qemu-server/99999.pid", O_RDONLY) = 18
ioctl(18, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffeb415f940) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(18, 0, SEEK_CUR)                  = 0
fstat(18, {st_dev=makedev(0, 16), st_ino=89630958, st_mode=S_IFREG|0600, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=6, st_atime=2015/09/17-18:08:54, st_mtime=2015/09/17-18:08:54, st_ctime=2015/09/17-18:08:54}) = 0
fcntl(18, F_SETFD, FD_CLOEXEC)          = 0
fstat(18, {st_dev=makedev(0, 16), st_ino=89630958, st_mode=S_IFREG|0600, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=6, st_atime=2015/09/17-18:08:54, st_mtime=2015/09/17-18:08:54, st_ctime=2015/09/17-18:08:54}) = 0
read(18, "26117\n", 8192)               = 6
close(18)                               = 0
open("/proc/26117/cmdline", O_RDONLY)   = 18
ioctl(18, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffeb415f940) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(18, 0, SEEK_CUR)                  = 0
fstat(18, {st_dev=makedev(0, 3), st_ino=89636995, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2015/09/17-18:08:54, st_mtime=2015/09/17-18:08:54, st_ctime=2015/09/17-18:08:54}) = 0
fcntl(18, F_SETFD, FD_CLOEXEC)          = 0
read(18, "/usr/bin/kvm\0-id\00099999\0-chardev\0"..., 8192) = 1502
read(18, "", 8192)                      = 0
close(18)                               = 0
open("/proc/26117/stat", O_RDONLY)      = 18
ioctl(18, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffeb415f940) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(18, 0, SEEK_CUR)                  = 0
fstat(18, {st_dev=makedev(0, 3), st_ino=89636996, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2015/09/17-18:08:54, st_mtime=2015/09/17-18:08:54, st_ctime=2015/09/17-18:08:54}) = 0
fcntl(18, F_SETFD, FD_CLOEXEC)          = 0
read(18, "26117 (kvm) S 1 26116 26116 0 -1"..., 8192) = 381
close(18)                               = 0
stat("/var/run/qemu-server/99999.qmp", {st_dev=makedev(0, 16), st_ino=89630957, st_mode=S_IFSOCK|0750, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2015/09/17-18:08:54, st_mtime=2015/09/17-18:08:54, st_ctime=2015/09/17-18:08:54}) = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 18
ioctl(18, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffeb415fb20) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(18, 0, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
ioctl(18, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffeb415fb20) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(18, 0, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
fcntl(18, F_SETFD, FD_CLOEXEC)          = 0
fcntl(18, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
connect(18, {sa_family=AF_FILE, path="/var/run/qemu-server/99999.qmp"}, 110) = 0
fcntl(18, F_GETFL)                      = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(18, F_SETFL, O_RDWR)              = 0
fcntl(18, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
getsockopt(18, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
select(24, [18], [18], NULL, {599, 999867}) = 2 (in [18], out [18], left {599, 999865})
read(18, "{\"QMP\": {\"version\": {\"qemu\": {\"m"..., 8192) = 105
write(18, "{\"execute\":\"qmp_capabilities\",\"i"..., 63) = 63
select(24, [18], [], NULL, {599, 999555}) = 1 (in [18], left {599, 999060})
read(18, "{\"return\": {}, \"id\": \"26074:2866"..., 8192) = 36
select(24, [18], [18], NULL, {599, 998853}) = 1 (out [18], left {599, 998850})
write(18, "{\"execute\":\"query-backup\",\"id\":\""..., 59) = 59
select(24, [18], [], NULL, {599, 998716}) = 1 (in [18], left {599, 998257})
read(18, "{\"return\": {\"total\": 14602888806"..., 8192) = 233
close(18)                               = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER, 0x7f200fed60a0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffeb415fd80)       = 0
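For the record, the loop in that trace is just vzdump reconnecting to the QMP socket every second and asking for backup progress. Sketched in Python (the socket path and command names come straight from the trace; qmp_cmd and query_backup_once are hypothetical helper names, and real QMP replies can arrive split across multiple reads):

```python
import json
import socket

def qmp_cmd(execute, cmd_id):
    """Build one QMP command line, like the ones visible in the trace."""
    return json.dumps({"execute": execute, "id": cmd_id}) + "\n"

def query_backup_once(qmp_path, cmd_id="sketch:1"):
    """One iteration of the loop: connect, negotiate capabilities, query."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(qmp_path)                       # e.g. /var/run/qemu-server/99999.qmp
    s.recv(8192)                              # greeting: {"QMP": {...}}
    s.sendall(qmp_cmd("qmp_capabilities", cmd_id).encode())
    s.recv(8192)                              # {"return": {}, "id": ...}
    s.sendall(qmp_cmd("query-backup", cmd_id).encode())
    reply = json.loads(s.recv(8192))          # {"return": {"total": ..., ...}}
    s.close()
    return reply.get("return")
```

So the task itself is healthy; it keeps getting a `query-backup` answer that shows no progress.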

Checked the open files:
Code:
root@node02:/mnt/pve/SANSSD/images/99999# ls -lash
 14G -rw-r--r-- 1 root root  35G Sep 16 21:52 vm-99999-disk-1.qcow2
 68G -rw-r--r-- 1 root root 103G Sep 16 21:52 vm-99999-disk-2.qcow2

root@node02:/mnt/pve/SANSSD/images/99999# lsof vm-99999-disk-1.qcow2 
COMMAND   PID USER   FD   TYPE DEVICE    SIZE/OFF     NODE NAME
kvm     26117 root   14u   REG   0,30 36513054720 19529754 vm-99999-disk-1.qcow2

root@node02:/mnt/pve/SANSSD/images/99999# lsof vm-99999-disk-2.qcow2 
COMMAND   PID USER   FD   TYPE DEVICE     SIZE/OFF     NODE NAME
kvm     26117 root   11u   REG   0,30 109538705408 19529746 vm-99999-disk-2.qcow2

root@node02:/mnt/pve/QNAPNFS/dump# ls -lash
4.0K -rw-r--r-- 1 nobody nogroup   38 Sep 17 18:08 vzdump-qemu-99999-2015_09_17-18_08_53.vma.dat

root@node02:/mnt/pve/QNAPNFS/dump# lsof  vzdump-qemu-99999-2015_09_17-18_08_53.vma.dat
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
task    26074 root   15w   REG   0,24       38 39518215 vzdump-qemu-99999-2015_09_17-18_08_53.vma.dat
lzop    26125 root    1w   REG   0,24       38 39518215 vzdump-qemu-99999-2015_09_17-18_08_53.vma.dat

root@node02:/mnt/pve/QNAPNFS/dump# lsof /var/run/qemu-server/99999.pid
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
kvm     26117 root    8uW  REG   0,16        6 89630958 /run/qemu-server/99999.pid

root@node02:/mnt/pve/QNAPNFS/dump# lsof /var/run/qemu-server/99999.qmp
COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
kvm     26117 root    7u  unix 0xffff881be49bc740      0t0 89630956 /var/run/qemu-server/99999.qmp


So I checked these PIDs (26117 and 26125). lzop (26125) was just waiting for data at "read(0,".

But the KVM process loops continuously: it gets a "Resource temporarily unavailable" error, then waits, reads, and errors again...
Code:
root@node02:/mnt/pve/QNAPNFS/dump# time strace -c -qvvfp 26117 
^C% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 70.56    0.023371           5      5113           ppoll
 14.59    0.004832           1      5796           write
  7.15    0.002367           1      2821           recvmsg
  7.09    0.002347           1      2882        67 read
  0.16    0.000053           3        16           close
  0.15    0.000049           1        48           fcntl
  0.14    0.000045           3        16           accept4
  0.05    0.000018           1        16           fstat
  0.05    0.000017           1        16           getsockname
  0.04    0.000014           1        16           getpeername
  0.02    0.000007           1         7           ioctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.033120                 16747        67 total

real    0m9.159s
user    0m0.041s
sys     0m0.226s

root@node02:/mnt/pve/QNAPNFS/dump# strace -e read,ppoll,write -qvvfp 26117
[pid 26117] ppoll([{fd=21, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=6, events=POLLIN}, {fd=15, events=POLLIN}, {fd=12, events=POLLIN}, {fd=5, events=POLLIN}, {fd=22, events=POLLIN}], 7, {0, 4000000}, NULL, 8) = 2 ([{fd=6, revents=POLLIN}, {fd=22, revents=POLLIN}], left {0, 3998677})
[pid 26117] read(6, "\2\0\0\0\0\0\0\0", 16) = 8
[pid 26117] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 26117] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 26117] ppoll([{fd=21, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=6, events=POLLIN}, {fd=15, events=POLLIN}, {fd=12, events=POLLIN}, {fd=5, events=POLLIN}, {fd=22, events=POLLIN}], 7, {0, 4000000}, NULL, 8) = 2 ([{fd=6, revents=POLLIN}, {fd=22, revents=POLLIN}], left {0, 3998648})
[pid 26117] read(6, "\2\0\0\0\0\0\0\0", 16) = 8
[pid 26117] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 26117] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 26117] read(6, 0x7ffc04ea4780, 16) = -1 EAGAIN (Resource temporarily unavailable)
[pid 26117] ppoll([{fd=21, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=6, events=POLLIN}, {fd=15, events=POLLIN}, {fd=12, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}], 7, {0, 4000000}, NULL, 8) = 0 (Timeout)
[pid 26117] ppoll([{fd=21, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=6, events=POLLIN}, {fd=15, events=POLLIN}, {fd=12, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}], 7, {0, 4000000}, NULL, 8) = 0 (Timeout)
[pid 26117] ppoll([{fd=21, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=6, events=POLLIN}, {fd=15, events=POLLIN}, {fd=12, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}], 7, {0, 4000000}, NULL, 8) = 0 (Timeout)
[pid 26117] ppoll([{fd=21, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=6, events=POLLIN}, {fd=15, events=POLLIN}, {fd=12, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}], 7, {0, 4000000}, NULL, 8) = 0 (Timeout)

As far as I understand, the limits affect the backup process's system calls.
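What jumps out when comparing the two strace -c tables is that the throttled run issues no io_submit/io_getevents at all, i.e. the guest's disk reads are never actually submitted. A small helper to diff such summaries (parse_strace_summary is a hypothetical name; the column layout matches strace -c output):

```python
def parse_strace_summary(text):
    """Map syscall name -> call count from `strace -c` output."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # data rows start with a %time number; skip headers, rulers, total row
        if len(parts) >= 5 and parts[-1] != "total":
            try:
                float(parts[0])
            except ValueError:
                continue
            counts[parts[-1]] = int(parts[3])   # 4th column is the call count
    return counts

throttled = parse_strace_summary("""\
 70.56    0.023371           5      5113           ppoll
 14.59    0.004832           1      5796           write
""")
print(throttled)  # {'ppoll': 5113, 'write': 5796}
```

Running this over both tables makes the missing io_submit/io_getevents entries in the throttled run obvious at a glance.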

For comparison, here is the KVM process's trace during a backup without any disk throttle:
Code:
root@node02:/mnt/pve/QNAPNFS/dump# time strace -c -qvvfp 15174
^C% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 59.85    0.096854           1     77134           write
 13.14    0.021265           2     11608           ppoll
  9.76    0.015795           1     16944       862 read
  9.65    0.015622           4      3597           io_submit
  5.00    0.008095           2      3597           io_getevents
  2.44    0.003948           1      3801           recvmsg
  0.06    0.000091           5        20           close
  0.03    0.000051           1        60           fcntl
  0.02    0.000037           2        20           accept4
  0.01    0.000020           1        20           getsockname
  0.01    0.000019           2        11           ioctl
  0.01    0.000017           1        20           fstat
  0.01    0.000014           1        20           getpeername
------ ----------- ----------- --------- --------- ----------------
100.00    0.161828                116852       862 total


real    0m9.945s
user    0m0.367s
sys     0m1.491s
 
Hi
Thanks for the time you took on this issue. I can reproduce it with Proxmox VE 3.4, but everything is fine with 4.0 Beta 2 (with or without VirtIO devices).
Can you maybe upgrade your host?
 

Hello, thanks for your reply.

It's a 3-node cluster. There isn't anything too important on it, but I don't want to risk it.
I think it's better to wait for the release. Thanks again.
 
