Nightly backup hangs (by IO) the whole system.

niksite

New Member
Jun 25, 2013
I've been using Proxmox since January and everything was OK until June (probably until version 3.0 of Proxmox). I was notified about site unavailability and after a short check figured out that the host system had a really huge load average, almost zero CPU usage, and a nightly backup process that had been running for about 13 hours and counting. I tried the 'Restart' button in Proxmox. No success. I tried the reboot command in the shell. No success. So the host system was cold restarted. Today it happened again:
Code:
 13:34:37 up 14 days, 13:46,  2 users,  load average: 1180.18, 1170.49, 1146.04
The backup starts at 00:00; I cold-restarted the host at about 13:38. And it was just the first (and smallest) backup in the row:
Code:
INFO: starting new backup job: vzdump --quiet 1 --mode snapshot --compress gzip --storage backup_daily --all 1
INFO: Starting Backup of VM 100 (openvz)
INFO: CTID 100 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/vg0-vz ('/dev/vg0/vzsnap-vz-0')
INFO:   Logical volume "vzsnap-vz-0" created
INFO: creating archive '/backup/vz/daily/dump/vzdump-openvz-100-2013_06_25-00_00_01.tar.gz'
INFO: Total bytes written: 731238400 (698MiB, 3.3MiB/s)
INFO: archive file size: 325MB
INFO: delete old backup '/backup/vz/daily/dump/vzdump-openvz-100-2013_06_22-00_00_02.tar.gz'
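
For what it's worth, when the load average climbs like that with almost no CPU usage, the run queue is usually full of processes stuck in uninterruptible IO wait (state D). A minimal sketch of how one might confirm that while it is happening, using standard tools only (nothing Proxmox-specific):
Code:
# list processes in uninterruptible sleep (D state), typically blocked on IO
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'
# per-device utilisation and IO wait (from the sysstat package)
iostat -xz 2
# which processes generate the IO, if iotop is installed
iotop -obn 3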
 
Hi, we have the same issue. Look here: http://forum.proxmox.com/threads/14285-lvremove-error-on-a-backup-and-hight-IO-Delays-load-average
Do you have any ideas to solve this issue?
Regards
Yes, your issue looks pretty much the same. One difference: I don't have 'INFO: task lvremove:375696 blocked for more than 120 seconds', but 'INFO: task redis-server:460631 blocked for more than 120 seconds.' and similar instead. I have no solution yet. I have just disabled the daily backups.
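
In case it is useful to others: those "blocked for more than 120 seconds" lines come from the kernel's hung-task detector and end up in the kernel log, so after such a night they can be pulled out like this (just a sketch, plain grep):
Code:
# hung-task reports from the running kernel's ring buffer
dmesg | grep -A 5 'blocked for more than 120 seconds'
# the same messages in the persistent log, with timestamps (Debian path)
grep -A 5 'blocked for more than' /var/log/kern.log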
 
Check your LVM snapshot size. Any custom settings in vzdump.conf? Also provide all info about your filesystem, custom partitioning, and LVM.

And finally, post your 'pveversion -v'.
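
For reference, roughly these commands cover the information requested above (a minimal sketch, adjust to your setup):
Code:
cat /etc/vzdump.conf   # custom vzdump settings, including the snapshot size
vgs && lvs             # volume group free space and logical volumes
df -h                  # mounted filesystems
cat /etc/fstab         # partitioning / mount configuration
pveversion -v          # installed Proxmox package versions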
 
Check your LVM snapshot size. Any custom settings in vzdump.conf? Also provide all info about your filesystem, custom partitioning, and LVM. And finally, post your 'pveversion -v'.
I have just "size: 16000" in /etc/vzdump.conf

Code:
# vgs
  VG   #PV #LV #SN Attr   VSize VFree  
  vg0    1   4   0 wz--n- 2.73t 116.02g
# lvs
  LV     VG   Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  backup vg0  -wi-ao-- 500.00g                                           
  root   vg0  -wi-ao-- 100.00g                                           
  swap   vg0  -wi-ao--  30.00g                                           
  vz     vg0  -wi-ao--   2.00t        
# pveversion -v
pve-manager: 3.0-20 (pve-manager/3.0/0428106c)
running kernel: 2.6.32-20-pve
proxmox-ve-2.6.32: 3.0-100
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-96
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-15
pve-firmware: 1.0-22
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-6
vncterm: 1.1-3
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-12
ksm-control-daemon: 1.1-1

I don't know why the forum removes all newline symbols from my messages, so I have posted the code above as a gist as well: https://gist.github.com/niksite/5865674
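
With "size: 16000" vzdump should create a 16 GB snapshot, and vg0 still has 116 GB free, so the snapshot itself should fit. One thing that may be worth watching during the next backup is whether the snapshot fills up while the archive is written; the Data% column of plain 'lvs' shows that (a sketch, assuming the snapshot is called vzsnap-vz-0 as in the log above):
Code:
# Data% shows how full the snapshot is while the backup runs
watch -n 30 lvs vg0
# a snapshot that reaches 100% becomes invalid and has to be removed by hand:
# lvremove -f /dev/vg0/vzsnap-vz-0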
 
Hi tom,

vgs
Code:
  VG   #PV #LV #SN Attr   VSize VFree
  pve    1   4   0 wz--n- 6,37t 8,00g

lvs
Code:
  LV               VG   Attr     LSize  Pool Origin Data%  Move Log Copy%  Convert
  data             pve  -wi-ao--  6,21t                                         
  root             pve  -wi-ao-- 96,00g                                         
  swap             pve  -wi-ao-- 47,00g                                         
  vzsnap-pegasus-0 pve  -wi-a---  8,00g
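
Just an observation on the lvs output above: vzsnap-pegasus-0 is still present outside of a running backup, which looks like a leftover from an earlier failed vzdump run. If nothing is using it, it can be removed by hand before the next backup so vzdump can create a fresh snapshot (a sketch; double-check the volume name first):
Code:
# make sure the leftover snapshot volume is not mounted or in use
lvdisplay /dev/pve/vzsnap-pegasus-0
mount | grep vzsnap
# then drop it
lvremove /dev/pve/vzsnap-pegasus-0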

pveversion -v
Code:
proxmox-ve-2.6.32: 3.0-104 (running kernel: 2.6.32-21-pve)
pve-manager: 3.0-30 (running version: 3.0-30/3d1ccfa6)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-21-pve: 2.6.32-104
pve-kernel-2.6.32-17-pve: 2.6.32-83
pve-kernel-2.6.32-18-pve: 2.6.32-88
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-23
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-9
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1

/etc/vzdump.conf
Code:
# vzdump default settings

#tmpdir: DIR
#dumpdir: DIR
#storage: STORAGE_ID
#mode: snapshot|suspend|stop
#bwlimit: KBPS
#ionice: PRI
#lockwait: MINUTES
#stopwait: MINUTES
#size: MB
#maxfiles: N
#script: FILENAME
#exclude-path: PATHLIST

bwlimit: 0
size: 8192
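
One knob that is sometimes used to keep vzdump from starving everything else is throttling it in /etc/vzdump.conf; with "bwlimit: 0" the backup reads as fast as the storage allows. Purely as an illustration, not a recommendation (the values are made up, tune them to your hardware):
Code:
# /etc/vzdump.conf -- example throttling
bwlimit: 51200     # limit the backup to ~50 MB/s (value is in KB/s)
ionice: 7          # lowest best-effort IO priority for the dump process
size: 8192         # LVM snapshot size in MB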

df -h
Code:
Filesystem                    Size    Used Avail Use% Mounted on
udev                           10M       0   10M    0% /dev
tmpfs                         5,6G    364K  5,6G    1% /run
/dev/mapper/pve-root           95G    3,2G   87G    4% /
tmpfs                         5,0M       0  5,0M    0% /run/lock
tmpfs                          12G     38M   11G    1% /run/shm
/dev/mapper/pve-data          6,2T    519G  5,7T    9% /var/lib/vz
/dev/cciss/c0d0p2             494M    144M  326M   31% /boot
/dev/fuse                      30M     80K   30M    1% /etc/pve
10.11.12.50:/volume1/storage   14T    1,6T   12T   12% /mnt/pve/SLS-001
/var/lib/vz/private/5133      250G     14G  237G    6% /var/lib/vz/root/5133
tmpfs                         4,0G       0  4,0G    0% /var/lib/vz/root/5133/lib/init/rw
tmpfs                         4,0G       0  4,0G    0% /var/lib/vz/root/5133/dev/shm
/var/lib/vz/private/5132      500G    193G  308G   39% /var/lib/vz/root/5132
tmpfs                         4,0G       0  4,0G    0% /var/lib/vz/root/5132/lib/init/rw
tmpfs                         4,0G       0  4,0G    0% /var/lib/vz/root/5132/dev/shm
/var/lib/vz/private/5130     1000G     37G  964G    4% /var/lib/vz/root/5130
tmpfs                         4,0G       0  4,0G    0% /var/lib/vz/root/5130/lib/init/rw
tmpfs                         4,0G       0  4,0G    0% /var/lib/vz/root/5130/dev/shm
/var/lib/vz/private/5131      2,0T    277G  1,8T   14% /var/lib/vz/root/5131
tmpfs                         8,0G       0  8,0G    0% /var/lib/vz/root/5131/lib/init/rw
tmpfs                         8,0G       0  8,0G    0% /var/lib/vz/root/5131/dev/shm

fdisk -l
Code:
WARNING: GPT (GUID Partition Table) detected on '/dev/cciss/c0d0'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/cciss/c0d0: 7001.2 GB, 7001197993984 bytes
255 heads, 63 sectors/track, 851180 cylinders, total 13674214832 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d0p1               1  4294967295  2147483647+  ee  GPT

Disk /dev/mapper/pve-root: 103.1 GB, 103079215104 bytes
255 heads, 63 sectors/track, 12532 cylinders, total 201326592 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/pve-root doesn't contain a valid partition table

Disk /dev/mapper/pve-swap: 50.5 GB, 50465865728 bytes
255 heads, 63 sectors/track, 6135 cylinders, total 98566144 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/pve-swap doesn't contain a valid partition table

Disk /dev/mapper/pve-data: 6829.9 GB, 6829937524736 bytes
255 heads, 63 sectors/track, 830359 cylinders, total 13339721728 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/pve-data doesn't contain a valid partition table

Disk /dev/mapper/pve-vzsnap--pegasus--0: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/pve-vzsnap--pegasus--0 doesn't contain a valid partition table

/etc/fstab
Code:
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pve/root / ext3 errors=remount-ro 0 1
/dev/pve/data /var/lib/vz ext3 defaults 0 1
UUID=5a942cfb-0e70-4ee0-b23a-7bf7a07b0d20 /boot ext3 defaults 0 1
/dev/pve/swap none swap sw 0 0
proc /proc proc defaults 0 0

Do you need more information?
Thanks for your help.

Regards.
 
Hi, +1 for this problem !

I have just "size: 4000" in /etc/vzdump.conf

vgs
Code:
  VG   #PV #LV #SN Attr   VSize VFree
  pve    1   2   0 wz--n- 1.80t 100.00g

lvs
Code:
  LV                VG   Attr     LSize Pool Origin Data%  Move Log Copy%  Convert
  data              pve  -wi-ao-- 1.70t
  vzsnap-nsXXXXXX-0 pve  -wi-a--- 1.00g
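
Side note on the output above: vzdump.conf says "size: 4000" (MB), but the vzsnap volume shown is only 1.00g, so it may be worth checking which size the snapshot actually gets at backup time (a sketch):
Code:
grep -i '^size' /etc/vzdump.conf   # configured snapshot size in MB
lvs --units m pve                  # actual size of the vzsnap volume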


Code:
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-20-pve
proxmox-ve-2.6.32: 3.0-100
pve-kernel-2.6.32-20-pve: 2.6.32-100
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-22
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1

/etc/fstab
Code:
/dev/sda1	/	ext3	errors=remount-ro	0	1
/dev/pve/data	/var/lib/vz	ext3	defaults	0	2
/dev/sda3	none	swap	defaults	0	0
proc            /proc   proc    defaults        0       0
sysfs           /sys    sysfs   defaults        0       0
 
@gkovacs

Use the latest test version of Proxmox and reboot your server. We have tested the latest driver on 13 servers with this same issue; after updating to the latest Proxmox test version it works fine.

regards
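
For anyone who wants to try this: on PVE 3.x the test packages come from the pvetest repository; enabling it looks roughly like this (a sketch, assuming a Wheezy-based 3.x install; switch back to your usual repository afterwards if you only wanted this fix):
Code:
# enable the pvetest repository (Proxmox VE 3.x on Debian Wheezy)
echo "deb http://download.proxmox.com/debian wheezy pvetest" \
    > /etc/apt/sources.list.d/pvetest.list
apt-get update && apt-get dist-upgrade
# a new pve-kernel only takes effect after a reboot
reboot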
 
@gkovacs

Use the latest test version of Proxmox and reboot your server. We have tested the latest driver on 13 servers with this same issue; after updating to the latest Proxmox test version it works fine.

regards

Why would I do that?
As I wrote above, we do not have the issue anymore, since we updated the Adaptec controller firmware to the latest version.
 
