Super slow Backup with pve 2.x

rengiared · Jun 22, 2012

Hi,

since i upgraded from PVE 1.9 to PVE 2.x i have on 2 of my 3 systems big problems with the backup-speed.
All 3 are in a cluster-configuration but only for simple administration. Every VM is on the local storage and so on.

This are the three systems:

PVE01: Dell R710, 2x Xeon E5520, 24GB RAM, 8x 146GB 10k SAS RAID5
PVE02: Dell R715, 2x Opteron 6128, 64GB RAM, 4x 300GB 10k SAS RAID5
PVE03: Dell R310, Xeon X3440, 16GB RAM, 2x 250GB SATA RAID1

NAS01: Dell T110, i3 540, 12GB RAM, 5x 3TB in RaidZ2 - FreeNas 8.0.4 p3

pveversion -v is on all 3 the same like this:

Code:

pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.1-68
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-12-pve: 2.6.32-68
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-16
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

this are the pveperf results (in usage, not in idle)

PVE01:

Code:

CPU BOGOMIPS:      72340.90
REGEX/SECOND:      417623
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    316.54 MB/sec
AVERAGE SEEK TIME: 5.32 ms
FSYNCS/SECOND:     2225.81
DNS EXT:           104.89 ms
DNS INT:           1.65 ms

PVE02:

Code:

CPU BOGOMIPS:      64000.67
REGEX/SECOND:      784386
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    382.58 MB/sec
AVERAGE SEEK TIME: 5.38 ms
FSYNCS/SECOND:     1852.52
DNS EXT:           91.52 ms
DNS INT:           1.12 ms

PVE03:

Code:

CPU BOGOMIPS:      40425.78
REGEX/SECOND:      842447
HD SIZE:           57.09 GB (/dev/mapper/pve-root)
BUFFERED READS:    98.44 MB/sec
AVERAGE SEEK TIME: 9.74 ms
FSYNCS/SECOND:     768.98
DNS EXT:           87.18 ms
DNS INT:           1.13 ms

enough of number crunching, I have the problem on PVE01 and PVE02 that there is the backup-speed between 0 to 3mb/s.
with 1.9 this was about 50 mb/s and more (i always backup with snapshot mode and no compression)

when backuping i get these errors in the systemlog:

Code:

Jun 14 19:16:04 pve02 pvestatd[2231]: WARNING: command 'df -P -B 1 /mnt/pve/PVE' failed: got timeout
Jun 14 19:16:04 pve02 pvestatd[2231]: status update time (5.589 seconds)
Jun 14 19:16:26 pve02 pvestatd[2231]: WARNING: command 'df -P -B 1 /mnt/pve/PVE' failed: got timeout
Jun 14 19:16:26 pve02 pvestatd[2231]: status update time (6.669 seconds)
Jun 14 19:16:31 pve02 pvestatd[2231]: WARNING: command 'df -P -B 1 /mnt/pve/PVE' failed: got timeout

at the beginning i thougt this would be a problem from the upgrade from 1.9 to 2.x, so i installed pve01 und pve02 completely new from scratch in 2 nightshifts but the situation stays the same.
on PVE03 i still get backup-speeds of 30mb/s (was the same on 1.9)

the really strange is that the first 8 to 12 GiB are done most of the time in normal speed and without or very less errors in syslog and then the logs run over with the timeout errors.

as you have written in another thread that this is a problem with the nas that it hasn't enough power i have set up an extra PVE 1.9 server with an old machine

Dell Poweredge 2950
Xeon 5130
4GB RAM
4x 146GB 15k SAS

pveperf:

Code:

CPU BOGOMIPS:      7980.88
REGEX/SECOND:      664907
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    238.40 MB/sec
AVERAGE SEEK TIME: 4.30 ms
FSYNCS/SECOND:     2350.09
DNS EXT:           86.76 ms
DNS INT:           1.11 ms

Only one machine running with 3GB assigned and the backup had a speed with 33.96 MiB/s
i tried to limit the bandwith on the problematic servers to 25000kb/s but the problem appears in the same way, so i can't believe that the problem cause is the nas
with SMB i get 100mb per second (eth-port is limiting) on it and as far as i know NFS needs much less cpu power

Please help me, at the moment i'm desperate cause i can't backup any machine
i would be really thankful for any advice what i can do/try

tom · Jun 22, 2012

also provide the backup logs.

is your lvm snapshot full? monitor with lvdisplay.

and show the output of 'pvs'

rengiared · Jun 22, 2012

this was a backup on pve02 with PVE1.9

Code:

May 25 23:02:42 INFO: Starting Backup of VM 203 (qemu)
May 25 23:02:42 INFO: running
May 25 23:02:42 INFO: status = running
May 25 23:02:42 INFO: backup mode: snapshot
May 25 23:02:42 INFO: ionice priority: 7
May 25 23:02:43 INFO:   Logical volume "vzsnap-pve02-0" created
May 25 23:02:43 INFO: creating archive '/mnt/pve/pve02/vzdump-qemu-203-2012_05_25-23_02_42.tar'
May 25 23:02:43 INFO: adding '/mnt/pve/pve02/vzdump-qemu-203-2012_05_25-23_02_42.tmp/qemu-server.conf' to archive ('qemu-server.conf')
May 25 23:02:43 INFO: adding '/mnt/vzsnap0/images/203/vm-203-disk-1.qcow2' to archive ('vm-disk-ide0.qcow2')
May 25 23:25:34 INFO: Total bytes written: 80941476864 (56.30 MiB/s)
May 25 23:25:34 INFO: archive file size: 75.38GB
May 25 23:25:34 INFO: delete old backup '/mnt/pve/pve02/vzdump-qemu-203-2012_05_15-22_23_02.tar'
May 25 23:25:40 INFO:   Logical volume "vzsnap-pve02-0" successfully removed
May 25 23:25:40 INFO: Finished Backup of VM 203 (00:22:58)

actually if a backup is successfull on pve01 or pve02 it looks like this

Code:

Jun 09 15:23:21 INFO: Starting Backup of VM 204 (qemu)
Jun 09 15:23:21 INFO: status = running
Jun 09 15:23:21 INFO: backup mode: snapshot
Jun 09 15:23:21 INFO: ionice priority: 7
Jun 09 15:23:21 INFO:   Logical volume "vzsnap-pve02-0" created
Jun 09 15:23:21 INFO: creating archive '/mnt/pve/vm/dump/vzdump-qemu-204-2012_06_09-15_23_21.tar'
Jun 09 15:23:21 INFO: adding '/mnt/pve/vm/dump/vzdump-qemu-204-2012_06_09-15_23_21.tmp/qemu-server.conf' to archive ('qemu-server.conf')
Jun 09 15:23:21 INFO: adding '/mnt/vzsnap0/images/204/vm-204-disk-1.qcow2' to archive ('vm-disk-ide0.qcow2')
Jun 09 17:13:40 INFO: Total bytes written: 9166867456 (1.32 MiB/s)
Jun 09 17:13:40 INFO: archive file size: 8.54GB
Jun 09 17:13:42 INFO: Finished Backup of VM 204 (01:50:21)

the backup on pve03 (the working one)

Code:

Jun 09 09:56:14 INFO: Starting Backup of VM 304 (qemu)
Jun 09 09:56:14 INFO: status = running
Jun 09 09:56:14 INFO: backup mode: snapshot
Jun 09 09:56:14 INFO: ionice priority: 7
Jun 09 09:56:14 INFO:   Logical volume "vzsnap-pve03-0" created
Jun 09 09:56:15 INFO: creating archive '/mnt/pve/pve/dump/vzdump-qemu-304-2012_06_09-09_56_14.tar'
Jun 09 09:56:15 INFO: adding '/mnt/pve/pve/dump/vzdump-qemu-304-2012_06_09-09_56_14.tmp/qemu-server.conf' to archive ('qemu-server.conf')
Jun 09 09:56:15 INFO: adding '/mnt/vzsnap0/images/304/vm-304-disk-1.raw' to archive ('vm-disk-ide0.raw')
Jun 09 10:04:32 INFO: Total bytes written: 16195860480 (31.08 MiB/s)
Jun 09 10:04:32 INFO: archive file size: 15.08GB
Jun 09 10:04:33 INFO: Finished Backup of VM 304 (00:08:19)

i have tried a backup-storage for all nodes and one for each (but its always the same nfs share on the nas, but i tried a different nas-nfs-share as well)

pvs results

Code:

pve01
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda2  pve  lvm2 a--  952.37g 16.00g


pve02
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda2  pve  lvm2 a--  836.12g 16.00g

i tried to vary the size in vzdump.conf from 1000 to 2000 and even 4000
nothing really changed

i'm going to submit lvdisplay results later that day (evening)

thanks!

rengiared · Jun 22, 2012

i have tested it now with size of 2000, 4000, 8000 and 16000
i will post the results from the first run (4000) cause all others are quite the same

in the first ~2 minutes about 8gb are written and lvdisplay looks like this

Code:

--- Logical volume ---
  LV Path                /dev/pve/vzsnap-pve02-0
  LV Name                vzsnap-pve02-0
  VG Name                pve
  LV UUID                1yd5uE-8peY-tjG6-qbCl-stYw-ab82-00IioZ
  LV Write Access        read/write
  LV Creation host, time pve02, 2012-06-22 20:44:09 +0200
  LV snapshot status     active destination for data
  LV Status              available
  # open                 1
  LV Size                662.12 GiB
  Current LE             169504
  COW-table size         3.91 GiB
  COW-table LE           1000
  Allocated to snapshot  0.33%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

after the errors appear in the syslog and the backup starts to sloooow down its like this (not a real change what i see)

Code:

  --- Logical volume ---
  LV Path                /dev/pve/vzsnap-pve02-0
  LV Name                vzsnap-pve02-0
  VG Name                pve
  LV UUID                1yd5uE-8peY-tjG6-qbCl-stYw-ab82-00IioZ
  LV Write Access        read/write
  LV Creation host, time pve02, 2012-06-22 20:44:09 +0200
  LV snapshot status     active destination for data
  LV Status              available
  # open                 1
  LV Size                662.12 GiB
  Current LE             169504
  COW-table size         3.91 GiB
  COW-table LE           1000
  Allocated to snapshot  0.44%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

this is what the syslog looks like

Code:

Jun 22 20:44:08 pve02 pvedaemon[939450]: <root@pam> starting task UPID:pve02:000CF9DD:0728BDBB:4FE4BCF8:vzdump::root@pam:
Jun 22 20:44:08 pve02 pvedaemon[850397]: INFO: starting new backup job: vzdump 201 --remove 0 --mode snapshot --storage pve02 --node pve02
Jun 22 20:44:08 pve02 pvedaemon[850397]: INFO: Starting Backup of VM 201 (qemu)
Jun 22 20:44:08 pve02 qm[850401]: <root@pam> update VM 201: -lock backup
Jun 22 20:44:09 pve02 kernel: EXT3-fs: barriers disabled
Jun 22 20:44:09 pve02 kernel: kjournald starting. Commit interval 5 seconds
Jun 22 20:44:09 pve02 kernel: EXT3-fs (dm-3): using internal journal
Jun 22 20:44:09 pve02 kernel: EXT3-fs (dm-3): mounted filesystem with ordered data mode
Jun 22 20:45:10 pve02 pvedaemon[2291]: worker 924805 finished
Jun 22 20:45:10 pve02 pvedaemon[2291]: starting 1 worker(s)
Jun 22 20:45:10 pve02 pvedaemon[2291]: worker 850726 started
Jun 22 20:46:30 pve02 pvestatd[1952]: WARNING: command 'df -P -B 1 /mnt/pve/pve02' failed: got timeout
Jun 22 20:46:43 pve02 pvestatd[1952]: WARNING: command 'df -P -B 1 /mnt/pve/pve02' failed: got timeout
Jun 22 20:46:43 pve02 pvestatd[1952]: status update time (5.307 seconds)
Jun 22 20:47:03 pve02 pvestatd[1952]: WARNING: command 'df -P -B 1 /mnt/pve/pve02' failed: got timeout
Jun 22 20:47:41 pve02 pvestatd[1952]: WARNING: command 'df -P -B 1 /mnt/pve/pve02' failed: got timeout
Jun 22 20:47:42 pve02 pvedaemon[2291]: worker 927546 finished
Jun 22 20:47:42 pve02 pvedaemon[2291]: starting 1 worker(s)
Jun 22 20:47:42 pve02 pvedaemon[2291]: worker 851192 started
Jun 22 20:47:51 pve02 pvestatd[1952]: WARNING: command 'df -P -B 1 /mnt/pve/pve02' failed: got timeout
Jun 22 20:48:12 pve02 pvestatd[1952]: WARNING: command 'df -P -B 1 /mnt/pve/pve02' failed: got timeout
.
.
kill PID
deleting the dat file on the nas share improves the termination of the process, if not deleted the termination can take much more time
.
.
Jun 22 20:56:10 pve02 pvestatd[1952]: WARNING: command 'df -P -B 1 /mnt/pve/pve02' failed: got timeout
Jun 22 20:56:31 pve02 pvestatd[1952]: WARNING: command 'df -P -B 1 /mnt/pve/pve02' failed: got timeout
Jun 22 20:58:14 pve02 pvedaemon[850397]: ERROR: Backup of VM 201 failed - command '/usr/lib/qemu-server/vmtar -s '/mnt/pve/pve02/dump/vzdump-qemu-201-2012_06_22-20_44_08.tmp/qemu-server.conf' 'qemu-server.conf' '/mnt/vzsnap0/images/201/vm-201-disk-1.raw' 'vm-disk-ide0.raw' >/mnt/pve/pve02/dump/vzdump-qemu-201-2012_06_22-20_44_08.dat' failed: exit code 255
Jun 22 20:58:14 pve02 pvedaemon[850397]: INFO: Backup job finished with errors
Jun 22 20:58:14 pve02 pvedaemon[850397]: job errors

thats the backup log, at the first try i accidentialy made it with suspend mode, but that made no difference

Code:

Jun 11 22:01:31 INFO: Starting Backup of VM 201 (qemu)
Jun 11 22:01:31 INFO: status = running
Jun 11 22:01:31 INFO: backup mode: suspend
Jun 11 22:01:31 INFO: ionice priority: 7
Jun 11 22:01:31 INFO: suspend vm
Jun 11 22:01:32 INFO: creating archive '/mnt/pve/pve02/dump/vzdump-qemu-201-2012_06_11-22_01_31.tar'
Jun 11 22:01:32 INFO: adding '/mnt/pve/pve02/dump/vzdump-qemu-201-2012_06_11-22_01_31.tmp/qemu-server.conf' to archive ('qemu-server.conf')
Jun 11 22:01:32 INFO: adding '/var/lib/vz/images/201/vm-201-disk-1.raw' to archive ('vm-disk-ide0.raw')
Jun 11 22:23:28 INFO: Killed
Jun 11 22:23:30 INFO: resume vm
Jun 11 22:23:31 INFO: vm is online again after 1320 seconds
Jun 11 22:23:31 ERROR: Backup of VM 201 failed - command '/usr/lib/qemu-server/vmtar -s '/mnt/pve/pve02/dump/vzdump-qemu-201-2012_06_11-22_01_31.tmp/qemu-server.conf' 'qemu-server.conf' '/var/lib/vz/images/201/vm-201-disk-1.raw' 'vm-disk-ide0.raw' >/mnt/pve/pve02/dump/vzdump-qemu-201-2012_06_11-22_01_31.dat' failed: exit code 137

and here is a screenshot of the utilization of the nas in the time of my tests, first peek is with size 4000, second one with 2000, third with 8000

snowman66 · Jun 23, 2012

Any hint on Freenas side? Maybe is problem with Freenas.

ewuewu · Jun 23, 2012

I have the same issue. I've setup a backup for 4 VM's on a standalone proxmox host. The nfs backup target is a ubuntu 12.04 linux box. For this backup test there were no concurrent accesses to this backup box.

I have running also a second proxmox test environment (2 hosts running as cluster with DBBD). If I start backups from this cluster to the nfs box I have absolute acceptable access rates. The nfs settings and /etc/vzdump.conf settings are the same on both proxmox setups.

There are no errs in syslog.

The backup from the slow box took over 13 hours. The same backup on the other proxmox boxes ends after prx. 1,5 hours.

Here are the results and confs from my slow box.

Output of the backup job:

Code:

INFO: starting new backup job: vzdump 110 111 112 113 --quiet 1 --mailto
postmaster@netz-objekte.de --mode snapshot --compress lzo --storage
ubuntu-nfs1-v4
INFO: Starting Backup of VM 110 (qemu)
INFO: status =
running
INFO: backup mode: snapshot
INFO: bandwidth limit: 10000
KB/s
INFO: ionice priority: 7
INFO: Logical volume "vzsnap-proxmox3-0"
created
INFO: creating archive
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-110-2012_06_22-18_15_01.tar.lzo'
INFO:
adding
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-110-2012_06_22-18_15_01.tmp/qemu-server.conf'
to archive ('qemu-server.conf')
INFO: adding
'/mnt/vzsnap0/images/110/vm-110-disk-1.raw' to archive
('vm-disk-ide0.raw')
INFO: Total bytes written: 161061276160 (9.77
MiB/s)
INFO: archive file size: 2.60GB
INFO: delete old backup
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-110-2012_06_21-10_55_02.tar.lzo'
INFO:
Finished Backup of VM 110 (04:22:20)
INFO: Starting Backup of VM 111
(qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO:
bandwidth limit: 10000 KB/s
INFO: ionice priority: 7
INFO: Logical volume
"vzsnap-proxmox3-0" created
INFO: creating archive
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-111-2012_06_22-22_37_21.tar.lzo'
INFO:
adding
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-111-2012_06_22-22_37_21.tmp/qemu-server.conf'
to archive ('qemu-server.conf')
INFO: adding
'/mnt/vzsnap0/images/111/vm-111-disk-1.raw' to archive
('vm-disk-ide0.raw')
INFO: adding '/mnt/vzsnap0/images/111/vm-111-disk-2.raw'
to archive ('vm-disk-virtio0.raw')
INFO: Total bytes written: 214756756480
(9.77 MiB/s)
INFO: archive file size: 69.00GB
INFO: Finished Backup of VM
111 (05:49:36)
INFO: Starting Backup of VM 112 (qemu)
INFO: status =
running
INFO: backup mode: snapshot
INFO: bandwidth limit: 10000
KB/s
INFO: ionice priority: 7
INFO: Logical volume "vzsnap-proxmox3-0"
created
INFO: creating archive
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-112-2012_06_23-04_26_57.tar.lzo'
INFO:
adding
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-112-2012_06_23-04_26_57.tmp/qemu-server.conf'
to archive ('qemu-server.conf')
INFO: adding
'/mnt/vzsnap0/images/112/vm-112-disk-1.raw' to archive
('vm-disk-ide0.raw')
INFO: Total bytes written: 10737420800 (9.77
MiB/s)
INFO: archive file size: 6.18GB
INFO: delete old backup
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-112-2012_06_21-15_17_14.tar.lzo'
INFO:
Finished Backup of VM 112 (00:18:12)
INFO: Starting Backup of VM 113
(qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO:
bandwidth limit: 10000 KB/s
INFO: ionice priority: 7
INFO: Logical volume
"vzsnap-proxmox3-0" created
INFO: creating archive
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-113-2012_06_23-04_45_09.tar.lzo'
INFO:
adding
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-113-2012_06_23-04_45_09.tmp/qemu-server.conf'
to archive ('qemu-server.conf')
INFO: adding
'/mnt/vzsnap0/images/113/vm-113-disk-1.raw' to archive
('vm-disk-virtio0.raw')
INFO: Total bytes written: 107374184960 (9.77
MiB/s)
INFO: archive file size: 2.85GB
INFO: delete old backup
'/mnt/pve/ubuntu-nfs1-v4/dump/vzdump-qemu-113-2012_06_22-11_21_21.tar.lzo'
INFO:
Finished Backup of VM 113 (02:54:50)
INFO: Backup job finished
successfully
TASK OK

vzdump.conf:

Code:

# vzdump default settings
#tmpdir: DIR
#dumpdir: DIR
#storage: STORAGE_ID
#mode: snapshot|suspend|stop
#bwlimit: KBPS
bwlimit: 10000
#ionice: PRI
#lockwait: MINUTES
#stopwait: MINUTES
#size: MB
size: 2048
#maxfiles: N
#script: FILENAME
#exclude-path: PATHLIST

Output from /proc/ mounts:

Code:

XXX.XXX.XXX.109:/backup-ext3-sdc/proxmox /mnt/pve/ubuntu-nfs1-v4 nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=XXX.XXX.XXX.109,mountvers=3,mountport=49913,mountproto=udp,local_lock=none,addr=XXX.XXX.XXX.109 0 0

output from pveversion

Code:

pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-12-pve
proxmox-ve-2.6.32: 2.1-68
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-12-pve: 2.6.32-68
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-16
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

output from nfsstat:

Code:

calls retrans authrefrsh
18226013 0 18226022
Client nfs v3:
null getattr setattr lookup access readlink
0 0% 10007 0% 4 0% 32 0% 2139 0% 0 0%
read write create mkdir symlink mknod
1 0% 18203574 99% 12 0% 4 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
10 0% 4 0% 4 0% 0 0% 0 0% 26 0%
fsstat fsinfo pathconf commit
6652 0% 2 0% 1 0% 3541 0%

Code:

output from pveperf:
CPU BOGOMIPS: 20092.55
REGEX/SECOND: 1010863
HD SIZE: 94.49 GB (/dev/mapper/pve-root)
BUFFERED READS: 67.59 MB/sec
AVERAGE SEEK TIME: 18.76 ms
FSYNCS/SECOND: 2410.88
DNS EXT: 60.19 ms
DNS INT: 17.01 ms (netz-objekte.de)

dietmar · Jun 24, 2012

ewuewu said:
The backup from the slow box took over 13 hours.

You get a rate of 9 MiB/s. Maybe you use an 100MBit NIC?

ewuewu · Jun 24, 2012

Hello Dietmar,

it's definitly a 1GB Nic. But for sure I've teste it with dd. Pls. see below:

Code:

# time dd if=/dev/zero of=/mnt/pve/ubuntu-nfs1-v4/bigfile.txt bs=1M count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 41.8097 s, 50.2 MB/sreal    0m42.643s
user    0m0.002s
sys     0m2.419s

ewuewu · Jun 24, 2012

I've made the tests with all three backup modes (suspend, stop and snapshot). All modes are showing the same slow behavior from this box.
There are no err-messages in syslog and there were always no concurrent access to the nfs-server.
Backups from other systems (Proxmox and not Proxmox) were running as expected with reasonable speed.
I’ve also started a test with another nfs-server with same results.

The hardware of this box is:
Proc: AMD Phenom(tm) 9850 Quad-Core Processor
Storage: 3Ware 9650SE SATA-II RAID
Nic: 2x nVidia MCP55 Ethernet onboard nic
Board: ASUS M2N-SLI DELUXE ACPI

This box was running with ubuntu 10.04 LTS before without any issues.

dietmar · Jun 24, 2012

What speed (dd) to you get if you test with a block size of 32K or smaller?

ewuewu · Jun 24, 2012

Hello Dietmar,

dietmar said:
What speed (dd) to you get if you test with a block size of 32K or smaller?

I've did it with 32k and with 8k. Here are my results:

Code:

# time dd if=/dev/zero of=/mnt/pve/ubuntu-nfs1-v4/bigfile.txt bs=32k count=62500
62500+0 records in
62500+0 records out
2048000000 bytes (2.0 GB) copied, 40.9397 s, 50.0 MB/s
real    0m44.140s
user    0m0.003s
sys     0m2.506s

# time dd if=/dev/zero of=/mnt/pve/ubuntu-nfs1-v4/bigfile.txt bs=8k count=250000
250000+0 records in
250000+0 records out
2048000000 bytes (2.0 GB) copied, 46.4243 s, 44.1 MB/s
real    0m47.848s
user    0m0.041s
sys     0m2.597s

I think it looks not so bad

rengiared · Jun 24, 2012

snowman66 said:
Any hint on Freenas side? Maybe is problem with Freenas.

no, unfortunately not, that would be too nice and easy

ewuewu said:
I have the same issue.

"good" to hear that i'm not alone with this problem

this are my dd test results

Code:

root@pve01:~# time dd if=/dev/zero of=/mnt/pve/pve01/bigfile01.txt bs=8k count=256000
256000+0 records in
256000+0 records out
2097152000 bytes (2.1 GB) copied, 22.5715 s, 92.9 MB/s

real    0m23.379s
user    0m0.074s
sys     0m5.551s

root@pve01:~# time dd if=/dev/zero of=/mnt/pve/pve01/bigfile01.txt bs=32k count=64000
64000+0 records in
64000+0 records out
2097152000 bytes (2.1 GB) copied, 20.4195 s, 103 MB/s

real    0m21.250s
user    0m0.014s
sys     0m3.411s

root@pve02:~#  time dd if=/dev/zero of=/mnt/pve/pve02/bigfile02.txt bs=8k count=256000
256000+0 records in
256000+0 records out
2097152000 bytes (2.1 GB) copied, 21.4448 s, 97.8 MB/s

real    0m22.052s
user    0m0.050s
sys     0m4.207s

root@pve02:~#  time dd if=/dev/zero of=/mnt/pve/pve02/bigfile02.txt bs=32k count=64000
64000+0 records in
64000+0 records out
2097152000 bytes (2.1 GB) copied, 20.9813 s, 100 MB/s

real    0m28.522s
user    0m0.014s
sys     0m3.757s

i have testet 1M, 512k, 256k, 128k, 64k, 32k, 16k, 8k, 4k on both machines and every single result was between 90 and 105 MB/s

edit: after some meditation

i had an idea and made the dd-test (32k) with 20GB and not 2GB, and voila, at ~16GB the errors appeared again und the transfer stopped nearly
from 3 trys with 20GB, 2 stopped at ~17GB (on pve02)

so i tried the same on pve03 and there was no problem on 2 trys with 20GB so did the next step and tried 200GB and here i got a few errors in the syslog, but the transfer looked like this

Code:

pve03:~#  time dd if=/dev/zero of=/mnt/pve/pve03/bigfile03.txt bs=32k count=6400000
6400000+0 records in
6400000+0 records out
209715200000 bytes (210 GB) copied, 2214.95 s, 94.7 MB/s

real    36m57.729s
user    0m0.902s
sys     2m23.208s

if the NAS is the evil, i dont get it why it works on pve03 superb and with pve01 and pve02 not

tomorrow when i'm in the company again i'm searching for an old buffalo nas and i'm going to set it up and try backups and dd tests with it

Erk · Jun 24, 2012

Backup stalls for me to a local second hard disk if the resulting tar.dat image gets over 20GB, lots of free space for the backup available. I had this problem on proxmox 1.9 and had to set size to 2048. Even with size set to 3072 in vzdump.conf some VM backups stall after an hour or so, smaller images not a problem, they take under 20min each, so it varies from VM to VM. All the VMs have either 32G,40G, or 80G .raw drives, most are 80G, they compress down a lot, typically well less than 20GB but it varies a lot. I would like to know why the larger ones fail to complete. Nothing really in the syslog. If you kill the vmtar process, the backup seems to complete, but I don't know if the resulting .lzo file is complete.

I have also noted that some backups which fail to complete, work the next time you try, so there is a random factor. The backup script needs more status information as to why it is stuck, so far I mostly have to do a ps and take guesses.

dietmar · Jun 25, 2012

ewuewu said:
I've did it with 32k and with 8k. Here are my results:

Please can you try with larger data size (>20GB). And use 'conv=fdatasync' option for dd.

ewuewu · Jun 25, 2012

Hello Dietmar

dietmar said:
Please can you try with larger data size (>20GB). And use 'conv=fdatasync' option for dd.

here are my results from my 'bad' box:

Code:

time dd if=/dev/zero of=/mnt/pve/ubuntu-nfs1-v4/bigfile.txt bs=16k count=2500000 conv=fdatasync
2500000+0 records in
2500000+0 records out
40960000000 bytes (41 GB) copied, 875.575 s, 46.8 MB/s
real    14m35.977s
user    0m0.405s
sys     0m54.144s

time dd if=/dev/zero of=/mnt/pve/ubuntu-nfs1-v4/bigfile.txt bs=32k count=1250000 conv=fdatasync
1250000+0 records in
1250000+0 records out
40960000000 bytes (41 GB) copied, 851.567 s, 48.1 MB/s
real    14m20.246s
user    0m0.279s
sys     0m45.677s

time dd if=/dev/zero of=/mnt/pve/ubuntu-nfs1-v4/bigfile.txt bs=1M count=41000 conv=fdatasync
41000+0 records in
41000+0 records out
42991616000 bytes (43 GB) copied, 894.705 s, 48.1 MB/s
real    15m2.631s
user    0m0.065s
sys     0m41.178s

I assume that there is not really a nfs issue but I've no idea where to search in detail.

During the last two tests (bs=32k and bs=1M) I saw some messages in syslog (roundabout 10 times):

Code:

Jun 25 08:54:49 proxmox3 pvestatd[1577]: WARNING: command 'df -P -B 1 /mnt/pve/ubuntu-nfs1-v4' failed: got timeout

dietmar · Jun 25, 2012

ewuewu said:
I assume that there is not really a nfs issue but I've no idea where to search in detail.
[/CODE]

Is it fast when you backup to local storage?

RobFantini · Jun 25, 2012

from cli on the proxmox system check for nfs issues from output of ' dmesg ' .

and freenas : have you followed this: http://doc.freenas.org/index.php/NFS ? especially that part regarding ' Number of Servers. ' .

also in freenas check /var/log/messages .

I use freenas at home for backup and some kvm's . Originally I had backup and nfs problems until tuning freenas.

here is output of dmesg from then :

Code:

#   i was restoring a large kvm to freenas nfs pve storage

INFO: task rs:main Q:Reg:161330 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rs:main Q:Reg D ffff8801a530e0c0     0 161330 160682  102 0x00000000
 ffff88011ed37978 0000000000000086 0000000000000000 0000000000000086
 ffff88011ed37948 ffffffff81059aca ffff88011ed379c8 ffffffff00000002
 0000000200000000 ffff8801a530e660 ffff88011ed37fd8 ffff88011ed37fd8
Call Trace:
 [<ffffffff81059aca>] ? try_to_wake_up+0xaa/0x480
 [<ffffffffa04974d0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81513e33>] io_schedule+0x73/0xc0
 [<ffffffffa04974de>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff815147ff>] __wait_on_bit+0x5f/0x90
 [<ffffffffa04974d0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff815148a8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff81094650>] ? wake_bit_function+0x0/0x40
 [<ffffffffa04974bf>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa049dde7>] nfs_updatepage+0x2c7/0x5b0 [nfs]
 [<ffffffffa048b77a>] nfs_write_end+0x5a/0x290 [nfs]
 [<ffffffff8114ca04>] ? ii_iovec_copy_from_user_atomic+0x84/0x110
 [<ffffffff811216a1>] generic_file_buffered_write_iter+0x161/0x270
 [<ffffffff811238c7>] ? mempool_free_slab+0x17/0x20
 [<ffffffff811233bd>] __generic_file_write_iter+0x1fd/0x400
 [<ffffffff811f0725>] ? inode_incr_space+0x25/0x30
 [<ffffffffa04eee56>] ? __vzquota_alloc_space+0x1b6/0x330 [vzdquota]
 [<ffffffff81123645>] __generic_file_aio_write+0x85/0xa0
 [<ffffffff811236cf>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa048b2bc>] nfs_file_write+0x10c/0x210 [nfs]
 [<ffffffff8118f4ea>] do_sync_write+0xfa/0x140
 [<ffffffffa048e936>] ? nfs_revalidate_inode+0x26/0x60 [nfs]
 [<ffffffff81094610>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81094610>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81194fa4>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8118f7c8>] vfs_write+0xb8/0x1a0
 [<ffffffff811901d1>] sys_write+0x51/0x90
 [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
NFS: directory defer/C contains a readdir loop.  Please contact your server vendor.  Offending cookie: 464910718

note that last line:
NFS: directory defer/C contains a readdir loop. Please contact your server vendor. Offending cookie: 464910718

looks like maybe related to this:
http://wiki.linux-nfs.org/wiki/inde...seems_to_be_triggered_by_well-behaving_server

rengiared · Jun 25, 2012

@dietmar: when restoring i have super fast speeds with about 80mb/s and more

@RobFantini: thanks for the info with "number of servers". to be sure i raised this from 4 to 16 but there was/is no positive change
and as i said, in the log are no errors regarding this issue (only ARP changes of two completely other servers cause of teaming)

in the meantime i set up 2 other NAS, a Buffalo Terastation III with 4x 1TB and an older Synology DS110j with atm 1x250GB
with the Buffalo i have the same errors after short time, but with the synolgy it runs through without problems (only "slow" speed of 20mb/s)

every nas is on the latest firmware release

snowman66 · Jun 25, 2012

Have you read this: http://ubuntuforums.org/archive/index.php/t-1478413.html ?

bread-baker · Jun 25, 2012

one last suggestion, is freenas server set to Asynchronous ?

Super slow Backup with pve 2.x

Renowned Member

Proxmox Staff Member

Renowned Member

Renowned Member

Attachments

Active Member

Renowned Member

Proxmox Staff Member

Renowned Member

Renowned Member

Proxmox Staff Member

Renowned Member

Renowned Member

Renowned Member

Proxmox Staff Member

Renowned Member

Proxmox Staff Member

Famous Member

Renowned Member

Active Member

Member

We value your privacy