Strange behavior when backing up to Win 2012 NFS service

ewuewu
Renowned Member · Sep 14, 2010 · Hamburg
Hello Guys,
I think my question is more Windows-related, but I couldn't find any hint pointing me in the right direction. I hope one of you has an idea that puts me back on track.

I've set up a Proxmox cluster (3 nodes) and am trying to back up my VMs to a Windows 2012 NFS service. The cluster has three bonded interfaces.

Code:
pveversion --verbose
proxmox-ve-2.6.32: 3.2-136 (running kernel: 2.6.32-32-pve)
pve-manager: 3.3-1 (running version: 3.3-1/a06c9f73)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.1-34
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-23
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-9
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Bond0 has vmbr0 assigned to it, and vmbr0 has the IP 172.18.0.3/22. It is used for cluster communication and VM traffic.

Bond1 has vmbr1 assigned to it and is used for backup traffic. The IP of vmbr1 is 192.168.151.3/24.

The same setup is configured on the other nodes (of course with different IP addresses).

The bonding is also configured on the switches.
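
For reference, a minimal sketch of what such a bond-plus-bridge setup typically looks like in /etc/network/interfaces. The slave NIC names and the bond mode are assumptions (the bond mode must match whatever the switches are configured for), not taken from the original post:

```
auto bond1
iface bond1 inet manual
        slaves eth2 eth3
        bond_mode 802.3ad       # assumption: LACP; must match the switch config
        bond_miimon 100

auto vmbr1
iface vmbr1 inet static
        address 192.168.151.3
        netmask 255.255.255.0
        bridge_ports bond1
        bridge_stp off
        bridge_fd 0
```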

On the other side there is a Windows 2012 server with two NICs and the NFS service installed. The first NIC is dedicated to normal network traffic and has the IP 172.18.0.51/22.

The second NIC is for the NFS cluster backup and has the IP 192.168.151.1/24.

The win server exports the directory ‘backp-vms’

On the cluster, a shared NFS storage named 'backup-vms' is defined, connected explicitly to the IP (192.168.151.1) of the Windows server.
I've double-checked that the mounts succeed on the cluster nodes.
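A quick way to double-check this on each node is to list the NFS mounts together with the server each one is mounted from. This is a sketch, not a PVE tool; it just parses /proc/mounts:

```shell
#!/bin/sh
# List NFS mounts and the server each one is mounted from.
# MOUNTS can be overridden with a file argument (e.g. for testing);
# by default it reads the live mount table.
MOUNTS=${1:-/proc/mounts}
awk '$3 ~ /^nfs/ { split($1, a, ":"); print $2 " <- server " a[1] }' "$MOUNTS"
```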
Whenever I start a backup, it runs very slowly. Sometimes the backup finishes after a long time; sometimes it stops with an error message like 'lzop - broken pipe'.
During the backup I see a lot of messages like the following in syslog:
Code:
vzdump[6952]: ERROR: Backup of VM 100 failed - vma_queue_write: write error - Broken pipe
task lzop:7665 blocked for more than 120 seconds.
kernel: ct0 nfs: server 192.168.151.1 not responding, timed out
WARNING: command 'df -P -B 1 /mnt/pve/backup-vms' failed: got timeout
WARNING: command 'df -P -B 1 /mnt/pve/backup-vms' failed: got timeout
Most of the messages disappear after a while and the backup continues; sometimes it finishes, sometimes it stops with 'Broken pipe'.

I've checked the mounts on the Windows server with 'showmount -a', but I am not really happy with its output.

Code:
showmount -a

All mount points on DMC-BACKUP1-NI:
172.18.0.2                        : /backup-vms
172.18.0.3                        : /backup-vms
172.18.0.4                        : /backup-vms
192.168.151.2                      : /backup-vms
192.168.151.3                      : /backup-vms
192.168.151.4                      : /backup-vms

It seems to me that the backup dir is mounted from every IP of the Proxmox cluster nodes. How can this be?

I assume this could be the root of the problem. Maybe I misread the output, but I think the '172.18.0.x' IPs should not be mounting from the cluster nodes; I was expecting only the 192.168.151.x IPs to be connected.
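showmount -a reports clients by the source IP they mounted from, so entries from 172.18.0.x would mean those mounts left the nodes over the main network rather than the backup one. One thing worth checking on each node is which source address the kernel actually picks for the route to the NFS server. A sketch (the destination IP is the backup server from this thread):

```shell
#!/bin/sh
# Ask the kernel which interface/source address would be used to reach
# the NFS server. A "src 172.18.0.x" in the output would explain the
# unexpected entries in showmount -a.
ip route get 192.168.151.1 || echo "no route to the backup network from this host"
```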

I’ve no clue what’s going wrong.
 
Hello ewuewu

can you post

Code:
cat /etc/pve/storage.cfg

and
Code:
pvesm status

and
Code:
df

from the host(s) ?

How about accessing the NFS backup storage directly from the Proxmox command line, e.g. copying a file?

Is it slow too?

And the opposite: when you back up to local storage, does that work fine?

These small experiments should show whether it's a backup problem or a general NFS server one.


Kind regards

Mr.Holmes

P.S.: I think 'backp-vms' in your post was just a typo ....
 
Hello laowolf,

I didn't set up any quotas because the disks on the NFS server are big enough to keep all of our backups. We are using at most 75 percent of the backup storage's capacity, and no other processes on the backup server use the dedicated backup disks.

Are there any other reasons to set up a quota?
 
Recently I tested a Dell NX3200 NAS device, which ships with Windows Storage Server 2012 by default. If you set up the NFS share with the Windows 2012 Server NFS service, you have to configure the quota of the NFS share with a soft limit; the default quota has a hard limit, and that can cause strange behavior when using the NFS share....


Hello Mr.Holmes,

thanks for your reply.

What I did in the meantime was borrow a hardware box from the dev department, install Ubuntu on it, and set up an nfs-kernel-server.

Unfortunately the backup job with the Ubuntu server behaves similarly to the backup job with the MS NFS server (slightly different timings, but the same problems in the end). Everything I wrote above also happened with the Ubuntu box.

So I think it wouldn't be fair to blame Windows for this. Sadly, I had to give the Linux box back after the test.
Here is my storage.cfg

Code:
cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0
 
lvm: localvms
        vgname localvmdata
        content images
        nodes lx-vmhost-ni1
 
nfs: backup-vms
        path /mnt/pve/backup-vms
        server 192.168.151.1
        export /backup-vms
        options vers=3
        content backup
        maxfiles 14
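
Given the 'server not responding, timed out' messages, it can be worth pinning the NFS transport and timeout behaviour explicitly in the options line. A sketch only: vers=3 is from the original config, while tcp/hard/timeo/retrans are standard Linux NFS mount options and have not been verified against the Windows NFS service here:

```
nfs: backup-vms
        path /mnt/pve/backup-vms
        server 192.168.151.1
        export /backup-vms
        options vers=3,tcp,hard,timeo=600,retrans=3
        content backup
        maxfiles 14
```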

Code:
pvesm status
backup-vms   nfs 1   14837216256   10555430432   4281785824   71.64%
local        dir 1      30963708        176236     30787472    1.07%
localvms     lvm 1    3905929216             0   3271528448    0.50%

Code:
df -h
Filesystem                                      Size  Used Avail Use% Mounted on
udev                                             10M     0   10M   0% /dev
tmpfs                                           6.3G  508K  6.3G   1% /run
/dev/mapper/pve-root                             28G  3.4G   23G  13% /
tmpfs                                           5.0M  4.0K  5.0M   1% /run/lock
tmpfs                                            13G   56M   13G   1% /run/shm
/dev/mapper/pve-data                             30G  173M   30G   1% /var/lib/vz
/dev/fuse                                        30M   28K   30M   1% /etc/pve
192.168.151.1:/backup-vms-NI   14T  9.9T  4.0T  72% /mnt/pve/backup-vms

Code:
cat /etc/vzdump.conf
# vzdump default settings
 
#tmpdir: DIR
#dumpdir: DIR
#storage: STORAGE_ID
#mode: snapshot|suspend|stop
#bwlimit: KBPS
bwlimit: 40000
#ionice: PRI
#lockwait: MINUTES
#stopwait: MINUTES
#size: MB
size: 4096
#maxfiles: N
#script: FILENAME
#exclude-path: PATHLIST

I also made a test reading and writing a 2 GB file from local disk directly to the backup storage and vice versa.
Reading works with reasonable performance; writing seems incredibly slow to me.
I assume the bandwidth throttling from vzdump.conf does not affect direct reads and writes.

Code:
Reading:
time cp /mnt/pve/transfer-dir/test.txt .
 
real    0m19.069s
user    0m0.037s
sys     0m4.618s
 
Writing:
time cp test.txt /mnt/pve/transfer-dir
 
real    3m46.271s
user    0m0.013s
sys     0m2.789s
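 
For raw write-throughput numbers that exclude caching effects, a dd run with fdatasync is a bit more telling than cp. A sketch: TARGET defaults to /tmp here, so point it at /mnt/pve/backup-vms on a node; the 64 MB size is an arbitrary choice, increase it for a steadier rate:

```shell
#!/bin/sh
# Sequential write test. conv=fdatasync makes dd flush the data to the
# server before reporting, so the printed rate reflects real NFS write
# speed rather than the client's page cache.
TARGET=${1:-/tmp}
dd if=/dev/zero of="$TARGET/ddtest.bin" bs=1M count=64 conv=fdatasync 2>&1
rm -f "$TARGET/ddtest.bin"
```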

I also tried changing my network settings: I bound the IP address for the backup connection directly to bond1 instead of to vmbr1, and removed vmbr1 (which was bound to bond1 before).

Unfortunately this had no effect.
 
Thank you all for your help.

We solved this issue.

We had a faulty switch in our infrastructure. After replacing it, everything works as expected.
 
