[SOLVED] Now backup stuck...seems 4.1 doesn't like me at all...

fips

Renowned Member
May 5, 2014
So far I had 2 nodes (running 3.2) and an NFS storage, and everything was more or less fine.
Backups were not that fast, but they worked.

Now I wanted to switch to 4.1, install a third node and move to a new Open-E storage (RAID6 with 6 disks) attached via 4Gb FC.

I installed 4.1 on the new server (which gave me some trouble because of booting from a zpool) and connected the storage.
I did the same with one of the old nodes and joined it to the new 4.1 cluster.

My idea was to create a backup of each VM (OpenVZ and KVM), copy it via scp to the new host and restore it there.
The VM hard disks should end up on the FC storage.
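
Roughly the workflow I mean, as a sketch (VM ID, paths and the target storage name are just examples, not my actual values):
Code:
# on the old 3.2 node: create the backup (VM ID and dump directory are examples)
vzdump 101 --compress lzo --dumpdir /var/lib/vz/dump
# copy it to the new node
scp /var/lib/vz/dump/vzdump-qemu-101-*.vma.lzo newnode:/var/lib/vz/dump/
# on the new 4.1 node: restore onto the FC storage (storage ID is a placeholder)
qmrestore /var/lib/vz/dump/vzdump-qemu-101-*.vma.lzo 101 --storage open-e-fc
# OpenVZ archives would instead be restored as LXC containers with `pct restore`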

Then my KVM VMs started to crash with the following entries in dmesg:

[71648.166402] end_request: I/O error, dev vda, sector 0
[71806.591613] end_request: I/O error, dev vda, sector 30350400
[71806.592230] Aborting journal on device dm-0-8.
[71806.595123] journal commit I/O error
[71806.598837] EXT4-fs error (device dm-0): ext4_journal_start_sb:327: Detected aborted journal
[71806.599346] EXT4-fs (dm-0): Remounting filesystem read-only

I switched all KVM VMs to write-through caching, and so far I don't get those errors anymore.
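
For reference, roughly how the cache mode can be changed on the command line (VM ID, disk slot and volume name are placeholders; check `qm config <vmid>` for the real disk line first):
Code:
# sketch only: re-use the existing disk definition and add cache=writethrough
qm set 101 --virtio0 open-e-fc:vm-101-disk-1,cache=writethrough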

Afterwards I created an NFS share for my backups on the new Open-E storage.
The nodes and the NFS share are connected via a separate LAN.

Last night I created a backup task, but it couldn't finish the backups...
Here is the output from the first node:

INFO: starting new backup job: vzdump 101 107 202 203 205 210 212 214 301 300 114 100 102 103 104 105 106 115 117 --mode snapshot --quiet 1 --storage SAS_NFS --compress lzo
INFO: skip external VMs: 210, 114, 102, 103, 106
INFO: Starting Backup of VM 100 (lxc)
INFO: status = running
INFO: mode failure - some volumes does not support snapshots
INFO: trying 'suspend' mode instead
INFO: backup mode: suspend
INFO: ionice priority: 7
temporary directory is on NFS, disabling xattr and acl support, consider configuring a local tmpdir via /etc/vzdump.conf
INFO: starting first sync /proc/2785/root// to /mnt/pve/SAS_NFS/dump/vzdump-lxc-100-2016_01_09-01_30_05.tmp
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/0/3a/971628634928ededbcf61ff4195ba3a0"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/0/5b/6566f71fbefae69b857c5ca62b43f5b0"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/0/87/29595cebce57fdc06178bc8939861870"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/0/fd/100d76b23440af7798f4b02d1118afd0"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/1/ec/ad1844d760c5dd3d26a58032bd381ec1"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/2/3c/81f0a3de0031e49da228fdb4b2ec03c2"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/5/3c/af7bb1d59bee2a5a68060362d1d733c5"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/b/c0/4e127f7afc4ecb76d033466cc0e94c0b"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/d/89/09f583b4fb3cf92024209bde4a15b89d"
INFO: Number of files: 37,862 (reg: 25,786, dir: 7,177, link: 4,867, dev: 2, special: 30)
INFO: Number of created files: 37,861 (reg: 25,786, dir: 7,176, link: 4,867, dev: 2, special: 30)
INFO: Number of deleted files: 0
INFO: Number of regular files transferred: 25,771
INFO: Total file size: 769,120,252 bytes
INFO: Total transferred file size: 768,465,221 bytes
INFO: Literal data: 763,698,693 bytes
INFO: Matched data: 0 bytes
INFO: File list size: 786,338
INFO: File list generation time: 0.038 seconds
INFO: File list transfer time: 0.000 seconds
INFO: Total bytes sent: 765,868,778
INFO: Total bytes received: 537,370
INFO: sent 765,868,778 bytes received 537,370 bytes 35,512.18 bytes/sec
INFO: total size is 769,120,252 speedup is 1.00
INFO: rsync warning: some files vanished before they could be transferred (code 24) at main.c(1183) [sender=3.1.1]
INFO: first sync finished (21581 seconds)
INFO: suspend vm
INFO: starting final sync /proc/2785/root// to /mnt/pve/SAS_NFS/dump/vzdump-lxc-100-2016_01_09-01_30_05.tmp
INFO: Number of files: 37,861 (reg: 25,785, dir: 7,177, link: 4,867, dev: 2, special: 30)
INFO: Number of created files: 13 (reg: 13)
INFO: Number of deleted files: 5 (reg: 5)
INFO: Number of regular files transferred: 21
INFO: Total file size: 759,803,217 bytes
INFO: Total transferred file size: 2,921,288 bytes
INFO: Literal data: 1,995,316 bytes
INFO: Matched data: 925,972 bytes
INFO: File list size: 65,526
INFO: File list generation time: 0.001 seconds
INFO: File list transfer time: 0.000 seconds
INFO: Total bytes sent: 2,870,021
INFO: Total bytes received: 16,268
INFO: sent 2,870,021 bytes received 16,268 bytes 617.19 bytes/sec
INFO: total size is 759,803,217 speedup is 263.25
INFO: final sync finished (4676 seconds)
INFO: resume vm
INFO: vm is online again after 4676 seconds
INFO: creating archive '/mnt/pve/SAS_NFS/dump/vzdump-lxc-100-2016_01_09-01_30_05.tar.lzo'

Here is the output from the second node:

INFO: starting new backup job: vzdump 102 --remove 0 --compress lzo --node vmbase3 --mode snapshot --storage SAS_NFS
INFO: Starting Backup of VM 102 (lxc)
INFO: status = running
INFO: mode failure - some volumes does not support snapshots
INFO: trying 'suspend' mode instead
INFO: backup mode: suspend
INFO: ionice priority: 7
temporary directory is on NFS, disabling xattr and acl support, consider configuring a local tmpdir via /etc/vzdump.conf
INFO: starting first sync /proc/6323/root// to /mnt/pve/SAS_NFS/dump/vzdump-lxc-102-2016_01_09-11_32_21.tmp



So far I never had any real problems with my Proxmox infrastructure, but this time everything gets stuck and causes problems...

I am so frustrated and disappointed...

best wishes

Stefan
 
Just looking at your backup logs, I think it is functioning, but perhaps more slowly than you expected?

Snapshots are not yet supported for LXC containers unless you put them on ZFS, but the rsync (suspend-mode) backup should also work. It would be better to have a local tmpdir, as the log says, to preserve xattr and ACL settings, even though in many cases you don't actually need them.
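
A minimal sketch of what that could look like in /etc/vzdump.conf (the directory is just an example, any local path with enough free space works):
Code:
# /etc/vzdump.conf
# local temporary directory for suspend-mode container backups,
# so xattr/ACL support is not disabled (path is an example)
tmpdir: /var/tmp/vzdump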
 
Do you get any info on the command line if you run:
Code:
/usr/bin/rpcinfo -p ipofnfsserver

Also, Proxmox tries to do a df on the mountpoint of the NFS storage. Does that work?
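
For example (mountpoint taken from the storage name in your log):
Code:
df -h /mnt/pve/SAS_NFS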
 
output of that rpcinfo:
Code:
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100227    2   tcp   2049
    100227    3   tcp   2049
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100227    2   udp   2049
    100227    3   udp   2049
    100021    1   udp  39001  nlockmgr
    100021    3   udp  39001  nlockmgr
    100021    4   udp  39001  nlockmgr
    100021    1   tcp  55369  nlockmgr
    100021    3   tcp  55369  nlockmgr
    100021    4   tcp  55369  nlockmgr
    100005    1   udp  38146  mountd
    100005    1   tcp  35122  mountd
    100005    2   udp  38146  mountd
    100005    2   tcp  35122  mountd
    100005    3   udp  38146  mountd
    100005    3   tcp  35122  mountd
    100024    1   udp  39262  status
    100024    1   tcp  44433  status



df gives me:
10.10.10.20:/SAS_NFS 1000G 6.0M 1000G 1% /mnt/pve/SAS_NFS


EDIT:
Ahhh, the NFS server offers UDP...
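
One thing I can try is forcing TCP (and a fixed NFS version) for the mount via the options line in /etc/pve/storage.cfg; a rough sketch with the values from above (the options string is just my guess, not yet tested):
Code:
# /etc/pve/storage.cfg -- sketch only, options values are an assumption
nfs: SAS_NFS
        server 10.10.10.20
        export /SAS_NFS
        path /mnt/pve/SAS_NFS
        content backup
        options vers=3,tcp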
 
