Backup issues with Proxmox 2.3?

The error points to a timeout. Now that I have disks running on NFS, could the storage be slow enough to respond that it triggers the timeout? Is there perhaps a timeout value that could be adjusted?

I have also seen this timeout several times, but so far I have not been able to reproduce the bug.
 
I've also run into this problem. I have 4 VMs and back up 2 VMs per day.
Sometimes one of the VMs is not backed up, with the error message 'got timeout'.

The backup space is mounted via NFS over a 1 Gbit link.
NFS mount options:
rw,nosuid,noexec,relatime,vers=3,rsize=8192,wsize=8192,namlen=255,soft,noacl,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=<ip>,mountvers=3,mountport=36442,mountproto=tcp,local_lock=none,addr=<ip>
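
For reading these options: according to nfs(5), `timeo` is given in tenths of a second, so `timeo=600` means the client waits 60 seconds before retrying a request. A minimal sketch, in Python purely for illustration, that pulls the retry-related values out of such an option string (the function name `nfs_retry_settings` is made up for this example, and the sample string is a shortened copy of the options above):

```python
def nfs_retry_settings(options: str) -> dict:
    """Extract the retry-related settings from an NFS mount option string."""
    flags = set()
    values = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        if sep:
            values[key] = value   # key=value option, e.g. timeo=600
        else:
            flags.add(key)        # bare flag, e.g. soft
    return {
        # 'hard' is the default when neither soft nor hard is given
        "mode": "soft" if "soft" in flags else "hard",
        "proto": values.get("proto"),
        # nfs(5): timeo is expressed in deciseconds
        "timeo_seconds": int(values["timeo"]) / 10 if "timeo" in values else None,
        "retrans": int(values["retrans"]) if "retrans" in values else None,
    }


# Shortened copy of the mount options posted above
opts = "rw,nosuid,noexec,relatime,vers=3,soft,proto=tcp,timeo=600,retrans=2"
print(nfs_retry_settings(opts))
```

If that reading is right, a single NFS retry interval here is a full minute, which is worth comparing with how quickly the backup job actually fails.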

pveversion:
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-3 (running version: 3.1-3/dc0e9b0e)
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-7
qemu-server: 3.1-1
pve-firmware: 1.0-23
libpve-common-perl: 3.0-6
libpve-access-control: 3.0-6
libpve-storage-perl: 3.0-10
pve-libspice-server1: 0.12.4-1
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.0-2


A more detailed error message about where the timeout occurs (when writing to NFS, or when locking and snapshotting the VM) would be helpful.
The VMs are all KVM and the backup type is snapshot with LZO compression.

Is there any way to find out where the timeout comes from?

greets,
argonius
 
Try using proto=udp instead of tcp, and hard,intr instead of soft, and see if that changes the behavior.
 
Thanks, I made the changes and will monitor it over the next few days. Then I will get back with new information ;)

hi,

This did not really help. Now there is no error in the tasks, but the backup is simply not created. I switched back to soft and tcp.

I do not think there is a network problem, because the connection is a direct 1 Gbit connection. It also works on other Proxmox nodes without problems.

Is there any way to get a more detailed message than "got timeout"?

thanks,
patrick
 

The problem still exists. Also, if I start the backup manually via the web UI, it works. I really need to know how to dig deeper into the uninformative error message "got timeout".


thanks,
patrick
 
I am still running into this problem every night.

I also found an entry in the log file saying:
"WARNING: unable to connect to VM 100 socket - timeout after 31 retries."

So it seems NFS isn't the problem.

1 second later I retried the job and it worked.

How can I debug the "unable to connect to VM" message in more detail?
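
One way to narrow this down might be to talk to the VM's monitor socket directly and time the round trip. The sketch below speaks the QEMU Machine Protocol (QMP): it waits for the greeting, sends `qmp_capabilities`, then `query-status`. It is only a sketch under assumptions: the helper name `probe_qmp` is made up, and the socket path for a given VM (something like `/var/run/qemu-server/100.qmp`) is an assumption that should be checked on the host.

```python
import json
import socket
import time


def probe_qmp(path: str, timeout: float = 5.0) -> dict:
    """Time connect + handshake + query-status on a QMP unix socket."""
    started = time.monotonic()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)  # raise a timeout error instead of hanging
        sock.connect(path)
        with sock.makefile("rwb") as stream:

            def read_reply() -> dict:
                # QMP may interleave asynchronous events; skip them and
                # wait for the actual command result.
                while True:
                    line = stream.readline()
                    if not line:
                        raise ConnectionError("QMP socket closed unexpectedly")
                    msg = json.loads(line)
                    if "return" in msg or "error" in msg:
                        return msg

            def execute(command: str) -> dict:
                stream.write(json.dumps({"execute": command}).encode() + b"\r\n")
                stream.flush()
                return read_reply()

            greeting = json.loads(stream.readline())  # banner: {"QMP": {...}}
            if "QMP" not in greeting:
                raise ConnectionError("no QMP greeting received")
            execute("qmp_capabilities")  # leave capability negotiation mode
            reply = execute("query-status")
    return {"seconds": time.monotonic() - started, "reply": reply}
```

Calling `probe_qmp("/var/run/qemu-server/100.qmp")` (path assumed, see above) shortly before the scheduled backup would show whether the socket answers promptly at that time of night. As far as I know the monitor socket serves one client at a time, so this probe should not be run while a backup or another management task is using it.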

Here is the complete message:
Oct 25 10:01:03 prox1 pvedaemon[534013]: <root@pam> starting task UPID:prox1:000BE902:0FBB5A86:526A253F:vzdump::root@pam:
Oct 25 10:01:03 prox1 pvedaemon[780546]: INFO: starting new backup job: vzdump 100 --remove 0 --mode snapshot --compress lzo --storage BACKUPS --node prox1
Oct 25 10:01:03 prox1 pvedaemon[780546]: INFO: Starting Backup of VM 100 (qemu)
Oct 25 10:01:04 prox1 qm[780551]: <root@pam> update VM 100: -lock backup
Oct 25 10:01:13 prox1 pvedaemon[534013]: WARNING: unable to connect to VM 100 socket - timeout after 31 retries
Oct 25 10:01:14 prox1 pvedaemon[780546]: ERROR: Backup of VM 100 failed - got timeout
Oct 25 10:01:14 prox1 pvedaemon[780546]: INFO: Backup job finished with errors
Oct 25 10:01:14 prox1 pvedaemon[780546]: job errors
Oct 25 10:01:14 prox1 pvedaemon[534013]: <root@pam> end task UPID:prox1:000BE902:0FBB5A86:526A253F:vzdump::root@pam: job errors
Oct 25 10:01:43 prox1 pvedaemon[534013]: <root@pam> starting task UPID:prox1:000BE968:0FBB6A38:526A2567:vzdump::root@pam:
Oct 25 10:01:43 prox1 pvedaemon[780648]: INFO: starting new backup job: vzdump 100 --remove 0 --mode snapshot --compress lzo --storage BACKUPS --node prox1
Oct 25 10:01:43 prox1 pvedaemon[780648]: INFO: Starting Backup of VM 100 (qemu)
Oct 25 10:01:43 prox1 qm[780653]: <root@pam> update VM 100: -lock backup
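
To see how quickly the job gives up, the timestamps in a log like this can be compared mechanically. A small sketch (the helper names are invented for this example, and it assumes the classic `Mon DD HH:MM:SS host prog[pid]: message` syslog layout shown above and a job that does not cross midnight):

```python
import re

# Classic syslog layout: "Oct 25 10:01:03 host prog[pid]: message"
SYSLOG = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d):(\d\d):(\d\d)\s+\S+\s+(\S+?)\[(\d+)\]:\s+(.*)$"
)


def parse_line(line: str):
    """Return seconds-of-day, program, pid and message, or None if no match."""
    m = SYSLOG.match(line)
    if m is None:
        return None
    hh, mm, ss, prog, pid, msg = m.groups()
    seconds = int(hh) * 3600 + int(mm) * 60 + int(ss)
    return {"t": seconds, "prog": prog, "pid": int(pid), "msg": msg}


def seconds_until_error(lines):
    """Seconds from the first 'starting new backup job' to the first ERROR."""
    start = None
    for rec in filter(None, map(parse_line, lines)):
        if start is None and "starting new backup job" in rec["msg"]:
            start = rec["t"]
        elif start is not None and rec["msg"].startswith("ERROR:"):
            return rec["t"] - start
    return None


# Three lines copied from the log above
log = [
    "Oct 25 10:01:03 prox1 pvedaemon[780546]: INFO: starting new backup job: vzdump 100 --remove 0 --mode snapshot --compress lzo --storage BACKUPS --node prox1",
    "Oct 25 10:01:13 prox1 pvedaemon[534013]: WARNING: unable to connect to VM 100 socket - timeout after 31 retries",
    "Oct 25 10:01:14 prox1 pvedaemon[780546]: ERROR: Backup of VM 100 failed - got timeout",
]
print(seconds_until_error(log), "seconds from job start to error")
```

For the excerpt above, the job fails roughly ten seconds after it starts, right after the socket warning, which fits the idea that the timeout happens while contacting the VM rather than while writing to the backup storage.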

pveversion:
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-3 (running version: 3.1-3/dc0e9b0e)
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-7
qemu-server: 3.1-1
pve-firmware: 1.0-23
libpve-common-perl: 3.0-6
libpve-access-control: 3.0-6
libpve-storage-perl: 3.0-10
pve-libspice-server1: 0.12.4-1
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.0-2


thanks for any help.

greetz,
patrick
 
Running wireshark/tcpdump on the PVE host and/or the NFS target should give you more details about what is going on and where to look deeper.
I had similar issues on two PVE hosts and found that the NIC and the switch sometimes had a problem with autonegotiation at higher throughput. After fixing both, the issue did not recur.
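
A quick way to spot a link that negotiated the wrong speed or duplex is to read the values the kernel exposes under `/sys/class/net`. The sketch below is only an illustration: the function name is made up, and `eth0` is an assumed interface name. It shows the negotiated result only, not the autonegotiation setting itself (`ethtool` reports that).

```python
from pathlib import Path


def link_state(iface: str, sysfs: str = "/sys/class/net") -> dict:
    """Read negotiated speed, duplex and operstate for one interface."""
    base = Path(sysfs) / iface

    def read(name: str):
        try:
            return (base / name).read_text().strip()
        except OSError:
            # Attribute missing, interface absent, or link down
            return None

    return {
        "speed_mbit": read("speed"),    # e.g. "1000" on a gigabit link
        "duplex": read("duplex"),       # "full" or "half"
        "operstate": read("operstate"),
    }


# 'eth0' is an assumption; substitute the interface that carries NFS traffic
print(link_state("eth0"))
```

Logging this before each nightly run would show whether a link that should be 1000/full occasionally comes up as something else.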
 
Thanks, I will look into this more deeply, but I do not think this is the problem, because the timeout occurs when PVE is trying to lock the VM, not when it writes backup data (it never gets that far).

greetz,
patrick
 
Hi,
I have the same problem every day.

31.12.2013
VMID   NAME  STATUS  TIME     SIZE      FILENAME
100          OK      0:03:40  10.11GB   /pool1/backup/dump/vzdump-qemu-100-2013_12_31-00_55_02.vma.lzo
102          FAILED  0:00:18  got timeout
105          FAILED  0:00:08  got timeout
111          FAILED  0:00:05  got timeout
112          OK      0:38:43  141.77GB  /pool1/backup/dump/vzdump-qemu-112-2013_12_31-00_59_13.vma.lzo
115          FAILED  0:00:07  got timeout
116          FAILED  0:00:34  got timeout
117          FAILED  0:00:07  got timeout
119          FAILED  0:00:27  got timeout
124          FAILED  0:00:34  got timeout
125          FAILED  0:00:23  got timeout
129          FAILED  0:00:05  got timeout
131          FAILED  0:00:53  got timeout
145          OK      0:08:49  8.51GB    /pool1/backup/dump/vzdump-qemu-145-2013_12_31-01_41_07.vma.lzo
153          OK      0:48:35  41.40GB   /pool1/backup/dump/vzdump-qemu-153-2013_12_31-01_49_56.vma.lzo
154          FAILED  0:00:08  got timeout
TOTAL                1:43:37  201.79GB
02.01.2014
VMID   NAME  STATUS  TIME     SIZE      FILENAME
100          OK      0:03:33  10.11GB   /pool1/backup/dump/vzdump-qemu-100-2014_01_02-00_55_03.vma.lzo
102          FAILED  0:00:10  got timeout
105          OK      0:09:13  24.51GB   /pool1/backup/dump/vzdump-qemu-105-2014_01_02-00_58_47.vma.lzo
111          OK      0:06:30  12.74GB   /pool1/backup/dump/vzdump-qemu-111-2014_01_02-01_08_00.vma.lzo
112          FAILED  0:00:06  got timeout
115          OK      0:05:56  24.66GB   /pool1/backup/dump/vzdump-qemu-115-2014_01_02-01_14_37.vma.lzo
116          FAILED  0:00:15  got timeout
117          FAILED  0:00:18  got timeout
119          FAILED  0:00:12  got timeout
124          FAILED  0:00:16  got timeout
125          OK      0:01:26  4.36GB    /pool1/backup/dump/vzdump-qemu-125-2014_01_02-01_21_34.vma.lzo
129          OK      1:20:55  90.96GB   /pool1/backup/dump/vzdump-qemu-129-2014_01_02-01_23_00.vma.lzo
131          FAILED  0:00:23  got timeout
145          OK      0:12:54  8.51GB    /pool1/backup/dump/vzdump-qemu-145-2014_01_02-02_44_19.vma.lzo
153          OK      0:26:28  41.38GB   /pool1/backup/dump/vzdump-qemu-153-2014_01_02-02_57_13.vma.lzo
154          OK      0:18:41  27.75GB   /pool1/backup/dump/vzdump-qemu-154-2014_01_02-03_23_41.vma.lzo
TOTAL                2:47:19  244.99GB
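
A pattern in these reports is easier to see with the durations converted to seconds. The sketch below uses the VMID, status and time values of the 31.12.2013 report, typed in by hand as tuples (the helper name `to_seconds` is made up for this example):

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss duration string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# (VMID, status, duration) rows from the 31.12.2013 report
night = [
    (100, "OK", "0:03:40"), (102, "FAILED", "0:00:18"),
    (105, "FAILED", "0:00:08"), (111, "FAILED", "0:00:05"),
    (112, "OK", "0:38:43"), (115, "FAILED", "0:00:07"),
    (116, "FAILED", "0:00:34"), (117, "FAILED", "0:00:07"),
    (119, "FAILED", "0:00:27"), (124, "FAILED", "0:00:34"),
    (125, "FAILED", "0:00:23"), (129, "FAILED", "0:00:05"),
    (131, "FAILED", "0:00:53"), (145, "OK", "0:08:49"),
    (153, "OK", "0:48:35"), (154, "FAILED", "0:00:08"),
]

failed = [to_seconds(d) for _, status, d in night if status == "FAILED"]
succeeded = [to_seconds(d) for _, status, d in night if status == "OK"]
print(f"{len(failed)} failed, longest failure {max(failed)} s")
print(f"{len(succeeded)} succeeded, shortest success {min(succeeded)} s")
```

Every failure in that night ends in under a minute, while every successful backup runs for several minutes, which suggests the jobs abort before any significant amount of data is written.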

102: Jan 02 00:58:37 INFO: Starting Backup of VM 102 (qemu)
102: Jan 02 00:58:37 INFO: status = running
102: Jan 02 00:58:37 INFO: update VM 102: -lock backup
102: Jan 02 00:58:38 INFO: backup mode: snapshot
102: Jan 02 00:58:38 INFO: ionice priority: 7
102: Jan 02 00:58:38 INFO: skip unused drive 'VMs2zfs:102/vm-102-disk-4.qcow2' (not included into backup)
102: Jan 02 00:58:38 INFO: creating archive '/pool1/backup/dump/vzdump-qemu-102-2014_01_02-00_58_37.vma.lzo'
102: Jan 02 00:58:41 ERROR: got timeout
102: Jan 02 00:58:41 INFO: aborting backup job
102: Jan 02 00:58:47 ERROR: Backup of VM 102 failed - got timeout

# pveversion -v
proxmox-ve-2.6.32: not correctly installed (running kernel: 3.2.0-0.bpo.4-amd64)
pve-manager: 3.1-24 (running version: 3.1-24/060bd5a6)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-96
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-9
libpve-access-control: 3.0-8
libpve-storage-perl: 3.0-18
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

 
I had that error often in 2.3 and occasionally in 3.0. Since upgrading to 3.1, this error has gone away.

Twice I got this error and found that my backup drive was full. I never did find a cause for all of the other failures.
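
Since a full backup drive produced the same error here, a free-space check before the nightly job can at least rule that cause out. A minimal sketch using Python's standard `shutil.disk_usage` (the function name `has_room` and the size threshold are assumptions for illustration):

```python
import shutil


def has_room(dump_dir: str, needed_bytes: int) -> bool:
    """Return True if the filesystem holding dump_dir has enough free space."""
    usage = shutil.disk_usage(dump_dir)  # named tuple: total, used, free
    return usage.free >= needed_bytes
```

For example, `has_room("/path/to/dump", 50 * 1024**3)` checks for 50 GiB of free space, where the path stands in for wherever the backup storage is mounted on the host.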
 
