Servers not starting after backup

AllanJ

Jan 2, 2009
Hi

We use VMware at work, but I have been trying to get Proxmox accepted for a few years. Lately, though, I am having problems because the servers don't run stably.

I am still using the free version, currently version 3.4-9. The servers are backed up every night at 1 o'clock, and when I get to work, 1 or 2 of them have stopped. It is not a big problem; I just go and press Start on the server and it runs again. But this is only acceptable because they are development servers. It would be really bad if they were production servers.

I have 6 servers in a cluster, and they are running 20-30 virtual servers.
Storage is 4 NFS servers: 2 are used for images and the other 2 for backup files.

Does anyone have an idea how to solve this problem?

Best regards
Allan
 
Do you use one backup job for every machine? Can you show it?

Is there anything in the logs from the backup?

By the way, using the term server/node for physical machines and VM/virtual machine for the virtual ones helps to distinguish them; it's less confusing, at least for me :D
 
I have defined just one backup job, starting at 1 at night, and added all virtual machines to it.

I just looked at the log from last night, and the problem seems to be here:


INFO: starting new backup job: vzdump 100 102 108 109 110 111 121 167 168 170 135 147 172 173 174 175 105 106 107 169 103 101 104 177 --quiet 1 --mode snapshot --compress lzo --storage nas1disk
INFO: skip external VMs: 108, 109, 110, 121, 167, 168, 170, 135, 147, 172, 173, 174, 175, 105, 107, 169, 103, 101, 104
INFO: Starting Backup of VM 100 (qemu)
INFO: status = running
INFO: update VM 100: -lock backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating archive '/mnt/pve/nas1disk/dump/vzdump-qemu-100-2015_08_20-01_00_02.vma.lzo'
INFO: started backup task '9bf5b399-611f-48ee-9288-b4c9b8775521'
INFO: status: 0% (107020288/10737418240), sparse 0% (79785984), duration 3, 35/9 MB/s
----- many lines deleted ------
INFO: transferred 10737 MB in 701 seconds (15 MB/s)
INFO: archive file size: 1000MB
INFO: delete old backup '/mnt/pve/nas1disk/dump/vzdump-qemu-100-2015_08_19-01_00_02.vma.lzo'
INFO: Finished Backup of VM 100 (00:12:31)
INFO: Starting Backup of VM 102 (qemu)
INFO: status = running
INFO: update VM 102: -lock backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/nas1disk/dump/vzdump-qemu-102-2015_08_20-01_12_33.vma.lzo'
INFO: started backup task 'ff3e1c89-827d-42b1-b560-af8ba4462e4a'
INFO: status: 1% (160825344/12884901888), sparse 1% (140681216), duration 3, 53/6 MB/s
INFO: status: 2% (327286784/12884901888), sparse 2% (306892800), duration 6, 55/0 MB/s
INFO: status: 3% (479920128/12884901888), sparse 3% (459526144), duration 9, 50/0 MB/s
INFO: status: 4% (623837184/12884901888), sparse 4% (603443200), duration 12, 47/0 MB/s
----- many lines deleted -----
INFO: status: 45% (5803212800/12884901888), sparse 35% (4530487296), duration 144, 49/47 MB/s
INFO: status: 46% (5943132160/12884901888), sparse 35% (4534693888), duration 147, 46/45 MB/s
INFO: status: 47% (6064439296/12884901888), sparse 35% (4589502464), duration 150, 40/22 MB/s
INFO: status: 48% (6192562176/12884901888), sparse 36% (4717613056), duration 153, 42/0 MB/s
INFO: status: 49% (6320750592/12884901888), sparse 37% (4845793280), duration 156, 42/0 MB/s
INFO: status: 50% (6447824896/12884901888), sparse 38% (4972859392), duration 159, 42/0 MB/s
ERROR: VM 102 not running
INFO: aborting backup job
ERROR: VM 102 not running
ERROR: Backup of VM 102 failed - VM 102 not running


And then it goes on to back up the other VMs.
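For anyone digging through a similar log, a small script can pull out which VMs this node actually handled and which ones failed. The "skip external VMs" line suggests that, in a cluster, the single job runs on each node and each node skips the VMs that live elsewhere, so only a handful of the listed VMs are local to the node that wrote this log. This is a minimal sketch, assuming the line formats exactly as they appear in the log above (other vzdump versions may word them differently); `summarize` is a hypothetical helper, not part of Proxmox.

```python
import re

# Regexes for the vzdump log lines quoted in this thread (Proxmox VE 3.4);
# other versions may word these lines differently.
JOB_RE = re.compile(r"^INFO: starting new backup job: vzdump ([\d ]+)")
SKIP_RE = re.compile(r"^INFO: skip external VMs: (.+)$")
DONE_RE = re.compile(r"^INFO: Finished Backup of VM (\d+) ")
FAIL_RE = re.compile(r"^ERROR: Backup of VM (\d+) failed - (.+)$")


def summarize(log_text):
    """Hypothetical helper: list the VMs this node actually handled,
    which ones finished, and which ones failed (with the reason)."""
    job, skipped, finished, failed = [], set(), [], {}
    for line in log_text.splitlines():
        if m := JOB_RE.match(line):
            job = m.group(1).split()  # VM IDs passed to vzdump
        elif m := SKIP_RE.match(line):
            skipped = {vmid.strip() for vmid in m.group(1).split(",")}
        elif m := DONE_RE.match(line):
            finished.append(m.group(1))
        elif m := FAIL_RE.match(line):
            failed[m.group(1)] = m.group(2)
    # VMs in the job that were not skipped as external, i.e. local to this node
    local = [vmid for vmid in job if vmid not in skipped]
    return {"local": local, "finished": finished, "failed": failed}


# A few lines taken verbatim from the log posted above.
SAMPLE = "\n".join([
    "INFO: starting new backup job: vzdump 100 102 108 109 110 111 121 167 "
    "168 170 135 147 172 173 174 175 105 106 107 169 103 101 104 177 "
    "--quiet 1 --mode snapshot --compress lzo --storage nas1disk",
    "INFO: skip external VMs: 108, 109, 110, 121, 167, 168, 170, 135, "
    "147, 172, 173, 174, 175, 105, 107, 169, 103, 101, 104",
    "INFO: Finished Backup of VM 100 (00:12:31)",
    "ERROR: VM 102 not running",
    "ERROR: Backup of VM 102 failed - VM 102 not running",
])

print(summarize(SAMPLE))
```

On these lines it reports VMs 100, 102, 111, 106 and 177 as local to the node, VM 100 as finished, and VM 102 as failed with "VM 102 not running".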

Best regards
Allan
 

If you are using snapshot backup and the VM shuts down or reboots during the backup process, you get this error.
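One way to check that this is what happened is to look at how far the backup got before the error. In the log above, VM 102 was reported as "status = running" when its backup started and had reached 50% before vzdump said it was not running, which fits the VM going down in the middle of the backup rather than being stopped beforehand. The sketch below, assuming the log line formats shown in this thread, extracts the last progress line before each "not running" error; `progress_at_stop` is a hypothetical helper, not a Proxmox tool.

```python
import re

# Regexes for the vzdump log lines quoted in this thread (Proxmox VE 3.4).
START_RE = re.compile(r"^INFO: Starting Backup of VM (\d+) ")
STATUS_RE = re.compile(r"^INFO: status: \d+% \((\d+)/(\d+)\).*duration (\d+),")
NOT_RUNNING_RE = re.compile(r"^ERROR: VM (\d+) not running")


def progress_at_stop(log_text):
    """Hypothetical helper: for each VM reported as 'not running', return
    (bytes_done, bytes_total, seconds) from the last status line before it."""
    current, last, stopped = None, None, {}
    for line in log_text.splitlines():
        if m := START_RE.match(line):
            current, last = m.group(1), None  # new VM: reset progress
        elif m := STATUS_RE.match(line):
            last = tuple(int(g) for g in m.groups())
        elif (m := NOT_RUNNING_RE.match(line)) and m.group(1) == current and last:
            stopped.setdefault(current, last)  # keep the first report only
    return stopped


# Lines taken verbatim from the log posted above.
SAMPLE = "\n".join([
    "INFO: Starting Backup of VM 102 (qemu)",
    "INFO: status: 49% (6320750592/12884901888), sparse 37% (4845793280), "
    "duration 156, 42/0 MB/s",
    "INFO: status: 50% (6447824896/12884901888), sparse 38% (4972859392), "
    "duration 159, 42/0 MB/s",
    "ERROR: VM 102 not running",
    "INFO: aborting backup job",
])

done, total, secs = progress_at_stop(SAMPLE)["102"]
print(f"VM 102 stopped after {secs}s at {done / total:.2%} "
      f"({done / 2**30:.2f} of {total / 2**30:.0f} GiB)")
# VM 102 stopped after 159s at 50.04% (6.01 of 12 GiB)
```

The timestamp of the error can then be matched against the guest's own logs to see whether it shut down, rebooted, or crashed.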

What's in the VM's log during this time? Or did someone shut down the VM on purpose during the backup?
 
I have now been testing this for the last 1 1/2 months, and the problem is still there. I have changed the backup to run only on Sundays. All virtual machines run fine during the week, but on Monday morning one or more virtual machines have stopped; this morning it was 3.

I have looked at the logs to see if anything special was running during the backup, but found nothing.
 
Just a note in case someone sees something like this later: the problem went away after updating to version 4.
 
