[SOLVED] OpenVZ containers going offline every night

nbrogi

New Member
Nov 28, 2016
Hi, everyone!

I can't figure this out.

Some of my containers go offline every night at approximately the same time. Each outage lasts about 5 minutes, and it happens 2-3 times over a couple of hours.

This is almost certainly related to the nightly backup. The weird thing is that I disabled backups for these particular containers, which are supposed to be available at all times, and they still go down while the backup runs.

Any idea why this might be, and how I can solve the problem?

I'm using OpenVZ containers on Proxmox 3.4; pveversion reports "pve-manager/3.4-6/102d4547 (running kernel: 2.6.32-39-pve)".
 
Absolutely!

# cluster wide vzdump cron schedule
# Automatically generated file - do not edit

PATH="/usr/sbin:/usr/bin:/sbin:/bin"

0 0 * * * root vzdump --quiet 1 --mailnotification failure --mode snapshot --mailto email@example.com --all 1 --compress gzip --storage local --exclude 603,627
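
For reference, that entry reads as "every day at 00:00, run vzdump as root in snapshot mode on all guests except 603 and 627". Below is a minimal Python sketch that pulls the schedule and the exclude list out of the line. The helper name `parse_vzdump_cron` is made up for illustration, and it assumes every option is a `--name value` pair, which holds for this entry but not necessarily for every vzdump command line.

```python
import shlex

# The cron entry quoted above, copied verbatim.
CRON_LINE = ("0 0 * * * root vzdump --quiet 1 --mailnotification failure "
             "--mode snapshot --mailto email@example.com --all 1 "
             "--compress gzip --storage local --exclude 603,627")

def parse_vzdump_cron(line):
    """Split a system crontab line into schedule, user, command and options."""
    fields = shlex.split(line)
    schedule, user, command = fields[:5], fields[5], fields[6]
    args = fields[7:]
    # Assumption: every option in this entry is a "--name value" pair.
    options = {args[i].lstrip("-"): args[i + 1] for i in range(0, len(args), 2)}
    return schedule, user, command, options

schedule, user, command, options = parse_vzdump_cron(CRON_LINE)
excluded = {int(v) for v in options["exclude"].split(",")}
print(" ".join(schedule))   # 0 0 * * *
print(sorted(excluded))     # [603, 627]
print(627 in excluded)      # True
```

So on paper the backup job should not touch CT 627 at all.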
 
Sorry, where would I find that?

I've never had to mess with Proxmox before, and things usually just work.
 
Got it. I appreciate your help.

I don't see anything out of the ordinary; it's just me stopping and starting the machine, and restoring it 10 days ago.
 
Maybe anything relevant in your syslog?
I see this, which might be relevant, but the machine that goes down doesn't show up, since backups are disabled for it:
2016-11-28T03:10:43+0100 vzctl : CT 631 : Setting up checkpoint...
2016-11-28T03:10:43+0100 vzctl : CT 631 : suspend...
2016-11-28T03:10:43+0100 vzctl : CT 631 : get context...
2016-11-28T03:10:43+0100 vzctl : CT 631 : Checkpointing completed successfully
2016-11-28T03:10:43+0100 vzctl : CT 631 : Resuming...
Somehow, it's still taken down.
 
Could you tell us exactly when the machine goes down and when it comes back up? (Since it's a CT, the clocks should be in sync.)
 
The problem seems to be here:

2016-11-28T01:20:31+0100 vzctl : CT 627 : Killing container ...
2016-11-28T01:20:31+0100 vzctl : CT 627 : Container was stopped
2016-11-28T01:20:32+0100 vzctl : CT 627 : Container is unmounted
2016-11-28T01:24:51+0100 vzctl : CT 627 : Starting container ...
2016-11-28T01:24:51+0100 vzctl : CT 627 : Container is mounted
2016-11-28T01:24:51+0100 vzctl : CT 627 : Adding IP address(es): xxx.xx.xx.xx
2016-11-28T01:24:52+0100 vzctl : CT 627 : Setting CPU units: 1000
2016-11-28T01:24:52+0100 vzctl : CT 627 : Setting CPUs: 8
2016-11-28T01:24:52+0100 vzctl : CT 627 : Setting devices
2016-11-28T01:24:52+0100 vzctl : CT 627 : Container start in progress...

I have no idea why it would stop it, though.
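
The length of the outage can be read straight off those timestamps. Here is a minimal Python sketch, assuming the lines keep exactly the `timestamp vzctl : CT <id> : message` layout quoted above; the helper name `downtime` is hypothetical.

```python
from datetime import datetime

# Two of the vzctl log lines quoted above for CT 627.
LOG = """\
2016-11-28T01:20:31+0100 vzctl : CT 627 : Killing container ...
2016-11-28T01:24:51+0100 vzctl : CT 627 : Container start in progress...
"""

def downtime(log_text, ctid):
    """Return the time between the 'Killing container' and
    'Container start in progress' lines for one container ID,
    or None if either line is missing."""
    stopped = started = None
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split(" : ", 2)]
        if len(parts) != 3 or parts[1] != f"CT {ctid}":
            continue
        # The first field is "<timestamp> vzctl"; keep only the timestamp.
        stamp = datetime.strptime(parts[0].split()[0], "%Y-%m-%dT%H:%M:%S%z")
        if parts[2].startswith("Killing container"):
            stopped = stamp
        elif parts[2].startswith("Container start in progress"):
            started = stamp
    if stopped is None or started is None:
        return None
    return started - stopped

print(downtime(LOG, 627))  # 0:04:20
```

That is 4 minutes 20 seconds, which fits the "about 5 minutes" outages described in the first post.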
 
Look for the reasons before that. Who's initiating it, HA manager, or something else? Is there some related stuff in dmesg (or in the case of this bloody damned systemd, the journalctl)? Maybe OOM, or any lxc or kernel errors?
 
Anything in /var/log/pve/tasks/index around the same time?
I'm not sure. It might be this, but I don't have readable timestamps:
UPID:FR-2:00010DBB:A9D2BD3D:583B784F:vzstop:627:root@pam: 583B7859 OK
UPID:FR-2:000118A4:A9D319D6:583B793C:vzstart:627:root@pam: 583B7957 OK
UPID:FR-2:000FFD96:A9CB5E9E:583B6571:vzdump::root@pam: 583B9256 OK
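
The timestamps are in those lines, just in hex. Below is a minimal Python sketch for decoding them. It assumes the usual index layout `UPID:<node>:<pid>:<pstart>:<start time>:<type>:<id>:<user>: <end time> <status>`, with both times as hexadecimal Unix seconds. That layout is not confirmed anywhere in this thread, and the helper name `decode` is hypothetical. Times are printed at +0100 to match the vzctl log quoted earlier.

```python
from datetime import datetime, timedelta, timezone

# Task index lines quoted above, copied verbatim.
INDEX = """\
UPID:FR-2:00010DBB:A9D2BD3D:583B784F:vzstop:627:root@pam: 583B7859 OK
UPID:FR-2:000118A4:A9D319D6:583B793C:vzstart:627:root@pam: 583B7957 OK
UPID:FR-2:000FFD96:A9CB5E9E:583B6571:vzdump::root@pam: 583B9256 OK
"""

# The log excerpts in this thread use +0100, so print times in that offset.
LOCAL = timezone(timedelta(hours=1))

def decode(line):
    """Split one index line into (task type, CT id, start, end).

    Assumes field 4 of the UPID and the word after the UPID are
    hexadecimal Unix timestamps."""
    upid, end_hex, _status = line.split(" ", 2)
    fields = upid.split(":")
    start = datetime.fromtimestamp(int(fields[4], 16), LOCAL)
    end = datetime.fromtimestamp(int(end_hex, 16), LOCAL)
    return fields[5], fields[6], start, end

for line in INDEX.splitlines():
    kind, ctid, start, end = decode(line)
    print(f"{kind:8} {ctid or '-':4} {start:%Y-%m-%d %H:%M:%S} -> {end:%H:%M:%S}")
```

Decoded this way, the vzstop task for CT 627 starts at 01:20:31, the same second as the "Killing container" line in the vzctl log, and the vzdump task runs from 00:00:01 to 03:11:34. So the stop and start of CT 627 fall inside the backup window, and both tasks are recorded as separate tasks run by root@pam.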
 
Look for the reasons before that. Who's initiating it, HA manager, or something else? Is there some related stuff in dmesg (or in the case of this bloody damned systemd, the journalctl)? Maybe OOM, or any lxc or kernel errors?
I'm not sure, /var/log/dmesg doesn't have timestamps, but I also don't see the VMID.

I did "locate journalctl" and it didn't return anything, so it's possible that I'm not using it.

What component besides the backup daemon (or whatever it's called) would be stopping containers? Maybe there's some setting that I can change...
 
I'm not sure, /var/log/dmesg doesn't have timestamps, but I also don't see the VMID.
I did "locate journalctl" and it didn't return anything, so it's possible that I'm not using it.

Try journalctl -xn 500 or similar, I don't know the optimal command line since it's not really my favourite tool, to put it nicely. Should show all kinds of logs with timestamps, unless it decides it's not in the mood.

What component besides the backup daemon (or whatever it's called) would be stopping containers? Maybe there's some setting that I can change...

ha-manager comes to my mind first, but others may come up with better ideas based on the data you have presented.
 
Oh okay, you're using 3.4. My 4.xx is jessie. Sorry.
Then you should have the info in /var/log/kern.log with timestamps. But I believe that without systemd even the syslog should contain all the relevant info, so if you didn't see anything relevant there, I'm out of [basic] ideas and will leave it to others to chime in.
 
