Help! I'm in deep trouble, can't start containers after reboot

jarcher

Hi All...

I'm in deep trouble here :-(

I would greatly appreciate any assistance!

I was having some trouble managing containers: I would get an error about a worker thread whenever I tried to use the web-based management interface. I was still able to stop containers using vzctl at the command line, and the containers were all operating okay, so I thought the issue was limited to the user interface. Foolishly, I rebooted the machine. It came back up, and now I can't start any of my containers. These are critical!

If I try to use vzctl to start the containers, I get:

root@pmmaster:/# vzctl start 101
Container config file does not exist
root@pmmaster:/#

If I try to list the containers:

root@pmmaster:/# vzlist -a
Container(s) not found
root@pmmaster:/#


Thanks very much!

Here is my version information:


root@pmmaster:/# pveversion -v
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-3 (running version: 3.1-3/dc0e9b0e)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-95
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-17-pve: 2.6.32-83
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-7
qemu-server: 3.1-1
pve-firmware: 1.0-23
libpve-common-perl: 3.0-6
libpve-access-control: 3.0-6
libpve-storage-perl: 3.0-10
pve-libspice-server1: 0.12.4-1
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.0-2
 
Digging further, I see that my containers are still in /var/lib/vz/private

For example, container 101 has its data there.

But if I look in /var/lib/vz/root/101, there are no files at all. I think this is normal when containers are not running. Since my data is still there, I guess there is hope... Now I'm trying to track down where the config files are supposed to be.
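
For anyone following along, this is roughly where I have been looking. As far as I can tell, a plain OpenVZ install reads configs from /etc/vz/conf/<CTID>.conf, while Proxmox keeps them under /etc/pve; the symlink detail below is my assumption, so treat this as a sketch:

Code:
ls -l /etc/vz/conf              # where plain OpenVZ expects <CTID>.conf
ls -l /etc/pve/openvz           # on Proxmox, should point at the local node's config dir (assumption)
ls -l /etc/pve/nodes/*/openvz   # per-node config directories in the cluster filesystem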
 
I was able to use the web interface to create a new container, number 3000. There is a config file for it:

root@pmmaster:/etc/pve/nodes/pmmaster/openvz# ls -l
total 1
-rw-r----- 1 root www-data 923 Feb 27 17:07 3000.conf
root@pmmaster:/etc/pve/nodes/pmmaster/openvz#


But that is the only config file in that directory, so apparently all my container config files have vanished. If this is correct, is there a way to recreate them? I have been backing up the containers, so can I maybe extract just the config files from the backups?
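
If I understand the vzdump format correctly, the container config is stored inside the backup archive itself (I believe as ./etc/vzdump/vps.conf, but that path is my assumption, and the dump filename below is just an example), so something like this should pull it out:

Code:
# confirm where the config sits inside the archive (dump filename is only an example)
tar -tzf /var/lib/vz/dump/vzdump-openvz-101-2014_02_27-00_00_01.tar.gz | grep vzdump
# write just that config to a scratch file for inspection
tar -xzOf /var/lib/vz/dump/vzdump-openvz-101-2014_02_27-00_00_01.tar.gz ./etc/vzdump/vps.conf > /root/101.conf.recovered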
 
As a quick fix, since I absolutely had to have some containers running, I copied the config files from the other (now inactive) node to /etc/pve/nodes/pmmaster/openvz under different file names. This allowed me to start the containers, even though the names of the config files don't match the container IDs. They seem to be running, but it can't stay this way.

I did some reading about FUSE, and as near as I can tell, the contents of /etc/pve are actually backed by an SQLite database. So perhaps there is some corruption in the SQLite storage?
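
In case anyone wants to check the same thing, this is roughly how I have been poking at it; the database path and table name are what I gathered from reading about pmxcfs, so treat this as a sketch rather than gospel:

Code:
mount | grep /etc/pve            # should show the FUSE mount provided by pve-cluster (pmxcfs)
ls -l /var/lib/pve-cluster/      # the backing SQLite database should live here (assumption)
sqlite3 /var/lib/pve-cluster/config.db "SELECT name FROM tree;"   # list stored entries (table name assumed)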
 
Hi,
is the rebooted node part of a cluster?
Do you still have the configs on the other node, in /etc/pve/nodes/NODENAME/openvz/ ?
What about backups?

Udo
 
Hi Udo, thank you for the reply.

Well, this node was part of a cluster a while ago. I upgraded this one but not the other node, since I didn't need the cluster any more. I now see the other node in the "Datacenter" list, and it has the containers listed and grayed out. And yes, the config files are on the other node; that's how I was able to recover them. I tried to copy them from the old node to the current one, but it would not let me, saying those names are already in use. This is why I renamed them (for example, I renamed 101.conf to 3010.conf). In most cases this worked, although not all.

Each container is backed up using the Proxmox backup capability. I have not attempted to restore them because, at the time, I was under the impression that there was a file system issue. However, after reading about the cluster file system under FUSE, I am thinking it's normal that I cannot write those files, since they already exist on the other node.

I have not wanted to risk restoring from backup since I really do not understand what is going on.
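
For reference, this is how I have been checking which node currently holds a given config (plain shell; adjust the CTID as needed):

Code:
# every container config the cluster filesystem knows about, per node
find /etc/pve/nodes -name '*.conf'
# check whether a particular CTID is already taken somewhere
find /etc/pve/nodes -name '101.conf'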
 
Hi,
this sounds like you have a broken cluster - normally both nodes (or however many nodes are in your cluster) should mount /etc/pve read-only, because they don't have quorum...

You know best what you did to get quorum...

What is the output of the following commands (on all nodes)?
Code:
pvecm nodes
pvecm status
mount | grep pve
find /etc/pve

Perhaps the cluster filesystem is only mounted over your old /etc/pve?
Try stopping pve-cluster, check the mounts, and take a look at /etc/pve.
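For example, something like this (the service name is the usual one on PVE 3.x; "pvecm expected 1" is only meant for a node that is intentionally running alone and just lacks quorum):

Code:
service pve-cluster stop     # stop pmxcfs so /etc/pve gets unmounted
mount | grep pve             # verify nothing is still mounted on /etc/pve
ls -la /etc/pve              # see what the underlying local directory contains
service pve-cluster start    # start it again and re-check
# only if the node is deliberately running alone and quorum is what blocks writes:
pvecm expected 1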

Udo
 
