Unable to start container with new release 3.4.8

tilao

New Member
Jul 29, 2015
Hi,

I've got a new physical server from my provider and installed Proxmox on it. They have scripted the install, so I ended up with the latest version of Proxmox VE (i.e. 3.4.8). I tried to create a container (through vzctl create and a homebrew template) and couldn't get it to start at all. I then went to the web console, downloaded a standard template (i.e. debian-7.0-standard_7.0-2_i386.tar.gz), created a container with the web interface, switched to SSH, entered vzctl start 301 and got:

Code:
root@host:~# vzctl start 301
Starting container ... 
Container is mounted
Adding IP address(es): 10.10.30.251
Unable to add IP 10.10.30.251: Inappropriate ioctl for device
Unable to del IP 10.10.30.251: Inappropriate ioctl for device
Container start failed (try to check kernel messages, e.g. "dmesg | tail")
Killing container ...
Container was stopped
Container is unmounted

dmesg gives me:
Code:
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata1.00: failed command: FLUSH CACHE EXT
ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 6
        res 11/04:00:2f:57:0d/00:00:00:00:00/a6 Emask 0x3 (HSM violation)
ata1.00: status: { ERR }
ata1.00: error: { ABRT }
ata1.00: hard resetting link
ata1.01: hard resetting link
ata1.01: failed to resume link (SControl 0)
ata1.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.01: SATA link down (SStatus 0 SControl 0)
ata1.00: configured for UDMA/100
ata1: EH complete
CT: 301: started
CT: 301: stopped

Technical support told me it might be my network configuration. I then tried to create the container in bridged mode, only to find that it wouldn't start either. This time it gave me:

Code:
root@host:~# vzctl start 302
Starting container ...
Container is mounted
Setting CPU units: 1000
Setting CPUs: 1
Configure veth devices: veth302.0 
Error: veth feature is not supported by kernel
Please check that vzethdev kernel module is loaded
Container start failed (try to check kernel messages, e.g. "dmesg | tail")
Killing container ...
Container was stopped
Container is unmounted


dmesg gives me:
Code:
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata1.00: BMDMA stat 0x5
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:00:80:e2:f6/00:01:66:00:00/e0 tag 27 dma 131072 in
         res 11/04:00:88:76:93/04:01:6d:00:00/ed Emask 0x3 (HSM violation)
ata1.00: status: { ERR }
ata1.00: error: { ABRT }
ata1.00: hard resetting link
ata1.01: hard resetting link
ata1.01: failed to resume link (SControl 0)
ata1.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.01: SATA link down (SStatus 0 SControl 0)
ata1.00: configured for UDMA/100
ata1: EH complete
md: md1: resync done.
RAID1 conf printout:
 --- wd:2 rd:2
 disk 0, wo:0, o:1, dev:sda2
 disk 1, wo:0, o:1, dev:sdb2
CT: 302: started
CT: 302: stopped


And in case it matters, here is my network configuration:
Code:
root@host:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).


# The loopback network interface
auto lo
iface lo inet loopback


iface eth0 inet manual


iface eth1 inet manual


auto vmbr0
iface vmbr0 inet static
    address 62.210.X.X
    netmask 255.255.255.0
    gateway 62.210.X.X
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0


auto vmbr1
iface vmbr1 inet static
    address  10.10.30.254
    netmask  255.255.255.0
    bridge_ports none
    bridge_stp off
    bridge_fd 0
    post-up echo 1 > /proc/sys/net/ipv4/ip_forward
    post-up iptables -t nat -A POSTROUTING -s '10.10.30.0/24' -o vmbr0 -j MASQUERADE
    post-down iptables -t nat -D POSTROUTING -s '10.10.30.0/24' -o vmbr0 -j MASQUERADE

Apart from that, everything is just a plain fresh Proxmox install. I have tried precisely the same process on my other machine (currently running Proxmox 3.4.6) and everything went fine. I really don't understand what's happening. Is there a bug in the new release, or have I totally missed something?
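
For what it's worth, the forwarding/NAT side of this setup can be double-checked with standard tools; here is a quick sketch (nothing Proxmox-specific, output omitted):

Code:
# IPv4 forwarding should be enabled (prints 1)
cat /proc/sys/net/ipv4/ip_forward

# the MASQUERADE rule added by the post-up hook should be listed here
iptables -t nat -vnL POSTROUTING

# vmbr0 should show eth0 as an attached port, vmbr1 should have no ports
brctl show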

Further details on my machines:

The one causing problems:
Code:
root@newhost:~/create-template# pveversion
pve-manager/3.4-8/5f8f4e78 (running kernel: 2.6.32-40-pve)

The one working fine:
Code:
root@oldhost:~# pveversion
pve-manager/3.4-6/102d4547 (running kernel: 2.6.32-39-pve)

Regards,
Pierre.
 
works here. post the output of:

> pveversion -v
 
Hi,

The non-working one:
Code:
root@newhost:~# pveversion -v
proxmox-ve-2.6.32: 3.4-159 (running kernel: 2.6.32-40-pve)
pve-manager: 3.4-8 (running version: 3.4-8/5f8f4e78)
pve-kernel-2.6.32-40-pve: 2.6.32-159
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-18
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-11
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

The working one:
Code:
root@oldhost:~# pveversion -v
proxmox-ve-2.6.32: 3.4-157 (running kernel: 2.6.32-39-pve)
pve-manager: 3.4-6 (running version: 3.4-6/102d4547)
pve-kernel-2.6.32-39-pve: 2.6.32-157
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-18
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Thanks for your time,
Pierre.
 
the packages look ok, I run several servers with these.

any other special settings? I see mdraid, and also resync messages. faulty disks?

so I assume you have an issue with your mdraid or hardware; the packages seem ok.
 
That was my first thought too (faulty disks), but as you can see, mdstat seems OK.

Code:
root@newhost:~# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sda2[0] sdb2[1]
      930585408 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sda1[0] sdb1[1]
      291520 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>
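
As far as I understand, a manual check of the array can also be forced through sysfs; something like this (md1 being the data array shown above):

Code:
# request a full read/compare pass over the array
echo check > /sys/block/md1/md/sync_action

# progress shows up in /proc/mdstat; once done, a non-zero value here
# means inconsistencies were found
cat /sys/block/md1/md/mismatch_cnt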

It's when I saw the message about the missing kernel module that I thought it might be a software problem.

Code:
Error: veth feature is not supported by kernel
Please check that vzethdev kernel module is loaded

I will ask my provider for a full hardware inspection to be sure the machine is OK.
Thanks for your investigation,
Pierre.
 
..
It's when I saw the message about the missing kernel module that I thought it might be a software problem.

Code:
Error: veth feature is not supported by kernel
Please check that vzethdev kernel module is loaded

check the module with:

> modprobe vzethdev

> modinfo vzethdev

but as you use our kernel, this module should be included by default.
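
and to see whether it is currently loaded at all, a simple check like this should do:

> lsmod | grep vzethdev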
 
Hi,

Yeah, that's what I should have done in the first place!

The server is currently unavailable while they run a hardware check on it.
I will try it out as soon as I get it back.

Thanks,
Pierre.
 
Hi,

It was one of the disks of the software RAID that had bad sectors.
Their hardware monitoring couldn't see it until a full offline check.
Bad luck for a freshly delivered server!

I guess this problem prevented the kernel from loading its modules and the like.

The thing I'm surprised about is that mdadm reported problems in dmesg but /proc/mdstat showed that everything was fine.
I thought mdstat would be more likely to report errors or problems like these.

Do you guys have a way to watch, in an automated fashion, for errors or messages announcing hardware failures?
I wouldn't like to have a production server with a hard disk slowly failing and not notice it except through software failures!

Thanks again,
Best Regards,
Pierre.
 
mdraid is known for this. we never recommend mdraid for Proxmox VE, but people still like it and use it.

if you want more reliable software based raid, consider using ZFS, available since 3.4 in Proxmox VE.

Note: running OpenVZ on ZFS is not a supported config (OpenVZ limitation).
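
as for watching for this kind of failure automatically: a common approach (just a sketch, mail address and test schedule are examples) is to let smartd and mdadm's monitor mode send you mail:

Code:
# /etc/smartd.conf - monitor both disks, mail root on problems,
# run a short SMART self-test every night around 02:00
/dev/sda -a -m root -s S/../.././02
/dev/sdb -a -m root -s S/../.././02

# mdadm in monitor mode - mails root when an array degrades or fails
# (on Debian this is normally set up via MAILADDR in /etc/mdadm/mdadm.conf
#  plus the mdadm init script, rather than started by hand)
mdadm --monitor --scan --daemonise --mail root

note that SMART and mdadm only catch what the disk or the array reports; they would not catch silent corruption, which is another reason to prefer ZFS with its checksumming and zpool scrub.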
 
Yes, I saw somewhere that Proxmox doesn't support soft-RAID based machines and was wondering why. Now I have the answer!

Fortunately, I didn't plan to go to production with this hardware. When I read that warning, I figured there was certainly a good reason you don't commercially support soft RAID. This machine was only intended to prepare and mimic our production server, so that we could get the config clean and functional right away through shell-script-only procedures.

I'm now definitely confident I made the right choice when I decided to pay some extra bucks for a hardware-RAID based production machine! It now sounds like a no-brainer.

In the end I was quite lucky: I learned this the hard way (with a real hardware failure) but without data corruption or downtime, since it wasn't a production server!

Best Regards,
Pierre.
 