Emergency start of containers

SamTzu

Renowned Member
I had to reboot one host node of Proxmox v4.x because an older LXC container stopped responding (maybe hacked).
After the reboot the host failed to join the quorum, so no VMs started (this behavior should be an optional setting somewhere; it only makes sense if you have HA configured, and most don't).

After fighting for a few hours to get the quorum up I gave up, stopped the cluster service on that host, and started pmxcfs in local mode.

Code:
# stop the cluster filesystem service
/etc/init.d/pve-cluster stop
# restart pmxcfs in local mode (-l) so /etc/pve is usable without quorum
/usr/bin/pmxcfs -l
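
(For reference, an alternative to local mode that is often suggested is lowering the expected vote count so a lone node becomes quorate again; just a sketch, assuming corosync itself is still running on the node:)

Code:
# tell the quorum service to expect only one vote, so this single node
# counts as quorate and /etc/pve becomes writable again
pvecm expected 1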

After that I could start the QEMU VMs manually with these commands:

Code:
# list the VM configs on this node to see the VMIDs, then start each one
ls -ahl /etc/pve/qemu-server/
1001.conf
qm start 1001

Repeat ad nauseam...
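
(To avoid doing this per VM, a small loop over the config directory does the same thing; just a sketch, assuming every config on this node should be started:)

Code:
# start every VM that has a config file on this node
for conf in /etc/pve/qemu-server/*.conf; do
    qm start "$(basename "$conf" .conf)"
done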

With LXC it was:

Code:
# list the container configs, then start each one detached (-d)
ls -ahl /etc/pve/lxc/
101.conf
lxc-start -n 101 -d

Repeat ad nauseam...
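
(Same idea for the containers; pct start is the PVE wrapper and should work too, though lxc-start as above is fine. A sketch:)

Code:
# start every container that has a config file on this node
for conf in /etc/pve/lxc/*.conf; do
    pct start "$(basename "$conf" .conf)"
done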

Now all VMs are up for the moment, but for some reason I can't access the QEMU VMs over the network. The LXC containers work fine now without quorum.
Any ideas how to access/enable the network for the QEMU VMs?


-Sam
 
You can't establish quorum and your VMs/CTs have no network either, so I guess you see where I am going. Check the network settings of this particular PVE host.
 
ip a s

Code:
root@p1:/etc/pve/lxc# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether 70:4d:7b:2d:88:4d brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr1 state UP group default qlen 1000
    link/ether 00:1b:21:05:08:b4 brd ff:ff:ff:ff:ff:ff
4: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 70:4d:7b:2d:88:4d brd ff:ff:ff:ff:ff:ff
    inet 10.10.12.15/8 brd 10.255.255.255 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::724d:7bff:fe2d:884d/64 scope link
       valid_lft forever preferred_lft forever
5: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:1b:21:05:08:b4 brd ff:ff:ff:ff:ff:ff
    inet 172.22.12.15/16 brd 172.22.255.255 scope global vmbr1
       valid_lft forever preferred_lft forever
    inet6 fe80::21b:21ff:fe05:8b4/64 scope link
       valid_lft forever preferred_lft forever
6: tap107i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
    link/ether 66:06:b7:91:dc:a8 brd ff:ff:ff:ff:ff:ff
8: veth101i0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:0d:08:53:d3:d4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
10: veth101i1@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:5e:74:b2:38:50 brd ff:ff:ff:ff:ff:ff link-netnsid 0
11: tap1000i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr1000i0 state UNKNOWN group default qlen 1000
    link/ether 3e:a6:fe:0a:8e:81 brd ff:ff:ff:ff:ff:ff
12: fwbr1000i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:f9:87:aa:7b:05 brd ff:ff:ff:ff:ff:ff
13: fwpr1000p0@fwln1000i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether 56:27:e7:49:69:b7 brd ff:ff:ff:ff:ff:ff
14: fwln1000i0@fwpr1000p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr1000i0 state UP group default qlen 1000
    link/ether 62:f9:87:aa:7b:05 brd ff:ff:ff:ff:ff:ff
15: tap1001i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr1001i0 state UNKNOWN group default qlen 1000
    link/ether 4e:54:2d:17:5c:41 brd ff:ff:ff:ff:ff:ff
16: fwbr1001i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2a:bd:e2:fb:fb:bc brd ff:ff:ff:ff:ff:ff
17: fwpr1001p0@fwln1001i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether b6:21:50:2e:e5:85 brd ff:ff:ff:ff:ff:ff
18: fwln1001i0@fwpr1001p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr1001i0 state UP group default qlen 1000
    link/ether 2a:bd:e2:fb:fb:bc brd ff:ff:ff:ff:ff:ff
20: veth115i0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:d4:84:86:7a:ff brd ff:ff:ff:ff:ff:ff link-netnsid 1
21: tap400i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
    link/ether de:84:02:e1:8f:47 brd ff:ff:ff:ff:ff:ff
22: tap400i1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr1 state UNKNOWN group default qlen 1000
    link/ether 2e:56:5c:2b:bb:c7 brd ff:ff:ff:ff:ff:ff
24: veth117i0@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:32:98:da:78:90 brd ff:ff:ff:ff:ff:ff link-netnsid 2
26: veth118i0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:ba:5a:e5:a1:b3 brd ff:ff:ff:ff:ff:ff link-netnsid 3
28: veth141i0@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:94:0f:e8:55:01 brd ff:ff:ff:ff:ff:ff link-netnsid 4
30: veth144i0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:fc:63:7c:9f:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 5
32: veth149i0@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:3c:87:73:84:cf brd ff:ff:ff:ff:ff:ff link-netnsid 6
34: veth193i0@if33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:ce:3a:43:96:43 brd ff:ff:ff:ff:ff:ff link-netnsid 7
36: veth194i0@if35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:0f:28:c8:ea:cf brd ff:ff:ff:ff:ff:ff link-netnsid 8
38: veth195i0@if37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:bf:5e:ab:be:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 9
40: veth300i0@if39: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:f2:32:b4:cb:7c brd ff:ff:ff:ff:ff:ff link-netnsid 10
41: tap442i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr1 state UNKNOWN group default qlen 1000
    link/ether f2:e6:2f:61:25:d5 brd ff:ff:ff:ff:ff:ff
43: veth444i0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:d0:76:7f:5f:b0 brd ff:ff:ff:ff:ff:ff link-netnsid 11
45: veth509i0@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:a1:e2:e4:cc:35 brd ff:ff:ff:ff:ff:ff link-netnsid 12
47: veth509i1@if46: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:cb:19:e0:09:42 brd ff:ff:ff:ff:ff:ff link-netnsid 12
49: veth511i0@if48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:9b:37:81:7d:ac brd ff:ff:ff:ff:ff:ff link-netnsid 13
51: veth511i1@if50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:72:8b:51:9e:b5 brd ff:ff:ff:ff:ff:ff link-netnsid 13
53: veth522i0@if52: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:2e:5e:61:2e:98 brd ff:ff:ff:ff:ff:ff link-netnsid 14
55: veth522i1@if54: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:ef:19:95:84:09 brd ff:ff:ff:ff:ff:ff link-netnsid 14
57: veth544i0@if56: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether fe:89:ef:e2:a4:81 brd ff:ff:ff:ff:ff:ff link-netnsid 15
59: veth544i1@if58: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether fe:c0:d1:b1:25:79 brd ff:ff:ff:ff:ff:ff link-netnsid 15
60: tap601i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
    link/ether ce:99:14:e1:f0:6f brd ff:ff:ff:ff:ff:ff
 
You don't need this; PVE takes care of it, and qm is a PVE command, not a QEMU one. You have two different bridges: is the connection working through either of them?
 
That is a good point. Maybe the QEMU VM is up on the wrong side of the firewall?
vmbr0 = LAN
vmbr1 = WAN

How can I test it?
I believe this (below) is for the KVM guest 1000. It seems to be up on vmbr1. It should work, but it is dark.

Code:
fwbr1000i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b6:93:c0:29:33:43 brd ff:ff:ff:ff:ff:ff
68: fwpr1000p0@fwln1000i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether 16:38:12:e7:be:5e brd ff:ff:ff:ff:ff:ff
69: fwln1000i0@fwpr1000p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr1000i0 state UP group default qlen 1000
    link/ether b6:93:c0:29:33:43 brd ff:ff:ff:ff:ff:ff
 
I would guess that either your NICs switched naming (they shouldn't, but oh well...) or the config was changed. There might also be a physical network issue present.
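
One way to narrow it down would be to check the bridge memberships and watch whether the VM's traffic actually shows up on its tap and firewall-bridge interfaces; a sketch, using the interface names from your ip a output:

Code:
# list all bridges and the ports enslaved to them
brctl show
# watch for the VM's packets on its tap device and on the firewall bridge uplink
tcpdump -ni tap1000i0
tcpdump -ni fwpr1000p0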
 
I just tested vmbr1 (WAN) by manually adding a public IP and GW. Seems to work fine.
Code:
root@p1:/etc/pve/qemu-server# ping -I vmbr1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from 79.134.125.137 vmbr1: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=21.0 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=59 time=20.8 ms
If the NICs had flipped, it should also have affected the LXC containers. I don't think it's that, and the public NIC seems to work fine. It must be a config issue, but what could be missing from the QEMU configs that is included in the LXC configs?

This is the network config:

Code:
root@p1:/etc/pve/qemu-server# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage part of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eth0 inet manual
iface eth1 inet manual

auto vmbr0
iface vmbr0 inet static
    address  10.10.12.15
    netmask  255.0.0.0
    gateway  10.0.0.1
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0

auto vmbr1
iface vmbr1 inet static
    address  172.22.12.15
    netmask  255.255.0.0
    bridge_ports eth1
    bridge_stp off
    bridge_fd 0
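
For comparison, the per-guest network lines in the two kinds of configs look roughly like this (the values below are made up, not taken from my configs). If I read the ip a output right, the firewall=1 flag on the QEMU NICs is what creates those fwbr/fwpr/fwln devices, while most of the LXC veths sit directly on the vmbr bridges:

Code:
# /etc/pve/qemu-server/1000.conf (hypothetical values)
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr1,firewall=1

# /etc/pve/lxc/101.conf (hypothetical values)
net0: name=eth0,bridge=vmbr1,hwaddr=AA:BB:CC:DD:EE:FF,ip=172.22.12.101/16,type=veth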
 
On which network is your corosync traffic running (pvecm status)? What are your firewall settings, and did anything change?
 
Code:
root@p1:/etc/pve/qemu-server# qm status 1001 -verbose
balloon: 8388608000
balloon_min: 4194304000
ballooninfo:
    max_mem: 8388608000
    actual: 8388608000
blockstat:
    ide2:
        wr_operations: 0
        invalid_wr_operations: 0
        wr_total_time_ns: 0
        failed_wr_operations: 0
        rd_operations: 3
        failed_flush_operations: 0
        invalid_flush_operations: 0
        failed_rd_operations: 0
        wr_highest_offset: 0
        rd_total_time_ns: 26161
        wr_bytes: 0
        flush_total_time_ns: 0
        idle_time_ns: 7360061483378
        invalid_rd_operations: 0
        timed_stats:
        wr_merged: 0
        rd_merged: 0
        flush_operations: 0
        rd_bytes: 136
    virtio0:
        rd_bytes: 42463232
        flush_operations: 2
        rd_merged: 2645
        wr_merged: 0
        timed_stats:
        invalid_rd_operations: 0
        idle_time_ns: 7268319841480
        flush_total_time_ns: 62754
        wr_bytes: 8192
        rd_total_time_ns: 6388368859
        wr_highest_offset: 4430237696
        failed_rd_operations: 0
        invalid_flush_operations: 0
        rd_operations: 5063
        failed_flush_operations: 0
        failed_wr_operations: 0
        wr_operations: 2
        invalid_wr_operations: 0
        wr_total_time_ns: 592344
    virtio1:
        failed_flush_operations: 0
        rd_operations: 235
        invalid_flush_operations: 0
        wr_operations: 0
        invalid_wr_operations: 0
        wr_total_time_ns: 0
        failed_wr_operations: 0
        wr_highest_offset: 0
        failed_rd_operations: 0
        wr_bytes: 0
        idle_time_ns: 7359973664537
        flush_total_time_ns: 0
        invalid_rd_operations: 0
        rd_total_time_ns: 256499055
        wr_merged: 0
        rd_merged: 106
        flush_operations: 0
        rd_bytes: 933888
        timed_stats:
cpus: 4
disk: 0
diskread: 43397256
diskwrite: 8192
maxdisk: 10737418240
maxmem: 8388608000
mem: 362826280
name: www3.ic4.eu
netin: 15192
netout: 0
nics:
    tap1001i0:
        netout: 0
        netin: 15192
pid: 27293
qmpstatus: running
shares: 1000
status: running
template:
uptime: 7371
 
Nothing has changed. I did have to stop the cluster (as mentioned at the start) to get pmxcfs -l working, so no corosync. vmbr1 = WAN, so all QEMU VMs that are on vmbr1 should work with a public IP. But they don't.
What is in the corosync logs? This may give us a clue.

What is the command that Proxmox uses to start a QEMU VM?
'qm showcmd <vmid>', but why should it miss something all of a sudden?

name: www3.ic4.eu
If it helps, I did a tracepath to that domain, in case it binds to one of your hosts.
Code:
root@pc:~$ tracepath www3.ic4.eu
 1?: [LOCALHOST]                      pmtu 1500
 1:  192.168.16.1                                          0.450ms
 1:  192.168.16.1                                          0.475ms
 2:  no reply
 3:  no reply
 4:  at-vie01b-rc1-ae31-2047.aorta.net                    45.066ms asymm  9
 5:  de-fra01b-rc1-ae1-0.aorta.net                        19.386ms asymm  8
 6:  de-fra01b-ri1-ae0-0.aorta.net                        18.311ms
 7:  decix.eunetip.net                                    21.561ms
 8:  213.192.184.201                                      39.639ms
 9:  suomicom-gw.customer.eunetip.net                     40.872ms
10:  cust-gw6.helpa.suomicom.net                          40.027ms
11:  www3.ic4.eu                                          48.953ms reached
     Resume: pmtu 1500 hops 11 back 11
 
Thx. I had to move the KVM clients to another host node. Too much downtime.
I guess this will remain a mystery.

I will include the corosync log here to lift the mood :)

Code:
root@hp3:/etc/pve/qemu-server# tail /var/log/corosync/.empty
This file is here to keep the log dir around after package removal
until #588515 is fixed.
 
And on the remaining hosts? It has to say something; corosync would fail if the interface for the ring(s) is not available.
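
If you still want to dig: corosync on these versions normally logs to syslog/the journal rather than to files in that directory (hence the .empty placeholder), so this is roughly where I would look; a sketch:

Code:
# corosync and cluster-filesystem messages since the last boot
journalctl -b -u corosync -u pve-cluster
# quorum view as corosync sees it
corosync-quorumtool -s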
 
