tap devices & vm's crashing

taenzerme · Jun 6, 2016

Hello all,

After a recent upgrade to the latest 4.2 release, we are seeing random VM crashes (KVM, Debian 8). The log files show these errors:

Code:

Jun 06 16:45:12 vmhost3 kernel: vmbr0: port 13(tap511i0) entered disabled state
Jun 06 16:45:12 vmhost3 ovs-vsctl[166886]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap511i0
Jun 06 16:45:12 vmhost3 kernel: vmbr0: port 9(tap505i0) entered disabled state
Jun 06 16:45:12 vmhost3 ovs-vsctl[166886]: ovs|00002|db_ctl_base|ERR|no port named tap511i0
Jun 06 16:45:12 vmhost3 ovs-vsctl[166887]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap505i0
Jun 06 16:45:12 vmhost3 kernel: vmbr0: port 2(tap114i0) entered disabled state
Jun 06 16:45:12 vmhost3 ovs-vsctl[166887]: ovs|00002|db_ctl_base|ERR|no port named tap505i0
Jun 06 16:45:12 vmhost3 ovs-vsctl[166890]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap114i0
Jun 06 16:45:12 vmhost3 ovs-vsctl[166890]: ovs|00002|db_ctl_base|ERR|no port named tap114i0
Jun 06 16:45:13 vmhost3 ntpd[1317]: Deleting interface #27 tap511i0, fe80::28af:80ff:fe79:d85e#123, interface stats: received=0, sent=0, dropped=0, active_time=92917 secs
Jun 06 16:45:13 vmhost3 ntpd[1317]: Deleting interface #24 tap505i0, fe80::c4b3:86ff:feec:29bd#123, interface stats: received=0, sent=0, dropped=0, active_time=93018 secs
Jun 06 16:45:13 vmhost3 ntpd[1317]: Deleting interface #22 tap114i0, fe80::5820:1eff:fe49:d666#123, interface stats: received=0, sent=0, dropped=0, active_time=93293 secs
Jun 06 16:45:13 vmhost3 ntpd[1317]: peers refreshed

Code:

Jun 06 16:44:16 vmhost2 kernel: vmbr0: port 7(tap105i0) entered disabled state
Jun 06 16:44:16 vmhost2 ovs-vsctl[230911]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i0
Jun 06 16:44:16 vmhost2 ovs-vsctl[230911]: ovs|00002|db_ctl_base|ERR|no port named tap105i0

Code:

Jun 06 16:43:17 vmhost1 kernel: vmbr0: port 10(tap109i0) entered disabled state
Jun 06 16:43:17 vmhost1 ovs-vsctl[226466]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap109i0
Jun 06 16:43:17 vmhost1 ovs-vsctl[226466]: ovs|00002|db_ctl_base|ERR|no port named tap109i0
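
To see which guests are affected, the tap names in the log lines above can be mapped back to VM IDs. This is a minimal sketch that assumes the tap<vmid>i<n> naming seen in this thread (for example tap109i0 belonging to VM 109); the function name tap_vmids is hypothetical.

```shell
# Read log lines on stdin and print the unique VM IDs found in tap device names.
# Assumption: tap devices are named tap<vmid>i<index>, e.g. tap511i0 -> VM 511.
tap_vmids() {
    grep -oE 'tap[0-9]+i[0-9]+' |
        sed -E 's/^tap([0-9]+)i[0-9]+$/\1/' |
        sort -un
}

# Example (journalctl call shown only as one possible input source):
#   journalctl -k --since "2016-06-06 16:40" | tap_vmids
```

Feeding it the kernel lines from vmhost3 would list 114, 505 and 511, one per line.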

This happens on all of our VM hosts. Any hints on where or what to look for?
Could a defective hardware switch be causing this?

Best
Sebastian

manu · Jun 7, 2016

Hi
I don't have an OVS bridge in my test lab, only a Linux bridge, but it looks to me like this message only means that the tap device of VM 109 stopped, which is to be expected after the KVM process exited.
What do you mean when you say a VM has crashed?
Is the QEMU process still there? You can check with the following command (106 here is the ID of your VM):
ps axl | grep 'id 106'
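
A plain grep 'id 106' can also match the grep process itself, or a VM whose ID merely starts with 106. A slightly stricter filter could look like the sketch below. It assumes the KVM command line contains "-id <vmid>" as a separate argument, and filter_vm_process is a hypothetical helper name.

```shell
# Filter `ps` output (read from stdin) down to the KVM process of one VM.
# Assumption: the process was started with "-id <vmid>" on its command line.
filter_vm_process() {
    vmid="$1"
    # [-]id stops grep from matching its own command line;
    # ( |$) rejects longer IDs such as 1060 when looking for 106.
    grep -E "[-]id ${vmid}( |\$)"
}

# Example, run on the host:
#   ps axww -o pid,args | filter_vm_process 106
```

No output means no matching process was found, so the KVM process for that VM has exited.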

taenzerme · Jun 7, 2016

Manu, thanks for the quick follow-up.

Several VMs on different hosts were stopped randomly - and I really mean randomly, on different days over the last few weeks. The VMs did not seem to have crashed but were in a stopped state. I had to start all of them manually, and all of them came back online after the normal boot process.

The strange thing is: we weren't using the OVS bridges at that point. After the stops started happening some time last week, we removed the OVS bridges and replaced them with regular Linux bridges. Yesterday's stops happened while using the Linux bridges. The log entries appeared at exactly the time of the stops.

/etc/network/interfaces looks the same on all nodes:

Code:

auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth3 inet manual

iface eth5 inet manual

iface eth2 inet manual

iface eth1 inet manual

iface eth4 inet manual

auto vmbr1
iface vmbr1 inet manual
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0

auto vmbr0
iface vmbr0 inet static
        address  192.168.100.200
        netmask  255.255.255.0
        gateway  192.168.100.244
        bridge_ports eth4
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes

I installed the latest updates yesterday and rebooted all nodes into the new kernel. No stops so far; I will keep watching.

Best,
Seb

manu · Jun 7, 2016

Hi Seb
If a KVM machine stops again, please provide the output of the ps command shown above and of the 'dmesg' command, both executed on the host system.
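
To capture both outputs in one go right after a stop, something like the following sketch could be used. collect_vm_diag and the output path are hypothetical names, and the 200-line dmesg tail is an arbitrary choice.

```shell
# Sketch: save the two requested diagnostics into one file.
# collect_vm_diag is a hypothetical helper, not a Proxmox tool.
collect_vm_diag() {
    vmid="$1"
    outfile="$2"
    {
        echo "== collected: $(date) =="
        echo "== ps axl, filtered for VM ${vmid} =="
        # [i]d keeps grep from matching its own command line.
        ps axl | grep "[i]d ${vmid}" || echo "no process matching 'id ${vmid}'"
        echo "== dmesg (last 200 lines) =="
        dmesg | tail -n 200
    } > "$outfile" 2>&1
}

# Example (run as root on the host):
#   collect_vm_diag 106 /root/vm106-diag.txt
```

The resulting file can then be attached to a reply in the thread.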

taenzerme · Jun 7, 2016

Manu, yes of course, will do as soon as it happens again - but maybe the kernel upgrade was all that was needed.
