Frozen LXC container, or one-way audio (Asterisk)

Denis Kulikov

Member
Feb 28, 2018
Hi all!

"A long time ago" we try to migrate our small office PBX (Asterisk 11 + CentOS6) to Proxmox (LXC container) and discovered problem with voice that called standstill (one way audio).
We have up to 7 concurent SIP calls (maximum 1 Mbps of voice traffic).
Only one container running on node, LA near 1.
We try to create test cluster for smooth migration of container and quick change hardware (3 different servers, from "High": 2xX5650,128GB RAM to "Low": 2xX5250, 16GB RAM), only one container running on the node at the same time and problem persist anywhere.
We try to capture traffic in 4 places and simultaneously ping each place with 100ms interval: host, container, switch (SPAN), ISP and find that container freeze for 1-1,5 seconds.
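
For reference, a rough sketch of the synchronized capture we used (the interface names, addresses, and file names here are illustrative assumptions):

Code:
# on the Proxmox host (the bridge vmbr0 is captured the same way)
tcpdump -ni eno1 -w host.pcap 'udp or icmp' &
# inside the container
tcpdump -ni eth0 -w ct.pcap 'udp or icmp' &
# from a test PC: ping each point at a 100 ms interval, with timestamps (-D)
ping -D -i 0.1 192.0.2.10 | tee ping_container.log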

When the problem occurs:
tcpdump on the host shows ICMP ping replies from the container(!) to the PC and one-way RTP traffic (from the ISP to the container, but not in the reverse direction).
tcpdump inside the container shows nothing (although the ping replies exist on the bridge/wire!): 0 (zero) packets are captured during the 1-1.5 s interval, as if the container were frozen (hung), yet packets exist in the capture before and after this "one-way voice standstill".

In general, it looks as if the container is frozen for 1-1.5 s.
At first we thought it was a problem with the network bridge or veth, but the container replies to pings without showing them in its own tcpdump (nothing is captured while the problem lasts).

In dmesg, syslog, etc. there are no suspicious messages (hung tasks, timeouts, and so on) - everything looks fine everywhere.
For example, from the "low" server:

Code:
top - 11:41:09 up 15 days,  3:12,  2 users,  load average: 0,61, 0,63, 0,52
Tasks: 327 total,   1 running, 266 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,9 us,  1,8 sy,  0,0 ni, 96,1 id,  1,2 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem : 12296532 total,  4123032 free,  5691732 used,  2481768 buff/cache
KiB Swap:  8388604 total,  8369916 free,    18688 used.  6074492 avail Mem

Code:
pveversion -v

proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-10
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
pve-zsync: 1.7-1
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1

Kernel and LXC updates, tuning, reading, and hardware changes over the past year haven't solved the problem, so we ask the community for help, please: how can we find the root cause of the problem (veth, LXC, ZFS, ...)?
 
If I understand your post correctly, the freeze happens when you try to migrate the container?
If yes, this is normal: there is no real live migration with containers, only 'restart' mode, so the container stops, the config gets transferred to the new node, and the container is started again.

If you need live migration, you need to use VMs instead of containers.
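
For illustration, a restart-mode migration looks like this (the VMID and node name are placeholders):

Code:
# stop the container, transfer it, start it again on the target node
pct migrate 101 node2 --restart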
 
Many thanks for the answer.
No, we use migration only to move the container (powered off) to different hardware for tests.
The 'freeze' problem persists on all nodes (in the lab and in production).
After analyzing the tcpdumps we saw very strange network problems inside the hosts (tcpdump on the external interface of a host with 2-3 bridges/VLANs and 1 container): RTP packet reordering and so on, and we can't imagine what causes it (tc with default settings can't; no drops or flushes in the ifconfig/ethtool statistics; nothing interesting in dmesg or kernel.log)...
We are trying some kernel network tuning (backlog, etc.) and waiting for results, but with up to 2 Mbps of voice traffic plus some control traffic (up to 10 Mbps with bursts) it shouldn't be required, I think.
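
A minimal sketch of the kind of tuning meant here (the sysctls are real, the values are illustrative, not recommendations):

Code:
# enlarge the per-CPU input queue and the maximum socket buffer sizes
sysctl -w net.core.netdev_max_backlog=4096
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=4194304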
 
Hi all.

We found something interesting: the container freezes for 1.2 s when network activity from corosync starts (unicast UDP 5404->5405 between cluster members) and unfreezes after it ends (and a burst of RTP traffic is detected).
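
The corosync bursts are easy to spot with a capture filter like this (the bridge name vmbr0 is an assumption):

Code:
tcpdump -ni vmbr0 'udp and (port 5404 or port 5405)'
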
Any help would be appreciated.
 
Hi!

You need to use a separate physical network interface for the cluster (corosync) traffic.
Like this:

[screenshot: network setup with a dedicated interface for corosync]
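
A sketch of what the dedicated interface could look like in /etc/network/interfaces (the NIC name and addresses are made up):

Code:
# NIC carrying only corosync traffic
auto eno2
iface eno2 inet static
        address 10.10.10.1
        netmask 255.255.255.0
# each node's ring0_addr in corosync.conf then points into 10.10.10.0/24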

Best regards
Gosha
 
Many thanks for the answer, gosha.

A dedicated interface shouldn't be necessary, I think, for the internal traffic of a 3-node cluster (without shared storage or replication) with 2 containers and up to 10 Mbps on a gigabit network - and we observe this issue even in a 2-node cluster with 1 container.
Or can you describe why this network setup is necessary?
 
"container freeze for 1.2 sec then network activity from corosync started" - is this a bad argument? ;)

It's not an argument for a network engineer. The corosync activity (bidirectional!) is approximately (without DSCP or CoS marking):
400 packets * 162 bytes = 64,800 bytes in 120 ms,
which works out to roughly 4.3 Mbit/s.
And an additional network interface is needed for ~4-5 Mbit/s of traffic?

We will try to set up a dedicated NIC for the cluster activity, but I don't think it will help.
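
A quick sanity check of that rate with plain integer arithmetic:

Code:
# 64800 bytes in 120 ms, converted to bits per second
echo $(( 64800 * 8 * 1000 / 120 ))   # prints 4320000, i.e. ~4.3 Mbit/s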
 

Ok. Please read the first paragraph here: https://pve.proxmox.com/wiki/Separate_Cluster_Network

Gosha
 
We separated the cluster network (moved it to a dedicated NIC); the cluster is fine, but the problem persists and, as we expected, nothing changed.
We will try to profile the components with strace and perf, debug corosync and lxcfs (FUSE), and update glibc in the container.
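
Roughly what we mean by profiling (the PID is a placeholder for the container's init or a suspect daemon):

Code:
# follow children (-f) with microsecond timestamps (-tt) around a freeze window
strace -f -tt -p 12345 -o freeze.strace
# system-wide CPU sampling with call graphs for the same window
perf record -a -g -- sleep 5 && perf report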
 
I think we have found the root cause of this issue.
It's the lxc-freeze command issued by pvesr, and it works as expected - the container freezes.
I don't have enough time right now to find out why it is issued (replication is off, as my colleagues say), and I need to run some tests and read the docs about replication/pvesr/etc.
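
The symptom is easy to reproduce by hand, and the replication schedule can be checked (the VMID is an example):

Code:
lxc-freeze -n 101      # the container goes silent, exactly as in our captures
lxc-unfreeze -n 101    # traffic resumes, with a burst
pvesr status           # list replication jobs and their state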
 