Proxmox 4.0 lxc container's network unstable

rarirureluis

Renowned Member
Jul 14, 2015
When I try to connect to an LXC container over SSH, the connection often fails, and I can't download any packages because the network is very unstable.

VMs are all fine. I'm using the official "centos-7-default" template.

The host's /etc/network/interfaces:
------------------------------------------------------------------------------------
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
    bond-master bond0

auto eth1
iface eth1 inet manual
    bond-master bond0

auto bond0
iface bond0 inet manual
    bond-slaves none
    bond-miimon 100
    bond-mode 1

auto vmbr0
iface vmbr0 inet static
    bridge_ports bond0
    address 192.168.20.100
    network 192.168.20.0
    netmask 255.255.255.0
    broadcast 192.168.20.255
    gateway 192.168.20.1
    dns-nameservers x
    dns-nameservers x
    pre-up ifup bond0
    post-down ifdown bond0
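Since this configuration uses active-backup bonding (bond-mode 1), it is worth ruling out a flapping link before blaming LXC. A quick sanity check, assuming the standard Linux bonding driver is in use:

```shell
# Show which slave currently carries traffic and the per-link MII status
grep -E 'Currently Active Slave|MII Status' /proc/net/bonding/bond0

# One-line link state overview for all interfaces (bridge, bond, slaves)
ip -br link show
```

In mode 1 only one slave is active at a time; if the "Currently Active Slave" keeps changing (visible as failover messages in the kernel log), the instability is more likely a physical link or switch issue than a container problem.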

What should I do?
 
Same issue here. KVM is very stable, but the LXC containers randomly lose network connectivity. Even logging in to a container from the host doesn't work.

The only way I can resolve it is by killing the containers and rebooting the host.

I'm using:

root@node1 ~ # pveversion
pve-manager/4.1-22/aca130cf (running kernel: 4.2.8-1-pve)
 
Please send the full output of:

> pveversion -v
 
pveversion -v
proxmox-ve: 4.1-39 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-15 (running version: 4.1-15/8cd55b52)
pve-kernel-4.2.8-1-pve: 4.2.8-39
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-33
qemu-server: 4.0-62
pve-firmware: 1.1-7
libpve-common-perl: 4.0-49
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-42
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-8
pve-container: 1.0-46
pve-firewall: 2.0-18
pve-ha-manager: 1.0-23
ksm-control-daemon: not correctly installed
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
 
I've been trying LXC for a month now. We have two nodes with proper hardware. The KVM machines are extremely stable; not a single issue. Only the LXC containers run for a couple of days and then suddenly, without any apparent reason, become unreachable.

When the LXC containers are unreachable, the KVM machines are still accessible and working fine.

If I try to connect to an LXC container from within the host machine, it times out. Shutting down the container doesn't work either.

I can reproduce this issue on both nodes, so hardware problems can be ruled out. The LXC containers are built from the CentOS template installed via the Proxmox dashboard.

Do you need any more details? Please let me know!
 
It might help to get system logs ("journalctl -b" and "dmesg" on the host) and information from within the container ("pct enter <ID>" lets you run commands even if network access is not possible): ifconfig output, logs, routing information, ...
 
I have attached the requested data.

For your information: when the container is unreachable, I also cannot run the following command on the host: pct enter CTID

The container console in the Proxmox dashboard responds and shows the CentOS login screen, but after typing the password it hangs as well.
 

Attachments

  • dmesg_host.txt
    89.4 KB · Views: 3
  • journalctl_host.txt
    181.8 KB · Views: 1
  • journalctl_ct.txt
    113.9 KB · Views: 2
  • dmesg_ct.txt
    89.5 KB · Views: 3
Could you also post the output of "ps faxl" on the host after the issue occurs?

You are not running up-to-date packages; there was a bug in lxcfs <= 2.0.0-pve1 where processes in an LXC container could hang in kernel space while accessing certain files in /proc. If you are running some kind of monitoring software inside the container, this might be what triggers it.

  • Can you upgrade ("apt-get update; apt-get dist-upgrade") and see if the problem persists?
  • Does "pct enter <ID>" show some kind of error message?
  • What about "lxc-attach -n <ID>"?
  • Your journal output from within the container shows some errors which are probably unrelated but might be worth investigating (a failing rc.local script, Dovecot configuration file issues, "Failed to kill control group: Invalid argument").
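The lxcfs hang described above typically leaves processes stuck in uninterruptible sleep (state "D") while reading files under /proc. A quick way to spot such tasks on the host; plain ps/awk, nothing Proxmox-specific:

```shell
# List tasks in uninterruptible sleep; a crond or monitoring process
# blocked on a /proc read through lxcfs would show up here.
ps axo stat=,pid=,comm= | awk '$1 ~ /^D/ {print $2, $3}'
```

If the same PIDs stay in D state across repeated runs, they are genuinely stuck in the kernel, which matches the symptom of "pct enter" and "lxc-attach" hanging.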
 
Thanks a lot. Great to hear there is a chance that updating the packages could resolve this issue. I will also investigate the other errors.

I have attached ps_faxl_host.txt
 

Attachments

  • ps_faxl_host.txt
    58.2 KB · Views: 3
For your information, we use ISPConfig on the containers. This control panel also provides monitoring features; maybe it has something to do with that?
 
I have updated the packages and will let you know the outcome:

pveversion -v
proxmox-ve: 4.1-39 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-22 (running version: 4.1-22/aca130cf)
pve-kernel-4.2.8-1-pve: 4.2.8-39
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-36
qemu-server: 4.0-64
pve-firmware: 1.1-7
libpve-common-perl: 4.0-54
libpve-access-control: 4.0-13
libpve-storage-perl: 4.0-45
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-52
pve-firewall: 2.0-22
pve-ha-manager: 1.0-25
ksm-control-daemon: not correctly installed
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
 
Hi,

The containers are now throwing a lot of ext4 corruption errors. Should I have restarted the host after installing the updated packages? The update process did not say anything about a reboot being required.

The errors I see on the containers look like this:

[85248.459721] EXT4-fs error: 16 callbacks suppressed
[85248.459759] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85248.659947] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85248.860096] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85249.060309] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85249.260479] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85249.460659] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85249.660835] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85249.861024] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85250.061212] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85250.261402] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85251.981463] loop: Write error at byte offset 10216607744, length 4096.
[85251.981503] blk_update_request: I/O error, dev loop1, sector 19954312
[85251.981541] EXT4-fs warning (device loop1): ext4_end_bio:332: I/O error -5 writing to inode 1469689 (offset 0 size 0 starting block 2494290)
[85251.981610] Buffer I/O error on device loop1, logical block 2494289
[85253.662776] EXT4-fs error: 16 callbacks suppressed
[85253.662814] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85253.862977] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85254.063161] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85254.263347] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85254.463521] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85254.663691] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85254.863880] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85255.064083] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85255.264258] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85255.464432] EXT4-fs error (device loop1): ext4_find_dest_de:1809: inode #149734: block 536000: comm httpd: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[85256.976757] loop: Write error at byte offset 27880931328, length 4096.
[85256.976798] blk_update_request: I/O error, dev loop1, sector 54454944
[85256.976836] EXT4-fs warning (device loop1): ext4_end_bio:332: I/O error -5 writing to inode 938837 (offset 0 size 0 starting block 6806869)
[85256.976905] Buffer I/O error on device loop1, logical block 6806868
[85256.976954] loop: Write error at byte offset 10210635776, length 4096.
[85256.976990] blk_update_request: I/O error, dev loop1, sector 19942648
[85256.977027] EXT4-fs warning (device loop1): ext4_end_bio:332: I/O error -5 writing to inode 933241 (offset 0 size 0 starting block 2492832)
[85256.977095] Buffer I/O error on device loop1, logical block 2492831
 
You should run a filesystem check on the affected disks, but those errors are unrelated to the package updates you installed. Restarting the host is only necessary after kernel upgrades, or after Debian Jessie security updates to core components like libc.
 
Good to hear. Yes, I have run fsck on the corrupted images (running fsck on the raw disk image from the host level), but it looks like it cannot repair everything. It ends with a message that the partition still has errors.

fsck /mnt/vm-100-disk-1/images/110/vm-110-disk-1.raw

What command would you suggest I run?
 
If you encounter errors, please post the complete output here. Note that you should run fsck on the unmounted disk, and that not all errors are recoverable.
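For reference, fsck can be run directly against a raw image file, but only while the container is stopped so the image is not mounted anywhere. A minimal sketch on a throwaway image (the /tmp/demo.img path is made up for the demo; e2fsprogs is assumed installed):

```shell
PATH="$PATH:/sbin:/usr/sbin"     # e2fsprogs tools live in sbin on Debian

# Demo on a scratch image -- safe to run anywhere
truncate -s 64M /tmp/demo.img    # create a sparse 64 MB file
mkfs.ext4 -q -F /tmp/demo.img    # put an ext4 filesystem on it
fsck.ext4 -f -p /tmp/demo.img    # -f: force full check, -p: auto-fix safe errors

# Against the real (stopped!) container disk the call would look like:
# fsck.ext4 -f -p /mnt/vm-100-disk-1/images/110/vm-110-disk-1.raw
```

fsck.ext4 exit codes: 0 means clean, 1 means errors were corrected, 4 means errors remain uncorrected. A persistent 4 matches the "partition still has errors" message and usually means some data can only be recovered from backup.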
 
The package updates fixed the stability issues! We have been up without any trouble for several days now.

The fsck helped, but it could not fix everything (as you mentioned). Some databases were corrupted; I restored a few tables from backup to fix the problems permanently. No data loss.

I will keep monitoring and let you know at the end of the week if all is still fine.

Thanks for your quick support!
 
Unfortunately, after running for a couple of days without any trouble, I have the same issue again this morning.

SSH to the containers does not respond, and some sites hosted on the containers are unreachable. The load on the host itself is minimal.
 

Attachments

  • Screenshot - 31-3-2016 , 09_10_21.png
    79.9 KB · Views: 6
  • dmesg_host.txt
    2.4 KB · Views: 2
  • journal_host.zip
    85.7 KB · Views: 1
  • Screenshot - 31-3-2016 , 09_16_58.png
    16.3 KB · Views: 5
I also tried:
pct enter 108

No response.

When I try:

lxc-attach -n 108

No response either.

Any suggestions, or do you need more info?
 
I believe it is related to cron jobs that keep accumulating on the host.

When I run:
ps aux | grep cron > /root/ps_cron.txt

I count 5006 entries like:

root 311 0.0 0.0 68324 3724 ? S 02:31 0:00 /usr/sbin/crond -n
root 316 0.0 0.0 68324 3692 ? S 05:16 0:00 /usr/sbin/crond -n
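If stuck crond processes are the cause, the count should keep growing over time. One simple way to track it (the [c] bracket trick keeps the grep process itself out of the match):

```shell
# Count crond processes; '[c]rond' prevents grep from counting itself
ps aux | grep -c '[c]rond -n'
```

Run it periodically; on a healthy host this number should stay small and stable, roughly one crond per running container.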
 

Attachments

  • ps_cron.txt
    410.7 KB · Views: 3
