Giganic

Hi All,

Been enjoying using Proxmox for the last couple of months with no real complaints, really appreciate the work that has gone into overhauling the web interface, slick.

Over the past couple of days I have been interacting with my server quite a bit and have noticed the error below, which seems to be triggered when a container starts up after being shut down.

Code:
unregister_netdevice: waiting for lo to become free. Usage count = 1

It repeats several times before stopping, or until I manage to interrupt it long enough to trigger a reboot, which makes the problem go away for a short time before it eventually returns. While the message is repeating, the web interface is unresponsive and any tasks that were triggered time out.

Researching this error turns up a thread from these very forums back in 2010, as well as other posts relating to Docker and similar setups.

As the problem occurs quite frequently and I don't know how to fix it, I am reaching out for support in the hope that someone will be able to assist. Please let me know if you need further information.

Thanks in advance.

pveversion -v results:
Code:
proxmox-ve: 4.2-60 (running kernel: 4.4.15-1-pve)
pve-manager: 4.2-17 (running version: 4.2-17/e1400248)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve2
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-43
qemu-server: 4.0-85
pve-firmware: 1.1-8
libpve-common-perl: 4.0-72
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-56
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6-1
pve-container: 1.0-72
pve-firewall: 2.0-29
pve-ha-manager: 1.0-33
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.3-4
lxcfs: 2.0.2-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5.7-pve10~bpo80
 
Hello,

I have the same problem when I shut down or stop and then start a container in which I have mounted a CIFS share. The other containers continue to work, but the pct command hangs (even for a pct list). The web UI works, but actions on containers just hang too. After 150-300 seconds (the delay is not fixed and seems to vary within this range), the problem disappears by itself and everything continues to work correctly again.

If I manually unmount the share before a shutdown or a stop, the problem doesn't manifest.
 
Interesting. Thanks for adding your experience @ohmer. As I said in my first update, once I removed that troublesome container the problem seemed to go away. It only came back today, and only for a very short time: it produced the error maybe 3 times in 20 seconds before going away.

I echo your comments on the web UI; once that error starts occurring it is either wait it out or try to get the host to reboot.

Hopefully someone with a better idea can assist us and anyone else who might have encountered this error.
 
As this problem doesn't seem to be getting any feedback, it must mean @ohmer and I are the only ones with the issue. Regardless, I decided to record a video of the problem in action to hopefully assist with resolving it.

It isn't isolated to my server either; as you can see, @ohmer has reported a similar experience. I have mainly noticed mine on startup. If someone from Proxmox could chime in and assist, or ask for whatever further information / logs would help, I would be more than happy to oblige and provide whatever is necessary.

As the problem happens quite regularly, it is becoming frustrating to deal with it without an answer as to why it happens or, more importantly, how to stop it from happening in the future.

Video
 
I switched to KVM. LXC is not fully ready to use in Proxmox, I think, especially with an ext4 filesystem. I wanted to use LXC containers for the performance, but in fact performance is currently better for me with KVM.
 
Anyone? @wolfgang, @tom, @dietmar, @martin, @fabian. Surely @ohmer and I aren't the only ones facing this issue.

If I could bring your attention to this post https://forum.proxmox.com/threads/unable-to-stop-a-container-waiting-for-lo-to-become-free.3510/

It was created back in 2010 by @Smanux, and @dietmar posted that it was a bug (with no solution). In the years since, you would imagine something has been done to rectify this problem?

It would be appreciated if someone could respond to this post (which is merely asking for support with your product), not only to assist @ohmer and myself but also anyone else who might be facing the same issue but doesn't wish to create a forum account.
 
I'd recommend upgrading to the current version, and posting a clear and detailed description of your setup and the exact problem here (bugs that are not obviously reproducible tend to be hard to track). Things you should probably include:
  • "pveversion -v"
  • host network configuration
  • container configuration file
  • container setup (i.e., what is running inside the container?)
  • system logs from container start to stop
  • describe the exact steps you take when the issue occurs - for example, how do you shut down the container?
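Most of this can be gathered on the host with commands roughly like the following sketch (XXX stands for the container ID; paths may differ on your setup):

Code:
# package versions
pveversion -v
# host network configuration
cat /etc/network/interfaces
# container configuration file (replace XXX with the container ID)
cat /etc/pve/lxc/XXX.conf
# system log around the container start/stop
journalctl --since "10 minutes ago"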
Also, it might be worth trying to reproduce the problem with LXC debug logging:
  • run "lxc-start -n XXX -F -l debug -o /tmp/lxc-XXX.log" (replace XXX with the container ID)
  • wait for the boot to finish
  • in a second shell/ssh session/... (lxc-start stays in the foreground), run "pct shutdown XXX" and wait for the container to shut down
  • after the container has shut down (or, if you have waited for more than 5 minutes, after terminating it by pressing Ctrl+C in the session where lxc-start is running), attach the log file here or in a new bug report containing all of the above information
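Roughly, that debug session could look like the following sketch (again with XXX as the container ID and the example log path from above):

Code:
# first shell: start the container in the foreground with debug logging
lxc-start -n XXX -F -l debug -o /tmp/lxc-XXX.log

# second shell, once the container has finished booting: shut it down
pct shutdown XXX

# if the shutdown hangs for more than ~5 minutes, press Ctrl+C in the
# first shell, then attach /tmp/lxc-XXX.log here or to the bug report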
 
Thank you for your reply @fabian. I am now better able to provide the information you require to further assist with this particular problem. If there is anything I have missed or gotten wrong, or anything you would like further information on, please don't hesitate to let me know.

1. Output of pveversion -v. I updated to 4.3 last night.
Code:
proxmox-ve: 4.3-66 (running kernel: 4.4.19-1-pve)
pve-manager: 4.3-1 (running version: 4.3-1/e7cdc165)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-88
pve-firmware: 1.1-9
libpve-common-perl: 4.0-73
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-61
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-6
pve-container: 1.0-75
pve-firewall: 2.0-29
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
zfsutils: 0.6.5.7-pve10~bpo80

2. Host Network Configuration
Code:
auto lo
iface lo inet loopback

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.50
        netmask 255.255.255.0
        gateway 192.168.1.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0

3. Container Configuration File
Code:
arch: amd64
cpulimit: 2
cpuunits: 1024
hostname: container
memory: 1024
net0: name=eth0,bridge=vmbr0,gw=192.168.1.1,hwaddr=92:7E:82:87:C2:FF,ip=192.168.1.83/24,type=veth
ostype: debian
rootfs: tank:subvol-114-disk-1,size=50G
swap: 512
lxc.aa_profile: unconfined

4. Container Setup

A barebones CT with Debian 8 running a single instance of Flexget. A Samba share is also set up in /etc/fstab (a sketch of such an entry follows the package list below). The installed packages are:
Code:
samba
cifs-utils
python-pip
virtualenv
flexget
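The exact fstab entry isn't shown above; a hypothetical CIFS line in the container's /etc/fstab might look roughly like this (server address, share name, mount point and credentials file are all placeholders):

Code:
//192.168.1.10/media  /mnt/media  cifs  credentials=/root/.smbcredentials,iocharset=utf8  0  0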

5. Syslog from Container Start and Stop: http://hastebin.com/uvakubofod.vbs

6. Description of the Steps:

It is reproducible by either starting or stopping the container through the web interface. After stopping, the error will often continue for a while before randomly going away; if it occurs on startup, it prevents the container from starting.

7. LXC Debug Log: http://hastebin.com/tanitameyu.sql
 
@Giganic: does the problem also go away for you if you unmount the samba share before shutting the container down?
 
@fabian: I tried your suggestion.

1. Start CT.
2. Verify share is mounted.
3. Unmount share.
4. Verify share is unmounted.
5. Shutdown
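One way to do this from the host is a rough sketch like the following, using pct (114 is the container ID from the config above; the mount point is a placeholder):

Code:
pct start 114
pct exec 114 -- mount | grep cifs     # verify the share is mounted
pct exec 114 -- umount /mnt/media     # unmount the share inside the CT
pct exec 114 -- mount | grep cifs     # verify the share is gone
pct shutdown 114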

No issues, as @ohmer noted. I thought the issue might still occur on startup, but it seems that once the CT has shut down with the share unmounted it boots without error, and it did so on 4 occasions.

I then went back and performed a normal shutdown without unmounting, and shortly after the CT was stopped the error returned, so it seems pretty reproducible too.
 
Great, thanks @fabian. Out of all the machines I have, this one with Samba is the only one with an issue; it wouldn't be so bad if it didn't cause everything to freeze.

Hopefully armed with the information you now have, it can be isolated and fixed. If there is anything else you require from me, please let me know.
 
I have the same issue.

Only with CIFS in an LXC container.

You can reproduce it with an LXC Debian 8 container: mount a CIFS source via fstab and reboot.

Code:
Message from syslogd@pve-02 at Feb  2 23:11:58 ...
kernel:[ 1943.132020] unregister_netdevice: waiting for lo to become free. Usage count = 3

Message from syslogd@pve-02 at Feb  2 23:12:08 ...
kernel:[ 1953.371053] unregister_netdevice: waiting for lo to become free. Usage count = 3

EDIT: More tests with the CIFS share... reboots are not the problem, only shutdowns.

I did some tests. The error occurs only when the CIFS share is not unmounted.

Without CIFS mount via fstab / 5 shutdown-and-start cycles = OK
With CIFS mount via fstab / 5 shutdown-and-start cycles = fail
With CIFS mount via fstab / 5 shutdown-and-start cycles, with umount before shutdown = OK

I have written a systemd service that unmounts the CIFS share before a reboot or shutdown.

In /etc/systemd/system, create the file start_and_stop.service:

Code:
[Unit]
Description=/etc/rc.local.shutdown Compatibility
Before=shutdown.target reboot.target
[Service]
ExecStart=/bin/true
ExecStop=/opt/enerspace/service/umountCifs.sh
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target

Now register it with systemd:
systemctl enable start_and_stop.service
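The umountCifs.sh script itself isn't posted here; a minimal sketch of what it could look like, assuming it simply unmounts every CIFS filesystem, would be:

Code:
#!/bin/sh
# unmount all mounted CIFS filesystems before the network goes away
umount -a -t cifs

# remember to make it executable:
# chmod +x /opt/enerspace/service/umountCifs.sh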

Thanks to @ohmer for the tip, which helped me find a solution.
 
Hello guys,

I have this exact same error when I shut down an LXC container while a CIFS share is mounted.

@B4c4rd1 Can you explain your script a little bit more, please? The path "/opt/enerspace/service/umountCifs.sh" doesn't exist for me, neither on the host nor in the LXC container. Where should your script be placed: on the host or in the container? What are the correct permissions for this file, and who should own it?

Thanks a lot Hoppel
 
