Hi,
I have installed a TurnKey File Server container (LXC) to provide SMB/CIFS and NFS shares with low resource use and fast disk I/O, rather than creating a VM (also because I've read that it's a bad idea to install those services directly on the host). The problem is that after a few days I usually can no longer ping the container's NIC, and only the loopback device is left:
Code:
root@turnkey-fileserver ~# ip -4 a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
The error message I see is:
Code:
root@turnkey-fileserver ~# systemctl status networking
* networking.service - Raise network interfaces
Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2022-05-28 20:14:42 CEST; 2 days ago
Docs: man:interfaces(5)
Process: 77 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)
Main PID: 77 (code=exited, status=1/FAILURE)
CPU: 118ms
May 28 20:14:32 turnkey-fileserver ifup[77]: udhcpc: sending discover
May 28 20:14:35 turnkey-fileserver ifup[77]: udhcpc: sending discover
May 28 20:14:38 turnkey-fileserver ifup[77]: udhcpc: sending discover
May 28 20:14:42 turnkey-fileserver ifup[77]: /etc/udhcpc/default.script: Lease failed:
May 28 20:14:42 turnkey-fileserver ifup[77]: udhcpc: no lease, failing
May 28 20:14:42 turnkey-fileserver ifup[77]: ifup: failed to bring up eth0
May 28 20:14:42 turnkey-fileserver systemd[1]: networking.service: Main process exited, code=exited, sta
May 28 20:14:42 turnkey-fileserver systemd[1]: networking.service: Failed with result 'exit-code'.
May 28 20:14:42 turnkey-fileserver systemd[1]: Failed to start Raise network interfaces.
May 28 20:14:42 turnkey-fileserver systemd[1]: networking.service: Consumed 118ms CPU time.
I searched for similar posts and found this one: https://forum.proxmox.com/threads/lxc-lose-sometimes-network-connection.68686/ - with the answer: "I guess the DHCP lease is not renewed any more. I would set the IP in a static way." But I use pfSense (virtualized on the Proxmox server) and all my other devices get their DHCP leases renewed, so I don't think the problem lies with the DHCP server. I can reliably bring the network back up with the commands below, but it's tedious: I have to log in to the LXC, and this shouldn't be necessary every few days. It should simply stay up like my other devices:
Code:
root@turnkey-fileserver ~# systemctl restart networking
root@turnkey-fileserver ~# systemctl status networking
* networking.service - Raise network interfaces
Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
Active: active (exited) since Mon 2022-05-30 22:09:23 CEST; 5s ago
Docs: man:interfaces(5)
Process: 2646 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)
Main PID: 2646 (code=exited, status=0/SUCCESS)
Tasks: 1 (limit: 17848)
Memory: 340.0K
CPU: 256ms
CGroup: /system.slice/networking.service
`-2684 /sbin/udhcpc -n -p /run/udhcpc.eth0.pid -i eth0
May 30 22:09:23 turnkey-fileserver systemd[1]: Starting Raise network interfaces...
May 30 22:09:23 turnkey-fileserver ifup[2646]: udhcpc: started, v1.30.1
May 30 22:09:23 turnkey-fileserver ifup[2646]: udhcpc: sending discover
May 30 22:09:23 turnkey-fileserver ifup[2646]: udhcpc: sending select for 192.168.100.10
May 30 22:09:23 turnkey-fileserver ifup[2646]: udhcpc: lease of 192.168.100.10 obtained, lease time 7200
May 30 22:09:23 turnkey-fileserver ifup[2646]: /etc/udhcpc/default.script: Resetting default routes
May 30 22:09:23 turnkey-fileserver ifup[2646]: SIOCDELRT: No such process
May 30 22:09:23 turnkey-fileserver ifup[2646]: /etc/udhcpc/default.script: Adding DNS 192.168.100.1
May 30 22:09:23 turnkey-fileserver ifup[2646]: /etc/resolvconf/update.d/libc: Warning: /etc/resolv.conf
May 30 22:09:23 turnkey-fileserver systemd[1]: Started Raise network interfaces.
root@turnkey-fileserver ~# ip -4 a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link-netnsid 0
    inet 192.168.100.10/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
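To spot the symptom without logging in and reading the output by eye, the check could be scripted roughly like this. It is only a sketch: the function names `interfaces_with_ipv4` and `network_is_down` are made up for illustration, and the parser only assumes text in the same shape as the `ip -4 a s` output shown above (the sample strings are shortened copies of it).

```python
import re

def interfaces_with_ipv4(ip_output: str) -> dict[str, list[str]]:
    """Map each interface name to the IPv4 addresses listed under it."""
    result: dict[str, list[str]] = {}
    current = None
    for line in ip_output.splitlines():
        # Header lines look like "2: eth0@if9: <BROADCAST,...>" or "1: lo: <...>"
        header = re.match(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:", line)
        if header:
            current = header.group(1)
            result.setdefault(current, [])
            continue
        # Address lines look like "inet 192.168.100.10/24 brd ..."
        inet = re.match(r"^\s*inet\s+(\S+)", line)
        if inet and current is not None:
            result[current].append(inet.group(1))
    return result

def network_is_down(ip_output: str) -> bool:
    """True if no interface other than lo has an IPv4 address."""
    addrs = interfaces_with_ipv4(ip_output)
    return all(name == "lo" for name, found in addrs.items() if found)

# Shortened copies of the two outputs from this post.
BROKEN = """1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
"""
WORKING = BROKEN + """2: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    inet 192.168.100.10/24 brd 192.168.100.255 scope global eth0
"""

print("broken state is down: ", network_is_down(BROKEN))
print("working state is down:", network_is_down(WORKING))
```

On the container itself, the text would come from running `ip -4 a s` rather than from the sample strings; what to do when the check reports the network as down (for example, restarting the networking service as above) is left open here.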
I've also found this thread and answer: https://askubuntu.com/a/1026911 - which led me to edit the /etc/network/interfaces file directly so that instead of "auto eth0" I have "allow-hotplug eth0" (followed by "iface eth0 inet dhcp"). This really seems to bring back network stability! Let me elaborate, because this is really the question I want to ask (the above is just the context):
BEFORE - /etc/network/interfaces (on the LXC fileserver):
Code:
# UNCONFIGURED INTERFACES
# remove the above line if you edit this file
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
AFTER - /etc/network/interfaces (on the LXC fileserver):
Code:
# UNCONFIGURED INTERFACES
# remove the above line if you edit this file
auto lo
iface lo inet loopback
# "auto eth0" was replaced by "allow-hotplug eth0" but
# it automatically inserts "auto eth0" after reboot:
# (attempt to avoid that occasionally eth0 disappears)
allow-hotplug eth0
auto eth0
iface eth0 inet dhcp
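To make the difference between the two files explicit, the trigger keywords that apply to each interface can be listed with a small parser. This is a sketch, not an official tool: `interface_triggers` is a made-up name, and it only looks at lines starting with "auto" or "allow-..." in text shaped like the files above. Note that in the AFTER file eth0 ends up under both keywords, because "auto eth0" is still present.

```python
def interface_triggers(interfaces_text: str) -> dict[str, set[str]]:
    """Map each interface name to the auto/allow-* keywords that list it."""
    triggers: dict[str, set[str]] = {}
    for raw in interfaces_text.splitlines():
        words = raw.split()
        # Skip blank lines and comment lines (those starting with '#').
        if not words or words[0].startswith("#"):
            continue
        keyword = words[0]
        if keyword == "auto" or keyword.startswith("allow-"):
            # One such line may name several interfaces.
            for name in words[1:]:
                triggers.setdefault(name, set()).add(keyword)
    return triggers

# The two versions of the file from this post.
BEFORE = """# UNCONFIGURED INTERFACES
# remove the above line if you edit this file
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
"""
AFTER = """# UNCONFIGURED INTERFACES
# remove the above line if you edit this file
auto lo
iface lo inet loopback
allow-hotplug eth0
auto eth0
iface eth0 inet dhcp
"""

for label, text in (("BEFORE", BEFORE), ("AFTER", AFTER)):
    print(label, sorted(interface_triggers(text).get("eth0", set())))
```

Since eth0 is still marked "auto" in the AFTER file, the `ifup -a` call from networking.service (visible in the status output above) presumably still handles it at boot, so the comparison alone does not show why the change would help.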
Yes, I know I didn't remove the top line even though I modified the file. But why does "allow-hotplug" seem to help or make a difference? I haven't read about this in any guides; the Ask Ubuntu answer (link above) does discuss it, but maybe I just don't understand it. Is it generally recommended to use "allow-hotplug" for LXC, or does it have something to do with eth0 being attached to a virtual bridge (if so, I haven't seen that recommendation)?
Finally, a few extra details: my system is a small low-power HP t730, which also runs pfSense. The NIC is configured simply as a vmbr0 "Linux Bridge" interface, which in the LXC "Network" tab is set up as a network bridge with IP address "dhcp". That same "Linux Bridge" interface is also made available to my pfSense VM, which has the vmbr0 bridge configured in the "Hardware" tab as a network device with "no VLAN", model = "VirtIO (paravirtualized)", unchecked boxes at "Firewall" and "Disconnect", nothing at "Rate limit" (= unlimited), and multiqueue empty. I think the issue lies with the LXC, although the bridge is also used by pfSense. I only mention this so you know that the NIC is not a physical NIC.
Does anyone have any ideas? I would be grateful to know for sure whether this is in fact the (recommended) solution, and it would be nice to understand why it works. Thanks!