Container Unable to Start

Sakamoto

New Member
Oct 25, 2024
Hi guys, I need some help here. My container fails to start after I powered my cluster back on. This is what I get when I try to start it:

1740370433822.png

I have tried rebooting the cluster and restoring the container from backup, but I still get the same error. Any ideas?
 
Hi,

I do not see a clear indication of what caused the error. Try starting the container with pct start 108 --debug to get more log output.
If you could describe the environment in a bit more detail, that might help too.
 
Here it is

Code:
root@prmx:~# pct start 108 --debug
run_buffer: 571 Script exited with status 11
lxc_init: 845 Failed to run lxc.hook.pre-start for container "108"
__lxc_start: 2034 Failed to initialize container "108"
0 hostid 100000 range 65536
INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "108", config section "lxc"
DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 108 lxc pre-start produced output: org.freedesktop.DBus.Error.ServiceUnknown: The name uk.org.thekelleys.dnsmasq.Internal was not provided by any .service files

ERROR    utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 11
ERROR    start - ../src/lxc/start.c:lxc_init:845 - Failed to run lxc.hook.pre-start for container "108"
ERROR    start - ../src/lxc/start.c:__lxc_start:2034 - Failed to initialize container "108"
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "108", config section "lxc"
startup for container '108' failed


Our office had a power outage last weekend, so we shut down all the VMs in bulk and then shut down the host. After I powered the server back on, the container would not start, but the VMs started as usual.

My cluster has 2 nodes, and each node uses its own separate storage.
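
For anyone reading the debug output above: the line that matters is the D-Bus error printed by the pre-start hook. As a rough sketch, the helpers below pull the missing bus name out of a saved log and take its last dot-separated component. Treating that component as the SDN zone name is an assumption that happens to match this thread (Internal). The function names are made up for illustration and are not part of Proxmox.

```shell
# Sketch only: function names are hypothetical, not part of Proxmox.

# Print the D-Bus name(s) that a saved "pct start <id> --debug" log
# reports as missing.
missing_dbus_name() {
    grep -o 'The name [^ ]* was not provided' "$1" | awk '{ print $3 }' | sort -u
}

# Assumption: the last dot-separated component of the bus name is the zone.
zone_from_dbus_name() {
    printf '%s\n' "${1##*.}"
}
```

To try it, save the output first (for example pct start 108 --debug > /tmp/ct108.log 2>&1, where the log path is arbitrary) and then run zone_from_dbus_name "$(missing_dbus_name /tmp/ct108.log)". The result is only a hint about which dnsmasq instance to look at.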
 
Yes, and it looks like nothing has changed.

Code:
root@prmx:/etc/pve/sdn# cat subnets.cfg
subnet: Internal-192.168.201.0-24
        vnet Internal
        dhcp-range start-address=192.168.201.2,end-address=192.168.201.250
        gateway 192.168.201.1
        snat 1

root@prmx:/etc/pve/sdn# cat vnets.cfg
vnet: Internal
        zone Internal


Code:
root@prmx:/etc/pve# ps guax | grep dnsmasq
dnsmasq     1508  0.0  0.0  14184  1948 ?        S    Feb24   0:00 /usr/sbin/dnsmasq -x /run/dnsmasq/dnsmasq.pid -u dnsmasq -7 /etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg-new --local-service --trust-anchor=.,20326,8,2,e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
root      692923  0.0  0.0   6332  1920 pts/0    S+   10:16   0:00 grep dnsmasq

root@prmx:/etc/pve# systemctl status dnsmasq
● dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server
     Loaded: loaded (/lib/systemd/system/dnsmasq.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-02-24 07:31:02 +08; 1 day 2h ago
   Main PID: 1508 (dnsmasq)
      Tasks: 1 (limit: 62455)
     Memory: 1.6M
        CPU: 46ms
     CGroup: /system.slice/dnsmasq.service
             └─1508 /usr/sbin/dnsmasq -x /run/dnsmasq/dnsmasq.pid -u dnsmasq -7 /etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg-new --local-service --trust-anchor=>

Feb 24 07:31:02 my1prmx systemd[1]: Starting dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server...
Feb 24 07:31:02 my1prmx dnsmasq[1508]: started, version 2.90 cachesize 150
Feb 24 07:31:02 my1prmx dnsmasq[1508]: DNS service limited to local subnets
Feb 24 07:31:02 my1prmx dnsmasq[1508]: compile time options: IPv6 GNU-getopt DBus no-UBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP conntrack ipset nftset auth crypt>
Feb 24 07:31:02 my1prmx dnsmasq[1508]: reading /etc/resolv.conf
Feb 24 07:31:02 my1prmx dnsmasq[1508]: using nameserver 10.31.22.102#53
Feb 24 07:31:02 my1prmx dnsmasq[1508]: using nameserver 10.31.16.169#53
Feb 24 07:31:02 my1prmx dnsmasq[1508]: read /etc/hosts - 11 names
Feb 24 07:31:02 my1prmx systemd[1]: Started dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server.

These are the details you asked for. I can see it mentions that LXC failed, but I'm not really sure how to troubleshoot it.

Additionally, the container does start if I restore the backup to another node, but I need to configure the network there to make it work. I believe some setting on the affected node needs to be checked, but I'm not sure where it is.

Sorry, I'm new to Linux and Proxmox, so I'm not very good with these things.
 
There should be another dnsmasq instance running for the zone defined in the SDN. You can check its status with systemctl status dnsmasq@<zone>.
In your case, <zone> is most likely Internal.
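
If the zone name is not obvious, it can be read from the SDN config instead of guessed. The following is a minimal sketch, assuming the vnets.cfg layout posted earlier in this thread (a "vnet: <name>" line followed by an indented "zone <name>" line) and assuming one dnsmasq@<zone> instance per zone. The function name zone_units is hypothetical.

```shell
# Sketch only: zone_units is a hypothetical helper, not a Proxmox tool.
# Prints one systemd unit name per distinct zone found in a vnets.cfg file.
# The zone name is copied as-is, so its capitalization is preserved.
zone_units() {
    awk '$1 == "zone" { print "dnsmasq@" $2 ".service" }' "$1" | sort -u
}
```

Running zone_units /etc/pve/sdn/vnets.cfg gives the unit names to pass to systemctl status, spelled exactly as the zone is spelled in the config.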
 
This is the output. It seems like it's not running. Do I need to enable it?


Code:
root@my1prmx:~# systemctl status dnsmasq@internal
○ dnsmasq@internal.service - dnsmasq (internal) - A lightweight DHCP and caching DNS server
     Loaded: loaded (/lib/systemd/system/dnsmasq@.service; disabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/dnsmasq@.service.d
             └─00-dnsmasq-after-networking.conf
     Active: inactive (dead)
 
I have enabled dnsmasq@internal and the service is running, but I still get the same error when I start the container.

However, when I delete the network configuration from the container, it starts. I'm not sure which part of the network configuration is causing the problem. Any ideas?
 
Please disable the default dnsmasq service on the node (preferably on all nodes) with systemctl --now disable dnsmasq, as it should not be running, and then restart dnsmasq@Internal (on each node). Make sure to use a capital I in the commands, as your zone is Internal, not internal.
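
The two steps can be wrapped in a small helper so they are run the same way on each node. This is only a sketch: fix_zone_dnsmasq and the RUN variable are hypothetical names, and the final is-active check is an extra that was not part of the advice above.

```shell
# Sketch only: fix_zone_dnsmasq and RUN are hypothetical helper names.
# Set RUN=echo to print the commands instead of executing them.
fix_zone_dnsmasq() {
    zone="$1"    # case-sensitive: Internal is not the same unit as internal
    # Stop and disable the default instance, which should not be running.
    ${RUN:-} systemctl disable --now dnsmasq
    # Restart the per-zone instance.
    ${RUN:-} systemctl restart "dnsmasq@${zone}"
    # Extra check (not from the post above): report whether it is running.
    ${RUN:-} systemctl is-active "dnsmasq@${zone}"
}
```

Setting RUN=echo and calling fix_zone_dnsmasq Internal prints the three commands without touching any service, which is a cheap way to confirm the zone name is spelled correctly before running it for real as root.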
 
I have disabled dnsmasq:

Code:
root@my1prmx:~# systemctl --now disable dnsmasq
Synchronizing state of dnsmasq.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable dnsmasq
root@my1prmx:~# systemctl disable dnsmasq
Synchronizing state of dnsmasq.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable dnsmasq
root@my1prmx:~# systemctl status dnsmasq
○ dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server
     Loaded: loaded (/lib/systemd/system/dnsmasq.service; disabled; preset: enabled)
     Active: inactive (dead)

Feb 26 14:10:26 my1prmx dnsmasq[710356]: compile time options: IPv6 GNU-getopt DBus no-UBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP conntrack ipset nftset aut>
Feb 26 14:10:26 my1prmx dnsmasq[710356]: reading /etc/resolv.conf
Feb 26 14:10:26 my1prmx dnsmasq[710356]: using nameserver 10.31.22.102#53
Feb 26 14:10:26 my1prmx dnsmasq[710356]: using nameserver 10.31.16.169#53
Feb 26 14:10:26 my1prmx dnsmasq[710356]: read /etc/hosts - 11 names
Feb 26 14:10:26 my1prmx systemd[1]: Started dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server.
Feb 26 14:11:17 my1prmx systemd[1]: Stopping dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server...
Feb 26 14:11:17 my1prmx dnsmasq[710356]: exiting on receipt of SIGTERM
Feb 26 14:11:17 my1prmx systemd[1]: dnsmasq.service: Deactivated successfully.
Feb 26 14:11:17 my1prmx systemd[1]: Stopped dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server.


This is the status of dnsmasq@Internal
Code:
root@my1prmx:~# systemctl status dnsmasq@Internal
× dnsmasq@Internal.service - dnsmasq (Internal) - A lightweight DHCP and caching DNS server
     Loaded: loaded (/lib/systemd/system/dnsmasq@.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/dnsmasq@.service.d
             └─00-dnsmasq-after-networking.conf
     Active: failed (Result: exit-code) since Tue 2025-02-25 14:05:09 +08; 24h ago
        CPU: 23ms

Feb 25 14:05:09 my1prmx systemd[1]: dnsmasq@Internal.service: Control process exited, code=exited, status=2/INVALIDARGUMENT
Feb 25 14:05:09 my1prmx systemd[1]: dnsmasq@Internal.service: Failed with result 'exit-code'.
Feb 25 14:05:09 my1prmx systemd[1]: Failed to start dnsmasq@Internal.service - dnsmasq (Internal) - A lightweight DHCP and caching DNS server.
Feb 25 14:05:53 my1prmx systemd[1]: dnsmasq@Internal.service: Unit cannot be reloaded because it is inactive.
Feb 25 14:05:58 my1prmx systemd[1]: dnsmasq@Internal.service: Unit cannot be reloaded because it is inactive.
Feb 26 11:07:26 my1prmx systemd[1]: dnsmasq@Internal.service: Unit cannot be reloaded because it is inactive.
Feb 26 11:07:48 my1prmx systemd[1]: dnsmasq@Internal.service: Unit cannot be reloaded because it is inactive.
Feb 26 11:10:40 my1prmx systemd[1]: dnsmasq@Internal.service: Unit cannot be reloaded because it is inactive.
Feb 26 11:11:05 my1prmx systemd[1]: dnsmasq@Internal.service: Unit cannot be reloaded because it is inactive.
Feb 26 11:11:16 my1prmx systemd[1]: dnsmasq@Internal.service: Unit cannot be reloaded because it is inactive.
 
Sorry, after I ran it again and restarted dnsmasq@Internal, the service is now running. I had forgotten to restart it earlier.

Now the container starts with its current network configuration. Thanks a lot, mate!
 