Starting CT fails: failed to connect to monitor socket: Connection refused

lxc · Apr 28, 2023

Hello

I have an issue where, about 1 hour after a server reboot, I can no longer start any CT.

Creating a CT works fine and no errors are printed.

Starting it, however, gives the following output:

Code:

failed to connect to monitor socket: Connection refused

Output of systemctl status pve-container@2430.service:

Code:

Apr 28 13:05:56 node1 systemd[1]: Started PVE LXC Container: 2430.
Apr 28 13:05:59 node1 systemd[1]: pve-container@2430.service: Main process exited, code=exited, status=1/FAILURE
Apr 28 13:05:59 node1 systemd[1]: pve-container@2430.service: Failed with result 'exit-code'.

Code:

root@node1:~# lxc-start 2445 --logfile /test.log
lxc-start: 2445: ../src/lxc/lxccontainer.c: wait_on_daemonized_start: 878 Received container state "ABORTING" instead of "RUNNING"
lxc-start: 2445: ../src/lxc/tools/lxc_start.c: main: 306 The container failed to start
lxc-start: 2445: ../src/lxc/tools/lxc_start.c: main: 309 To get more details, run the container in foreground mode
lxc-start: 2445: ../src/lxc/tools/lxc_start.c: main: 311 Additional information can be obtained by setting the --logfile and --logpriority options
root@node1:~# cat /test.log
lxc-start 2445 20230428140019.584 ERROR    conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 2
lxc-start 2445 20230428140019.627 ERROR    network - ../src/lxc/network.c:lxc_create_network_priv:3427 - No such device - Failed to create network device
lxc-start 2445 20230428140019.627 ERROR    start - ../src/lxc/start.c:lxc_spawn:1840 - Failed to create the network
lxc-start 2445 20230428140019.627 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc-start 2445 20230428140019.627 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:main:306 - The container failed to start
lxc-start 2445 20230428140019.627 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:main:309 - To get more details, run the container in foreground mode
lxc-start 2445 20230428140019.627 ERROR    lxc_start - ../src/lxc/tools/lxc_start.c:main:311 - Additional information can be obtained by setting the --logfile and --logpriority options
lxc-start 2445 20230428140019.628 ERROR    start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "2445"
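When a log like the one above is long, it can help to pull out only the earliest ERROR entries, since the later ones here ("ABORTING", "failed to start") look like consequences of the earlier script and network failures. The following is a minimal sketch that assumes the line layout shown in /test.log above; the function name `first_errors` and the regular expression are illustrative, not part of LXC.

```python
import re

# Matches lines in the layout seen in /test.log, for example:
# lxc-start 2445 20230428140019.584 ERROR    conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 2
LOG_LINE = re.compile(
    r"^(?P<tool>\S+) (?P<ct>\S+) (?P<ts>\d{14}\.\d{3}) (?P<level>[A-Z]+)\s+"
    r"(?P<subsystem>\S+) - (?P<location>\S+) - (?P<message>.*)$"
)


def first_errors(log_text, limit=3):
    """Return up to `limit` (timestamp, subsystem, message) tuples for ERROR lines, in file order."""
    errors = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append(
                (match.group("ts"), match.group("subsystem"), match.group("message"))
            )
            if len(errors) == limit:
                break
    return errors
```

Calling `first_errors(open("/test.log").read())` on the log above would surface the "Script exited with status 2" and "Failed to create network device" entries first, which are the ones worth investigating.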

If I stop an already running CT and try to start it again, it also fails:

Code:

root@node1:/etc/pve/lxc# pct stop 1399
root@node1:/etc/pve/lxc# pct start 1399
failed to connect to monitor socket: Connection refused

I also noticed this issue only started happening after I passed 1,000 deployed CTs. I wonder if there's some limit that needs to be raised, or a memory setting that needs to be increased?
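One way to look into the "some limit" question is to dump the current values of a few kernel settings that tend to matter on hosts with many containers. The sketch below only reads values from /proc/sys. The list of sysctl names is an assumption about where to look first, not a confirmed cause of this error, and the helper names `sysctl_path` and `read_limits` are made up for illustration.

```python
from pathlib import Path

# Candidate sysctls to inspect. This list is an assumption, not a diagnosis.
CANDIDATE_SYSCTLS = [
    "fs.inotify.max_user_instances",
    "fs.inotify.max_user_watches",
    "fs.file-max",
    "kernel.keys.maxkeys",
    "kernel.pty.max",
    "kernel.pty.nr",
    "net.ipv4.neigh.default.gc_thresh3",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root, *name.split("."))


def read_limits(names=CANDIDATE_SYSCTLS, root="/proc/sys"):
    """Return {sysctl name: current value as a string, or None if unreadable}."""
    values = {}
    for name in names:
        try:
            values[name] = sysctl_path(name, root).read_text().strip()
        except OSError:
            # The entry does not exist on this kernel or is not readable.
            values[name] = None
    return values
```

Printing `read_limits()` once right after a reboot and again when starts begin to fail gives two snapshots to compare. Only `kernel.pty.nr` in this list is a usage counter; the other entries are configured ceilings.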

The system itself is fine: a load average of 10, 500 GB of free memory, and nothing in dmesg.

SSH into an already existing CT also gets stuck. Nothing loads past this point:

Code:

PTY allocation request failed on channel 0
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.15.102-1-pve x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

Accessing it via the Proxmox console works fine, and everything is very responsive in there.

lxc · Apr 28, 2023

lxc said:
I also noticed this issue only started happening after I passed 1,000 deployed CTs. I wonder if there's some limit that needs to be raised, or a memory setting that needs to be increased?

When I removed a few hundred CTs that were already running, the issue went away and I could start CTs again without any problem. But as soon as the count reached 900-1100 deployed CTs, the issue came back and I could no longer start any CT. The only output was:

Code:

failed to connect to monitor socket: Connection refused
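To pin down the threshold more precisely than "900-1100", the number of running CTs could be recorded at the moment the error first appears. This is a rough sketch that counts containers per status from the text printed by `pct list`. It assumes each data row starts with a numeric VMID followed by the status, and the function name `count_by_status` is hypothetical.

```python
from collections import Counter


def count_by_status(pct_list_output):
    """Count containers per status from `pct list` text (assumed layout: VMID, Status, ...)."""
    counts = Counter()
    for line in pct_list_output.splitlines():
        fields = line.split()
        # Skip the header row and blank lines; data rows start with a numeric VMID.
        if len(fields) >= 2 and fields[0].isdigit():
            counts[fields[1]] += 1
    return counts
```

Feeding it the captured output of `pct list` each time a start fails would show whether the failure always begins at about the same number of running CTs.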
