Fast CT shutdown, very slow (minutes) CT stop

Hi,
In another thread I’ve asked for help with why Proxmox does not fully shut down and stays stuck for minutes on a black screen with a blinking cursor. I’ve noticed this strange behavior:

Code:
root@pve:~# time pct shutdown 101

real    0m2.862s
user    0m0.461s
sys     0m0.034s
root@pve:~# pct start 101
root@pve:~# pct status 101
status: running
root@pve:~# time pct stop 101

real    6m6.311s
user    0m0.500s
sys     0m0.075s
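
To compare the two `real` timings above numerically, a small helper can turn the `XmY.YYYs` format into plain seconds. This is only a sketch; the function name `to_seconds` is made up for illustration and it assumes the `NmS.SSSs` shape that bash's `time` prints.

```shell
# Convert a `time`-style duration such as "6m6.311s" into seconds.
# Splits on the "m" and "s" characters, so it expects exactly that shape.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

to_seconds 0m2.862s   # pct shutdown -> 2.862
to_seconds 6m6.311s   # pct stop     -> 366.311
```

That puts the stop at roughly 366 seconds against under 3 seconds for the shutdown.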

I expected the Proxmox VE shutdown procedure to run a shutdown on every VM and CT, but maybe it runs a less clean stop command instead, which could be why it takes ages: stopping CT101 is very slow. Is there a way to find out where it gets stuck? In which script can I see what actions PVE executes on shutdown?

Here is the config of CT101:
Code:
root@pve:~# cat /etc/pve/lxc/101.conf
#**Proxmox Backup Server**
arch: amd64
cores: 6
features: nesting=1
hostname: ProxmoxBackupServer
memory: 4096
mp1: /mnt/pve/sata_disk/pbs_backups,mp=/pbs_backups
net0: name=eth0,bridge=vmbr0,firewall=1,gw=10.0.0.254,hwaddr=CA:CE:16:FC:86:01,ip=10.0.0.5/24,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-101-disk-0,size=32G
searchdomain: mynetwork
startup: order=1
swap: 512
 
Regardless of whether you stop the container via `pct shutdown`, via shutdown in the web GUI, or through a shutdown of the host, the same process takes place inside the container. /var/log/syslog in the container shows what happens; typical output:


Code:
Dec  1 09:25:13 debct systemd[1]: Received SIGRTMIN+3.
Dec  1 09:25:13 debct systemd[1]: Removed slice system-modprobe.slice.
Dec  1 09:25:13 debct systemd[1]: Stopped target Graphical Interface.
Dec  1 09:25:13 debct systemd[1]: Stopped target Multi-User System.
Dec  1 09:25:13 debct systemd[1]: Stopped target Login Prompts.
Dec  1 09:25:13 debct systemd[1]: Stopped target Remote Encrypted Volumes.
Dec  1 09:25:13 debct systemd[1]: Stopped target Timers.
Dec  1 09:25:13 debct systemd[1]: apt-daily-upgrade.timer: Succeeded.
Dec  1 09:25:13 debct systemd[1]: Stopped Daily apt upgrade and clean activities.
Dec  1 09:25:13 debct systemd[1]: apt-daily.timer: Succeeded.
Dec  1 09:25:13 debct systemd[1]: Stopped Daily apt download activities.
Dec  1 09:25:13 debct systemd[1]: e2scrub_all.timer: Succeeded.
Dec  1 09:25:13 debct systemd[1]: Stopped Periodic ext4 Online Metadata Check for All Filesystems.
Dec  1 09:25:13 debct systemd[1]: logrotate.timer: Succeeded.
Dec  1 09:25:13 debct systemd[1]: Stopped Daily rotation of log files.
Dec  1 09:25:13 debct systemd[1]: man-db.timer: Succeeded.
Dec  1 09:25:13 debct systemd[1]: Stopped Daily man-db regeneration.
Dec  1 09:25:13 debct systemd[1]: systemd-tmpfiles-clean.timer: Succeeded.
Dec  1 09:25:13 debct systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Dec  1 09:25:13 debct systemd[1]: Stopped target System Time Synchronized.
Dec  1 09:25:13 debct systemd[1]: Stopped target System Time Set.
Dec  1 09:25:13 debct systemd[1]: Reached target Unmount All Filesystems.
Dec  1 09:25:13 debct systemd[1]: Stopping Console Getty...
Dec  1 09:25:13 debct systemd[1]: Stopping Container Getty on /dev/tty1...
Dec  1 09:25:13 debct systemd[1]: Stopping Container Getty on /dev/tty2...
Dec  1 09:25:13 debct systemd[1]: Stopping Regular background program processing daemon...
Dec  1 09:25:13 debct systemd[1]: Stopping D-Bus System Message Bus...
Dec  1 09:25:13 debct systemd[1]: postfix.service: Succeeded.
Dec  1 09:25:13 debct systemd[1]: Stopped Postfix Mail Transport Agent.
Dec  1 09:25:13 debct systemd[1]: Stopping Postfix Mail Transport Agent (instance -)...

As can be seen, the OS alone completes its shutdown within 1 second. If it takes longer, something special is configured in the container.
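
To check the same thing on another container, the span between the `Received SIGRTMIN+3` line and the last logged line can be computed from a syslog excerpt. This is a rough sketch: the function name `shutdown_span` is hypothetical, and it assumes the classic `Mon DD HH:MM:SS host ...` line format shown above, with all lines on the same day.

```shell
# Print the seconds between "Received SIGRTMIN+3" and the last line on stdin.
# Assumes the time of day is the third whitespace-separated field.
shutdown_span() {
    awk '
        { split($3, t, ":"); now = t[1] * 3600 + t[2] * 60 + t[3] }
        /Received SIGRTMIN\+3/ { start = now; seen = 1 }
        seen { last = now }
        END { if (seen) print last - start }
    '
}

# Usage inside the container, on an excerpt that ends with the shutdown:
# shutdown_span < /var/log/syslog
```

For the excerpt above, where every line carries the timestamp 09:25:13, this prints 0. Feed it only the relevant part of the log, since lines written after the next boot would inflate the result.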
 
Hi, the shutdown process is really fast, whereas if I stop CT 101 it gets stuck for 3-6 minutes. I've noticed that the lxc-start process stays in D state after the stop is issued, and it appears to be waiting for a file descriptor to be closed:

Code:
root@pve:~# ps aux | grep "lxc.*101" | grep -v grep
root     3192525  0.0  0.0   3952  3208 ?        Ds   11:47   0:00 /usr/bin/lxc-start -F -n 101
root     3193084  0.0  0.0  91256  7988 ?        S    11:47   0:00 /usr/bin/termproxy 5900 --path /vms/101 --perm VM.Console -- /usr/bin/dtach -A /var/run/dtach/vzctlconsole101 -r winch -z lxc-console -n 101 -e -1
root     3193088  0.0  0.0   2280   588 pts/1    Ss+  11:47   0:00 /usr/bin/dtach -A /var/run/dtach/vzctlconsole101 -r winch -z lxc-console -n 101 -e -1
root     3193089  0.0  0.0   2412    84 ?        Ss   11:47   0:00 /usr/bin/dtach -A /var/run/dtach/vzctlconsole101 -r winch -z lxc-console -n 101 -e -1
root     3193090  0.0  0.0   3884  2640 pts/3    Ss+  11:47   0:00 lxc-console -n 101 -e -1
root     3193102  0.0  0.0   3884  2804 pts/0    S+   11:47   0:00 lxc-stop -n 101 --kill
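
A quicker way to spot processes in D state (uninterruptible sleep), like the lxc-start line above, is to filter `ps` output on the STAT column. This is a minimal sketch; the helper name `d_state_only` is made up, and it assumes the input columns are in the order `pid stat wchan command`.

```shell
# Keep only rows whose second field (STAT) starts with "D".
# The header row is dropped because "STAT" does not start with "D".
d_state_only() {
    awk '$2 ~ /^D/'
}

# Usage on the host; the wchan column names the kernel function the task sleeps in:
# ps -eo pid,stat,wchan:32,args | d_state_only
```

As root, `cat /proc/<pid>/stack` for the reported PID may give more detail about where the task is blocked in the kernel, if the kernel exposes that file.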

I tried launching the lxc-start command by hand under strace, and this is where it gets stuck as soon as I stop CT 101:

Code:
strace /usr/bin/lxc-start -F -n 101 2>&1 | grep "open\|close" | tee strace.log
[...]
openat(AT_FDCWD, "/proc/3197648/ns/net", O_RDONLY|O_CLOEXEC) = 5
close(5)                                = 0
openat(AT_FDCWD, "/run/lxc//var/lib/lxc/monitor-fifo", O_WRONLY|O_NONBLOCK) = 5
close(5)                                = 0
close(5)                                = 0
close(5)                                = 0
close(5)                                = 0
openat(AT_FDCWD, "/run/lxc//var/lib/lxc/monitor-fifo", O_WRONLY|O_NONBLOCK) = 5
close(5)                                = 0
close(5)                                = 0
close(5)
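
The last `close(5)` above has no return value, which is what suggests the call is still blocked. A small filter can pull such lines out of a longer trace. This is a sketch only: the name `pending_calls` is hypothetical, and it simply treats any line with an opening parenthesis but no ` = ` result as a call that had not returned.

```shell
# Print strace lines from stdin that show a call but no " = <result>" part,
# i.e. calls that had not returned when the trace was captured.
pending_calls() {
    awk '/\(/ && !/ = /'
}

# Possible usage; -f follows children, -tt adds timestamps, -T shows the time
# spent in each call, and -o writes the trace straight to a file:
# strace -f -tt -T -o strace.log /usr/bin/lxc-start -F -n 101
# pending_calls < strace.log
```

Writing the trace with `-o` instead of piping it through `grep` and `tee` also avoids pipe buffering, which can hide the most recent lines while the process is still hanging.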

If I block network traffic from CT 101 (with an outgoing ACL), it stops in a few seconds.

So, to summarize:
1) CT101 shutdown is very fast
2) CT101 stop takes minutes, with the lxc-start process stuck closing a file descriptor
3) CT101 stop is very fast if I start the container with its interface moved to an isolated bridge or with outgoing traffic blocked (it is a PBS container, so there could be some communication with PVE that blocks the stop process)

To give a bit of history: I started looking at how CTs and VMs perform the shutdown/stop procedure because I noticed that when I shut down my PVE host it almost immediately shows a blank screen with the cursor at the top, but then it doesn't power off the machine (see this thread). BTW, since the PVE shutdown procedure should run a shutdown on VMs and CTs and not a stop, the slow stop of CT101 shouldn't be the cause of the problem.
 
