Long delay from command issue to command execution

alexskysilk

I have what appears to be an intermittent problem with container shutdowns taking a LONG time. For example:
upload_2018-8-21_9-16-45.png

As you can see, there is a delay of NEARLY 7 MINUTES between the end of the stop request and the shutdown command. What causes this delay, and how can it be mitigated?
 
There is no indication in the VM logs that anything was amiss. Moreover, this happens on start tasks as well:

upload_2018-8-30_8-34-18.png

The system is not overloaded or particularly busy, and it seems to be operating normally. There are no obvious indications in dmesg, and pvestatd and pveproxy don't show any problems.

Code:
# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.17-2-pve)
pve-manager: 5.2-1 (running version: 5.2-1/0fcd7879)
pve-kernel-4.15: 5.2-2
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-16
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-2
proxmox-widget-toolkit: 1.0-18
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-4
pve-firewall: 3.0-9
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-5
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-26
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
 
More information:

In the interim period after the HA command shows as complete (status OK), if I try to run it manually, e.g.

lxc-start -n 1101832

I get the following response:

No container config specified
I can't do anything to the container in that period (which can stretch to 12-15 minutes), including migrating it off the node. I tried that because it looks like it doesn't happen on all nodes at the same time. Can this be lxcfs related?

Code:
# service lxcfs status
● lxcfs.service - FUSE filesystem for LXC
   Loaded: loaded (/lib/systemd/system/lxcfs.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-06-28 10:23:10 PDT; 2 months 14 days ago
 Main PID: 5431 (lxcfs)
    Tasks: 11 (limit: 4915)
   Memory: 28.9M
      CPU: 11h 44min 22.188s
   CGroup: /system.slice/lxcfs.service
           └─5431 /usr/bin/lxcfs /var/lib/lxcfs/

There doesn't seem to be any indication of a fault there.

Code:
# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabl
   Active: active (running) since Thu 2018-06-28 10:23:12 PDT; 2 months 14 days ago
 Main PID: 8266 (pmxcfs)
    Tasks: 13 (limit: 4915)
   Memory: 106.4M
      CPU: 2d 16h 15min 46.869s
   CGroup: /system.slice/pve-cluster.service
           └─8266 /usr/bin/pmxcfs
Sep 11 13:31:51 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:52 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:52 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:54 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:54 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:17 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:43 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:52 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:53 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:34:33 sky10 pmxcfs[8266]: [status] notice: received log

Nothing here either. This seems to have started relatively recently, although this specific node has an uptime of 75 days, and I see it on other clusters as well (some updated more recently). Needless to say, this is causing me grief. Any help would be appreciated.
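
For anyone who wants to compare notes, here is a minimal sketch of what could be captured on the node while a container is stuck in that window. It rests on an unconfirmed assumption: that something is blocked in uninterruptible sleep (process state D) during the delay. The helper names `filter_blocked` and `snapshot_ct` are hypothetical and are not part of Proxmox or LXC; only `ps`, `awk`, `ls`, and `pct status` are real commands here.

```shell
#!/bin/sh
# Hypothetical helper: keep only processes whose STAT column starts with D
# (uninterruptible sleep). Expects `ps` output on stdin with STAT as field 2.
filter_blocked() {
    awk '$2 ~ /^D/ { print }'
}

# Hypothetical helper: gather a quick snapshot for one container ID.
# Assumes the standard Proxmox config location /etc/pve/lxc/<vmid>.conf.
snapshot_ct() {
    vmid="$1"
    # Is the container config visible through the cluster filesystem?
    ls -l "/etc/pve/lxc/${vmid}.conf"
    # What state does Proxmox report for the container?
    pct status "$vmid"
    # Which processes (if any) are blocked, and on what kernel wait channel?
    ps -eo pid,stat,wchan:32,args | filter_blocked
}

# Example (run on the affected node during the delay):
#   snapshot_ct 1101832
```

If the snapshot shows no blocked processes, the D-state assumption is probably wrong and the delay lies elsewhere.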
 
