Long delay from command issue to command execution

alexskysilk
I have what appears to be an intermittent problem with container shutdowns taking a LONG time. For example:
[attached screenshot: upload_2018-8-21_9-16-45.png]

As you can see, there is a NEARLY 7 MINUTE delay from the stop request end time to the shutdown command. What is the cause of this delay and how can it be mitigated?
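For anyone who wants to reproduce the comparison, something along these lines works (1101832 is just an example VMID, and the task-log path assumes the default location):

Code:
# Shut the container down directly from the node's shell, bypassing the GUI/HA
# layer, to see whether the delay is in the container or in the task handling.
time pct shutdown 1101832

# The task index keeps start/end timestamps for every task (UPID), which is
# where the gap between the stop request and the shutdown command shows up.
grep -i shutdown /var/log/pve/tasks/index | tail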
 

alexskysilk
There is no indication in the VM logs that anything was amiss. Moreover, this happens on start tasks as well:

[attached screenshot: upload_2018-8-30_8-34-18.png]

The system is not overloaded or particularly busy, and seems to be operating normally. There are no obvious indications in dmesg, and pvestatd and pveproxy don't show any problems.
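For reference, the checks I mean are along these lines (exact invocations may differ):

Code:
# Kernel ring buffer with human-readable timestamps
dmesg -T | tail -n 50
# Status of the PVE daemons involved in task handling
systemctl status pvestatd pveproxy pvedaemon
# Recent daemon logs around the time of a slow start/stop task
journalctl -u pvedaemon -u pvestatd --since "1 hour ago"

Versions: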

Code:
# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.17-2-pve)
pve-manager: 5.2-1 (running version: 5.2-1/0fcd7879)
pve-kernel-4.15: 5.2-2
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-16
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-2
proxmox-widget-toolkit: 1.0-18
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-4
pve-firewall: 3.0-9
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-5
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-26
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
 

alexskysilk
More information:

In the interim period after the HA command shows as complete (status OK), if I try to start the container manually, e.g.

lxc-start -n 1101832

I get the following response:

No container config specified
I can't do anything with the container in that period (which can stretch to 12-15 minutes), including migrating it off the node. I tried that because it looks like the problem doesn't hit all nodes at the same time. Could this be lxcfs related?
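For completeness, the manual checks I mean look roughly like this (1101832 is the example container from above; I'm assuming the standard layout where pve-container generates the LXC config from /etc/pve/lxc/<vmid>.conf when the container starts, which is presumably why a bare lxc-start complains about a missing config):

Code:
# The supported way to start a PVE container; this generates the LXC config
# that a plain lxc-start would otherwise look for.
pct start 1101832

# Check that the PVE-side config is present and whether reading it through
# the cluster filesystem is slow.
ls -l /etc/pve/lxc/1101832.conf
time pct config 1101832

As for lxcfs itself: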

Code:
# service lxcfs status
● lxcfs.service - FUSE filesystem for LXC
   Loaded: loaded (/lib/systemd/system/lxcfs.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-06-28 10:23:10 PDT; 2 months 14 days ago
 Main PID: 5431 (lxcfs)
    Tasks: 11 (limit: 4915)
   Memory: 28.9M
      CPU: 11h 44min 22.188s
   CGroup: /system.slice/lxcfs.service
           └─5431 /usr/bin/lxcfs /var/lib/lxcfs/
There doesn't seem to be any indication of a fault there.
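A quick way to probe whether the FUSE mount itself is responding (just a sketch, using the default /var/lib/lxcfs mountpoint shown above):

Code:
# If lxcfs were hanging, reads through its FUSE mount would stall as well.
time cat /var/lib/lxcfs/proc/uptime
time cat /var/lib/lxcfs/proc/meminfo

pve-cluster also looks healthy: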

Code:
# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabl
   Active: active (running) since Thu 2018-06-28 10:23:12 PDT; 2 months 14 days ago
 Main PID: 8266 (pmxcfs)
    Tasks: 13 (limit: 4915)
   Memory: 106.4M
      CPU: 2d 16h 15min 46.869s
   CGroup: /system.slice/pve-cluster.service
           └─8266 /usr/bin/pmxcfs
Sep 11 13:31:51 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:52 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:52 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:54 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:54 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:17 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:43 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:52 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:53 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:34:33 sky10 pmxcfs[8266]: [status] notice: received log
Nothing here either. This seems to have started relatively recently, although this specific node has an uptime of 75 days, and I see it on other clusters as well (some updated more recently). Needless to say, this is causing me grief. Any help would be appreciated.
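For completeness, a rough way to check whether pmxcfs itself is responsive during one of these stalls (just a sketch; adjust as needed):

Code:
# Reads of /etc/pve go through pmxcfs; if the cluster filesystem blocks,
# these should hang for roughly the same interval as the stuck tasks.
time ls /etc/pve/nodes/
# Quorum / membership view from corosync
pvecm status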
 
