Long delay from command issue to command execution

alexskysilk
I have what appears to be an intermittent problem with container shutdowns taking a LONG time. For example:
[attached screenshot: upload_2018-8-21_9-16-45.png]

As you can see, there is a NEARLY 7 MINUTE delay from the stop request end time to the shutdown command. What is the cause of this delay and how can it be mitigated?
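For anyone who wants to reproduce the comparison, something along these lines works (1101832 is just an example VMID, and the task-log path assumes the default location):

Code:
# Shut the container down directly from the node's shell, bypassing the GUI/HA
# layer, to see whether the delay is in the container or in the task handling.
time pct shutdown 1101832

# The task index keeps start/end timestamps for every task (UPID), which is
# where the gap between the stop request and the shutdown command shows up.
grep -i shutdown /var/log/pve/tasks/index | tail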
 

alexskysilk
There is no indication in the VM logs that anything was amiss. Moreover, this happens on start tasks as well:

[attached screenshot: upload_2018-8-30_8-34-18.png]

The system is not overloaded or particularly busy, and seems to be operating normally. There are no obvious indications in dmesg, and pvestatd and pveproxy don't show any problems.
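For reference, the checks I mean are along these lines (exact invocations may differ):

Code:
# Kernel ring buffer with human-readable timestamps
dmesg -T | tail -n 50
# Status of the PVE daemons involved in task handling
systemctl status pvestatd pveproxy pvedaemon
# Recent daemon logs around the time of a slow start/stop task
journalctl -u pvedaemon -u pvestatd --since "1 hour ago"

Versions: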

Code:
# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.17-2-pve)
pve-manager: 5.2-1 (running version: 5.2-1/0fcd7879)
pve-kernel-4.15: 5.2-2
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-16
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-2
proxmox-widget-toolkit: 1.0-18
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-4
pve-firewall: 3.0-9
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-5
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-26
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
 

alexskysilk
More information:

In the interim period after the HA command shows as complete (status OK), if I try to start the container manually, e.g.

lxc-start -n 1101832

I get the following response:

No container config specified
I can't do anything with the container in that period (which can stretch to 12-15 minutes), including migrating it off the node. I tried that because it looks like the problem doesn't hit all nodes at the same time. Could this be lxcfs related?
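For completeness, the manual checks I mean look roughly like this (1101832 is the example container from above; I'm assuming the standard layout where pve-container generates the LXC config from /etc/pve/lxc/<vmid>.conf when the container starts, which is presumably why a bare lxc-start complains about a missing config):

Code:
# The supported way to start a PVE container; this generates the LXC config
# that a plain lxc-start would otherwise look for.
pct start 1101832

# Check that the PVE-side config is present and whether reading it through
# the cluster filesystem is slow.
ls -l /etc/pve/lxc/1101832.conf
time pct config 1101832

As for lxcfs itself: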

Code:
# service lxcfs status
● lxcfs.service - FUSE filesystem for LXC
   Loaded: loaded (/lib/systemd/system/lxcfs.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-06-28 10:23:10 PDT; 2 months 14 days ago
 Main PID: 5431 (lxcfs)
    Tasks: 11 (limit: 4915)
   Memory: 28.9M
      CPU: 11h 44min 22.188s
   CGroup: /system.slice/lxcfs.service
           └─5431 /usr/bin/lxcfs /var/lib/lxcfs/
There doesn't seem to be any indication of a fault there.
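A quick way to probe whether the FUSE mount itself is responding (just a sketch, using the default /var/lib/lxcfs mountpoint shown above):

Code:
# If lxcfs were hanging, reads through its FUSE mount would stall as well.
time cat /var/lib/lxcfs/proc/uptime
time cat /var/lib/lxcfs/proc/meminfo

pve-cluster also looks healthy: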

Code:
# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabl
   Active: active (running) since Thu 2018-06-28 10:23:12 PDT; 2 months 14 days ago
 Main PID: 8266 (pmxcfs)
    Tasks: 13 (limit: 4915)
   Memory: 106.4M
      CPU: 2d 16h 15min 46.869s
   CGroup: /system.slice/pve-cluster.service
           └─8266 /usr/bin/pmxcfs
Sep 11 13:31:51 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:52 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:52 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:54 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:31:54 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:17 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:43 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:52 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:32:53 sky10 pmxcfs[8266]: [status] notice: received log
Sep 11 13:34:33 sky10 pmxcfs[8266]: [status] notice: received log
Nothing here either. This seems to have started relatively recently, although this specific node has an uptime of 75 days, and I see it on other clusters as well (some updated more recently). Needless to say, this is causing me grief. Any help would be appreciated.
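For completeness, a rough way to check whether pmxcfs itself is responsive during one of these stalls (just a sketch; adjust as needed):

Code:
# Reads of /etc/pve go through pmxcfs; if the cluster filesystem blocks,
# these should hang for roughly the same interval as the stuck tasks.
time ls /etc/pve/nodes/
# Quorum / membership view from corosync
pvecm status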
 
