"pct list" time out

greg

Greetings
On my new Proxmox 6.4-13 node, something got "stuck": pct list, or stopping any pve-* service, hangs. In the log I see lines like:

Code:
systemd[1]: pvestatd.service: Stopping timed out. Terminating.
scwv10 systemd[1]: pvedaemon.service: State 'stop-sigterm' timed out. Killing
scwv10 systemd[1]: pvedaemon.service: Killing process 40734 (pvedaemon) with signal SIGKILL.
scwv10 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.

etc

Note that kill -9 <pct process> doesn't work.

I don't understand what could block "pct list"... any idea?
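For what it's worth, a process that survives kill -9 is almost always stuck in uninterruptible sleep (state D), typically waiting on I/O or on a hung FUSE mount such as /etc/pve. A quick way to check (a sketch; <pid> stands for the hung pct PID):

Bash:
# show the state and kernel wait channel of the hung process
ps -o pid,stat,wchan:32,cmd -p <pid>

# STAT "D" means uninterruptible sleep; the kernel stack (root only)
# shows where it is blocked
cat /proc/<pid>/stack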

Thanks in advance

Regards
 
It looks like you have an I/O problem on your node; please post the output of the commands below for more information:

Bash:
uptime
free
pveversion -v
 
Hello
I'm still having this problem across several nodes.

'top' and 'iotop' say the machine is basically idle.

Code:
# uptime
 09:24:30 up 400 days,  9:22, 22 users,  load average: 21,08, 20,82, 20,71
 
 # free
              total        used        free      shared  buff/cache   available
Mem:       65755788    43674868    18835076     1731256     3245844    19641952
Swap:      20971512     2162860    18808652

# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-12
pve-kernel-helper: 6.4-12
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
pve-zsync: 2.2
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.7-pve1
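Side note: a load average of ~21 on a machine that top and iotop both report as idle points at tasks stuck in state D, since uninterruptible sleepers count toward load without using any CPU. A minimal sketch to list them (standard procps ps):

Bash:
# list tasks in uninterruptible sleep (state D) with their kernel wait channel
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'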
 
Hello,

Do you see IO delay on your node? (Datacenter -> NodeName -> Summary -> IO delay section)

Have you tried restarting the pvestatd service?
Bash:
pvestatd restart
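If the daemon's own CLI hangs, the equivalent restart through systemd (the service is pvestatd.service, as seen in the log above) can be tried instead:

Bash:
# restart pvestatd via systemd rather than the daemon's own CLI
systemctl restart pvestatd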
 
Thanks for your answer.
pvestatd restart hangs, like most pve commands.
I cannot see the GUI for the node: it says "Connection failure. Network error or Proxmox VE services not running?", and on the GUI of the other nodes it is greyed out ("permission denied - invalid PVE ticket (401)"). It wasn't like that yesterday.
On the other nodes, iodelay is 0, except for one node where it fluctuates a lot between 0 and 30%.

As an example, I ran pct delsnapshot yesterday and there's no output yet.
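A delsnapshot that produces no output for a day is most likely blocked on a lock under /etc/pve (the pmxcfs FUSE mount). One way to see which processes are holding that mount busy (a sketch; fuser is part of the psmisc package):

Bash:
# list processes with open files on the pmxcfs mount
fuser -vm /etc/pve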
 
Hmmm,

Can you attach the Syslog? `/var/log/syslog`

Are you sure about the entries in /etc/hosts and /etc/hostname?
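For reference, Proxmox expects the node's hostname to resolve to the IP the cluster uses. A typical /etc/hosts looks like this (the IP and domain here are placeholders, not taken from this setup):

Code:
127.0.0.1   localhost
192.0.2.10  sysv5.example.com sysv5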
 
/etc/hosts and /etc/hostname seem to be fine (they are the same as they always have been).

syslog doesn't show anything but this:

Code:
Feb  7 12:58:02 sysv5 corosync[12754]:   [QUORUM] Sync members[7]: 1 2 3 4 5 6 7
Feb  7 12:58:02 sysv5 corosync[12754]:   [TOTEM ] A new membership (1.8685a) was formed. Members
Feb  7 12:58:02 sysv5 corosync[12754]:   [QUORUM] Members[7]: 1 2 3 4 5 6 7
Feb  7 12:58:02 sysv5 corosync[12754]:   [MAIN  ] Completed service synchronization, ready to provide service.

and then:

Code:
Feb  7 13:03:41 sysv5 corosync[12754]:   [KNET  ] link: host: 5 link: 0 is down
Feb  7 13:03:41 sysv5 corosync[12754]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
Feb  7 13:03:41 sysv5 corosync[12754]:   [KNET  ] host: host: 5 has no active links
Feb  7 13:03:45 sysv5 corosync[12754]:   [KNET  ] rx: host: 5 link: 0 is up
Feb  7 13:03:45 sysv5 corosync[12754]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)

Not sure what it is, but it's always been like this.
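Those KNET lines mean the corosync link to node 5 dropped and came back a few seconds later. Even occasional flapping matters, because pmxcfs (/etc/pve) blocks cluster-wide while quorum is renegotiated. The current link state can be checked with corosync's own tool:

Bash:
# show knet link status to every other node, as corosync sees it
corosync-cfgtool -s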
 
I was able to "unfreeze" all the hung pve commands like this (commented sketch below):

- disconnected the private network used for cluster communication
- pvecm e 1
- systemctl restart pve-cluster.service
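For context, the same sequence with comments (a sketch; pvecm e appears to be shorthand for pvecm expected, which lowers the vote count needed for quorum so the isolated node's /etc/pve becomes writable again):

Bash:
# with the cluster network unplugged, tell corosync one vote is enough for quorum
pvecm expected 1
# restart pmxcfs so /etc/pve unblocks and the hung pve commands can proceed
systemctl restart pve-cluster.service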

The GUI is now available. Now I'll try to bring the node back into the cluster.
 
It seems to be the same on the other nodes: to be able to do anything, I have to "isolate" the node... :(
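If every node only becomes responsive once it is cut off from the cluster network, the corosync/pmxcfs layer is the prime suspect rather than local I/O. Comparing quorum state on each node may narrow it down:

Bash:
# on each node: membership, vote and quorum information
pvecm status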
 
