qm list (and other commands) hanging.

Hyien

I have a 3-node cluster, with all nodes running
pve-manager/7.4-3/9002ab8a (running kernel: 5.15.102-1-pve)

When I run 'qm list' or other Proxmox commands, the command simply hangs.

pvecm status
Cluster information
-------------------
Name: XXX
Config Version: 13
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Apr 3 03:24:54 2023
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000002
Ring ID: 1.4978
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 XXX
0x00000002 1 XXX (local)
0x00000003 1 XXX
 
Hi,
Please check and post the journal contents from around the time the command was executed: journalctl --since <DATE> --until <DATE>.
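As a rough illustration of picking the <DATE> values, the sketch below prints a journalctl command covering a few minutes on either side of the moment a command hung. The helper name journal_window and the choice to filter on the corosync and pve-cluster units are assumptions for illustration, not something requested in this thread, and the helper relies on GNU date.

```shell
# Hypothetical helper: print a journalctl command for +/- N minutes
# around a local timestamp. Assumes GNU date (as on Debian/Proxmox).
journal_window() {
    when="$1"              # e.g. "2023-04-03 08:35:00"
    minutes="${2:-5}"      # half-width of the window, default 5 minutes

    # Convert to epoch seconds first so the arithmetic is unambiguous.
    ts=$(date -d "$when" +%s) || return 1
    start=$(date -d "@$((ts - minutes * 60))" '+%Y-%m-%d %H:%M:%S')
    end=$(date -d "@$((ts + minutes * 60))" '+%Y-%m-%d %H:%M:%S')

    # The -u filters are an assumption: they limit output to the two
    # services most relevant to cluster communication.
    printf 'journalctl --since "%s" --until "%s" -u corosync -u pve-cluster\n' \
        "$start" "$end"
}
```

Calling journal_window "2023-04-03 08:35:00" 5 prints a command you can review and then run; drop the -u options to see the full journal for that window.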
 
I see a bunch of these:
Apr 03 08:35:50 XXX corosync[251358]: [TOTEM ] Retransmit List: 12 13 16 23 24 27 36 3d 3e 3f 40 41 43 4f 50 51 52 53 5b 5d 5e 5f 60
Apr 03 08:35:51 XXX pmxcfs[364280]: [status] notice: cpg_send_message retry 80
Apr 03 08:35:51 XXX corosync[251358]: [TOTEM ] Retransmit List: 12 13 16 23 24 27 36 3d 3e 3f 40 41 43 4f 50 51 52 53 5b 5d 5e 5f 60
Apr 03 08:35:52 XXX corosync[251358]: [TOTEM ] Retransmit List: 12 13 16 23 24 27 36 3d 3e 3f 40 41 43 4f 50 51 52 53 5b 5d 5e 5f 60
Apr 03 08:35:52 XXX pmxcfs[364280]: [status] notice: cpg_send_message retry 90
Apr 03 08:35:53 XXX pmxcfs[364280]: [status] notice: cpg_send_message retry 100
Apr 03 08:35:53 XXX pmxcfs[364280]: [status] notice: cpg_send_message retried 100 times
Apr 03 08:35:53 XXX pmxcfs[364280]: [status] crit: cpg_send_message failed: 6
Apr 03 08:35:54 XXX pmxcfs[364280]: [status] notice: cpg_send_message retry 10
Apr 03 08:35:54 XXX corosync[251358]: [TOTEM ] Token has not been received in 2738 ms
Apr 03 08:35:55 XXX pmxcfs[364280]: [status] notice: cpg_send_message retry 20
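To get a feel for how often these messages occur, something like the following could tally the three symptoms visible in the excerpt above. It is only a sketch: the function name summarize_cluster_log and the output format are made up for illustration, and it simply pattern-matches journal text fed on stdin.

```shell
# Hypothetical helper: count corosync/pmxcfs warning signs in journal
# text read from stdin. The patterns are taken from the excerpt above.
summarize_cluster_log() {
    awk '
        /\[TOTEM \] Retransmit List:/   { retransmit++ }
        /cpg_send_message failed/       { cpg_failed++ }
        /Token has not been received/   { token_lost++ }
        END {
            # Unset counters print as 0 with %d.
            printf "retransmit_lists=%d cpg_send_failures=%d token_timeouts=%d\n",
                   retransmit, cpg_failed, token_lost
        }
    '
}
```

A possible invocation is journalctl -u corosync -u pve-cluster --since "1 hour ago" | summarize_cluster_log. Persistently high counts only confirm that corosync traffic is being lost or delayed; they do not identify the cause.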
 
corosync is running at 100% CPU. What might be causing this?
Please try restarting pmxcfs and see if the problem persists: systemctl restart pve-cluster.service (pve-cluster is the unit that runs pmxcfs).
Is your network operational?
 
Could you please provide your network config (cat /etc/network/interfaces) and the corosync config of all nodes (cat /etc/corosync/corosync.conf)? Is corosync running on a dedicated network, or is it sharing the same network as the other traffic? Did you make any changes before the problem arose?
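For a quick view of which addresses corosync is bound to, a small filter like the one below could pull the per-node ring addresses out of corosync.conf. This is a sketch under assumptions: list_ring_addrs is a made-up name, and it expects the usual layout where each node { ... } block carries a name: line and one or more ringN_addr: lines.

```shell
# Hypothetical helper: print each node name followed by its ringN_addr
# entries, reading a corosync.conf from stdin.
list_ring_addrs() {
    awk '
        # Start of a node block ("nodelist {" does not match this).
        /^[[:space:]]*node[[:space:]]*[{]/ { in_node = 1; name = ""; addrs = ""; next }
        in_node && $1 == "name:"            { name = $2 }
        in_node && $1 ~ /^ring[0-9]+_addr:$/ { addrs = addrs " " $1 $2 }
        # End of the node block: emit one line per node.
        in_node && $1 == "}"                { print name addrs; in_node = 0 }
    '
}
```

Run it as list_ring_addrs < /etc/corosync/corosync.conf and compare the addresses against /etc/network/interfaces to see whether corosync has its own network or shares one with VM and storage traffic.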

Input from a colleague on what else to try (perform each step on all nodes before proceeding to the next step):
  • disable the HA services, if HA is enabled, to prevent fencing
  • stop corosync and pve-cluster: systemctl stop corosync pve-cluster
  • start corosync: systemctl start corosync
  • check and post the output of corosync-quorumtool -s and corosync-cfgtool -n
  • start pmxcfs: systemctl start pve-cluster
  • check the logs and pvecm status
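The steps above can be sketched as a small wrapper that runs one step at a time, so the same step can be completed on every node before moving on. The names run, cluster_step, and the DRY_RUN variable are assumptions made for this illustration; the commands themselves are the ones listed above. Disabling HA is deliberately left out and has to be done by hand first.

```shell
# Hypothetical wrapper: with DRY_RUN set, print the command instead of
# executing it, so the sequence can be reviewed safely.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Run exactly one step. Finish a step on ALL nodes before the next one.
cluster_step() {
    case "$1" in
        stop)           run systemctl stop corosync pve-cluster ;;
        start-corosync) run systemctl start corosync ;;
        check-corosync) run corosync-quorumtool -s
                        run corosync-cfgtool -n ;;
        start-pmxcfs)   run systemctl start pve-cluster ;;
        status)         run pvecm status ;;
        *)
            echo "usage: cluster_step stop|start-corosync|check-corosync|start-pmxcfs|status" >&2
            return 2 ;;
    esac
}
```

For example, DRY_RUN=1 cluster_step stop prints the stop command without touching any service; omit DRY_RUN to actually execute the step as root on the node.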
 