[SOLVED] Slow access to pmxcfs in PVE 5.2 cluster

Hi Proxmoxers,

What could be causing slow access (read and write) to pmxcfs, which is mounted at /etc/pve, in a PVE 5.2 cluster?

As a test, it takes more than 10 seconds to create an empty file inside /etc/pve. There is no performance issue on the local storage, which I confirmed by mounting pmxcfs locally.
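A quick way to reproduce the measurement looks roughly like this (the file names are just examples):

Code:
# timing an empty file creation on pmxcfs
time touch /etc/pve/testfile
rm /etc/pve/testfile

# same test on node-local storage for comparison
time touch /tmp/testfile
rm /tmp/testfile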
 
The most likely cause in a cluster would be the cluster network (all writes to /etc/pve need to be synchronized across all cluster nodes).

* Check your journal (especially for messages from pmxcfs and corosync)
* you can use `omping` to get some measurements of your cluster network (example invocations below)

See our docs for some omping invocations (and general information on our cluster-stack's network requirements):
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_cluster_network
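For example (node names below are placeholders; the linked chapter has the exact invocations):

Code:
# recent messages from corosync and pmxcfs (pve-cluster)
journalctl -b -u corosync -u pve-cluster

# short latency/loss measurement between cluster nodes
# (run the same command on all listed nodes at the same time)
omping -c 10000 -i 0.001 -F -q node1 node2 node3

# longer test, mainly useful to catch multicast querier timeouts
omping -c 600 -i 1 -q node1 node2 node3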
 
Hi Stoiko, thanks for the advice.

Due to network constraints, we are using a custom corosync configuration with UDPU to support more than 16 nodes. We are also running a mixed cluster of PVE 5.0 and PVE 5.2. From the corosync journal, I could only find one node flapping: it keeps rejoining the corosync cluster without ever having left. From the pmxcfs journal, writes often fail after retrying several times.
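For reference, the unicast transport is selected with the transport option in the totem section of corosync.conf; a trimmed-down sketch (addresses, names and IDs here are just placeholders):

Code:
totem {
  version: 2
  cluster_name: examplecluster
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: 10.100.100.0
  }
}

nodelist {
  node {
    name: pmx1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.100.100.101
  }
  # one node {} entry per cluster member
}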

Is there a benchmark tool to test corosync with UDPU, since I believe omping only tests multicast?
 
udpu (unicast) does not scale as well as multicast - and 16 nodes is quite a large cluster - so my first guess is that you've reached the limit (and gone a bit beyond) of what is possible with your current network infrastructure.

omping sends both unicast and multicast packets - so you can still use it to check the latency.
What kind of network is your corosync running on?

You could consider creating a separate corosync ring with dedicated nics (and use a simple unmanaged switch, those usually don't interfere with multicast)
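A rough sketch of what such an extra ring looks like in corosync.conf (subnets and addresses are placeholders, and every node needs a matching ring1_addr):

Code:
totem {
  rrp_mode: passive
  interface {
    ringnumber: 0
    bindnetaddr: 10.100.100.0
  }
  # second ring on the dedicated corosync network
  interface {
    ringnumber: 1
    bindnetaddr: 10.200.200.0
  }
}

nodelist {
  node {
    name: pmx1
    nodeid: 1
    ring0_addr: 10.100.100.101
    ring1_addr: 10.200.200.101
  }
}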
 
I see, thanks for the information about the limitation.

We have a similar setup that does not have this issue. I'll give omping a try then. Corosync is running on top of an OVS bridge + LACP bond.

Unfortunately, adding a new NIC is not an option.
 
Hi Stoiko,

The omping test completed successfully without packet loss on unicast, and the latency is below 0.2 ms.

Reducing the cluster down to 6 nodes did not help. Could the frequent membership re-formations be causing the slow pmxcfs access?

The corosync and pve-cluster journal entries are as follows.
Code:
Dec 21 16:05:27 pmx1 corosync[4952]: notice  [TOTEM ] A new membership (10.100.100.101:78124) was formed. Members
Dec 21 16:05:27 pmx1 corosync[4952]:  [TOTEM ] A new membership (10.100.100.101:78124) was formed. Members
Dec 21 16:05:27 pmx1 corosync[4952]: warning [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]: warning [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]: warning [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]: warning [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]: warning [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]: warning [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:27 pmx1 corosync[4952]:  [CPG   ] downlist left_list: 0 received
Dec 21 16:05:36 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 10
Dec 21 16:05:37 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 20
Dec 21 16:05:38 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 30
Dec 21 16:05:39 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 40
Dec 21 16:05:40 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 50
Dec 21 16:05:41 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 60
Dec 21 16:05:42 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 70
Dec 21 16:05:43 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 80
Dec 21 16:05:44 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 90
Dec 21 16:05:45 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 100
Dec 21 16:05:45 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retried 100 times
Dec 21 16:05:45 pmx1 pmxcfs[4925]: [status] crit: cpg_send_message failed: 6
Dec 21 16:05:46 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 10
Dec 21 16:05:47 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 20
Dec 21 16:05:48 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 30
Dec 21 16:05:49 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 40
Dec 21 16:05:50 pmx1 pmxcfs[4925]: [status] notice: cpg_send_message retry 50
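A quick way to cross-check membership, quorum and ring state between these re-formations (standard PVE/corosync tools):

Code:
# quorum and membership as seen by this node
pvecm status

# status of the corosync ring(s) on this node
corosync-cfgtool -s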
 
Using tcpdump to troubleshoot the corosync issue, I managed to find some old Proxmox nodes which were removed from the cluster but had accidentally been turned on again. Removing the old nodes without reinstalling, as documented in the Proxmox Cluster Manager documentation, fixed the issue.
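For anyone hitting the same symptoms, a rough sketch of the approach (interface and node names are placeholders; corosync uses UDP ports 5404/5405 by default, and the removal procedure itself is the one from the pvecm chapter):

Code:
# watch corosync traffic to spot senders that should no longer be cluster members
tcpdump -n -i eth0 udp and portrange 5404-5405

# once the stale node is powered off, remove it from the cluster configuration
pvecm delnode oldnode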
 
Glad the issue was found! Thanks for updating the thread and marking it as solved!
 