40-node prod cluster restarts when joining or removing a node.

This means that once a node goes offline, forming a new membership will take ~1min24s (one full token plus one full consensus timeout), which is too long if HA is active and will very likely lead to fencing; see [1]. My suggestion would be to remove all current custom corosync configuration and instead set only a custom token coefficient, as described in [1].
I haven't tested how corosync with a customized config like yours would react to a config change, so it might be advisable to disarm HA before making the change (and re-enable it afterwards).
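For a rough sense of the numbers, here is the calculation from corosync.conf(5) (this assumes the corosync 3 defaults of token: 1000 ms and consensus: 1.2 × the effective token timeout; double-check against your version):

Code:
effective token = token + (nodes - 2) * token_coefficient
                = 1000 + (40 - 2) * 200    = 8600 ms
consensus       = 1.2 * effective token    ≈ 10320 ms
new membership  ≈ effective token + consensus ≈ 19 s (vs. the ~1min24s above)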

[1] https://forum.proxmox.com/threads/proxmox-with-48-nodes.174684/page-2#post-825826

We reverted our settings as per your suggestion and applied only the settings below. HA was not disabled while the change was applied (JFYI).

Code:
totem {
  cluster_name: proxmox-prod
  config_version: 80
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  token_coefficient: 200
  version: 2
}
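
For reference, the runtime-effective values can be read back on any node like this (assuming corosync 3's runtime cmap keys):

Code:
corosync-cmapctl -g runtime.config.totem.token
corosync-cmapctl -g runtime.config.totem.consensus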

Our findings/testing: we deleted a node from the cluster at around Jan 15 11:48:22 in the attached logs. The cluster did not lose quorum. Thank you!
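
For anyone reproducing this, membership and quorum can be watched during the change with the standard tools:

Code:
corosync-quorumtool -s   # quorum state and current member list
pvecm status             # Proxmox view: votes, quorum, membership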

Based on this, do you still recommend creating a separate network for corosync? The only option we see is to create another bond for corosync with LACP (fast) everywhere on the existing switches, which are in a VLT.
 

Sounds good, thanks for reporting back!
I'd generally recommend a dedicated primary network for corosync, to prevent other traffic from driving up corosync latencies; see [1]. corosync can handle multiple redundant networks itself [2], so creating a bond just for corosync is usually not necessary, though configuring a redundant corosync link over a pre-existing bond shared with other traffic types can be an option. See [3] for additional considerations that are relevant when running corosync over a bond.
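
As a rough sketch of what [2] looks like in corosync.conf (names, addresses, and priorities below are placeholders; it assumes knet with your existing link_mode: passive, where the highest-priority available link carries the traffic):

Code:
totem {
  # ...existing options unchanged...
  link_mode: passive
  interface {
    linknumber: 0
    knet_link_priority: 20   # placeholder: dedicated corosync network, preferred
  }
  interface {
    linknumber: 1
    knet_link_priority: 10   # placeholder: fallback, e.g. the existing shared network
  }
}

nodelist {
  node {
    name: node1              # placeholder
    nodeid: 1
    ring0_addr: 10.10.10.1   # placeholder: address on the dedicated corosync subnet
    ring1_addr: 10.20.20.1   # placeholder: address on the existing network
  }
  # ...one entry per node...
}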

[1] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_cluster_requirements
[2] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_redundancy
[3] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_corosync_over_bonds