My recommendation is to use dedicated physical links for corosync, not VLAN ones. As for the other things, this is okay; all links should be active-passive in Proxmox, so that you don't care if something dies, and that's it. Ceph will work great in that...
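For reference, an active-backup bond in /etc/network/interfaces looks roughly like this (NIC names and addresses are just placeholders, adapt to your hardware):

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode active-backup
    bond-miimon 100
    bond-primary eno1

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0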
It is an okay offer, and I see there is a financial reason for it. Unfortunately, only nested Proxmox is an option there, but I cannot tell you how big the performance loss would be.
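If you do go nested, this is roughly what has to be enabled on an Intel host (VMID 100 is a placeholder; AMD uses kvm_amd instead):

# check whether nested virtualization is already on
cat /sys/module/kvm_intel/parameters/nested

# enable it persistently and reload the module (with no VMs running)
echo "options kvm-intel nested=Y" > /etc/modprobe.d/kvm-intel.conf
modprobe -r kvm_intel
modprobe kvm_intel

# the nested Proxmox VM needs CPU type host to see the virtualization flags
qm set 100 --cpu host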
Maybe install the OS on RAID somewhere and add the 2x NVMe as passthrough for VM data.
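Whole-disk passthrough is just pointing a VM disk at the stable by-id path, something like this (VMID and disk ID are placeholders):

# find the stable by-id paths of the NVMe drives
ls -l /dev/disk/by-id/ | grep nvme

# hand the whole disk to VM 100 as an extra SCSI disk
qm set 100 -scsi1 /dev/disk/by-id/nvme-YOUR_DISK_ID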
I've worked with a few companies that migrated huge loads of RDSes, so my recommendation is to start with a 3-node Ceph cluster on 10G networking and grow from there. Go up to, let's say, 10-15 nodes, then create a new cluster. No problem with that.
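The Ceph bootstrap on Proxmox is only a handful of commands; sketched out (subnet and device names are placeholders):

# on every node: install the ceph packages
pveceph install

# on the first node: init with the dedicated 10G cluster network
pveceph init --network 10.10.10.0/24

# on every node: a monitor plus the OSDs
pveceph mon create
pveceph osd create /dev/nvme0n1

# once all OSDs are in, a pool for the VM disks
pveceph pool create vmdata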
Usually with NMS or monitoring systems in big support companies, you have one machine outside of everything (different power, different switch, and usually a 3G modem) so that when anything or everything dies, you still get notifications etc. If you are maintaining...
Here is how I do it on an EX2200:
ge-0/0/21 {
    description SP1-data;
    unit 0 {
        family ethernet-switching {
            interface-mode trunk;
            vlan {
                members [ Server-Vlan... ];
            }
        }
    }
}
I had a similar problem with megaraid_sas: the ZFS RAID1 boot disks couldn't be written to when the machine load was high. Once I shut down the VMs on it, the kernel upgrade or proxmox-boot-tool would run fine. This was on Supermicro hardware.
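If you hit the same thing, it's worth re-running the boot sync by hand after shutting the VMs down, roughly:

# see which ESPs proxmox-boot-tool manages and whether they are in sync
proxmox-boot-tool status

# re-copy kernels and bootloader config onto all configured ESPs
proxmox-boot-tool refresh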