It's quite possible that you are CPU limited, as currently a VM can only use 1 core per virtual disk.
Multi-threading (with multiple iothreads per disk) should be available soon; I have already sent patches to the Proxmox dev mailing list.
More than one OSD per NVMe will not help. Non-PLP drives really do something like 500 IOPS for 4k sync writes vs 20000 IOPS for a PLP drive.
At minimum, use cache=writeback; it should help avoid small writes when possible (merging small adjacent writes into big writes).
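As a sketch, both settings can be applied per disk with qm (the VM ID, storage name, and volume are placeholders; adjust them to your setup):

```shell
# Use the single-queue virtio-scsi controller so each disk gets its own iothread,
# then enable an iothread and writeback cache on the disk.
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 local-lvm:vm-100-disk-0,iothread=1,cache=writeback
```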
After months of hard work and collaboration with our community, we are thrilled to release the beta version of Proxmox Datacenter Manager. This version is based on the great Debian 13 "Trixie" and comes with a 6.14.11 Kernel as stable default and...
At minimum, if you use your Arista as exit-node (advertising the default route from the Arista),
you should remove
"exitnodes proxmox1-4,proxmox1-3,proxmox1-1"
and peer with your Arista router:
"peers proxmox1ip, proxmox2ip, proxmox3ip, aristaip"...
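As a minimal sketch, assuming the PVE SDN EVPN controller section in /etc/pve/sdn/controllers.cfg, that peers line would look something like this (the controller name, ASN, and IP addresses are hypothetical):

```
evpn: evpnctl
	asn 65000
	peers 10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.254
```

The last IP would be the Arista; note that the exitnodes option is simply absent.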
That's not true; the CRM also opens/closes/updates the watchdog:
https://git.proxmox.com/?p=pve-ha-manager.git;a=blob;f=src/PVE/HA/CRM.pm;h=9b80b73f694062f4a82eed59ea601314daa5ad59;hb=HEAD
Both the CRM and the LRM are connected through the watchdog-mux.
It's used for any qcow2 storage, including file storage (local, NFS, ...). External snapshots allow taking/deleting snapshots without interruption. The current internal snapshots freeze the VM when deleting a snapshot, for example.
(I forgot to say that for snapshots, we use qcow2 sub-allocated clusters (l2_extended=on) with a 128k cluster size, so the metadata is 32x smaller than with the base image format (64k clusters without sub-allocated clusters):
around 4MB of memory for a 1TB image...
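For reference, an image with these options can be created by hand with qemu-img; note that the qemu-img spelling of the option is extended_l2 (the filename and size below are placeholders):

```shell
# qcow2 image with 128k clusters and sub-allocated (extended L2) clusters,
# giving 4k sub-cluster allocation granularity with less L2 metadata.
qemu-img create -f qcow2 \
    -o cluster_size=128k,extended_l2=on \
    vm-disk.qcow2 100G
```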
The cache value is a max value; it's only allocated when you need to load specific metadata (and unused metadata is flushed after 10 min). It's not too different from ZFS, btw, where you also need memory to handle metadata.
So, yes, if...
I don't know who recommended that you use consumer SSDs with ZFS or Ceph, but performance will be horrible (because of the lack of power-loss protection, the fsyncs for the ZFS/Ceph journal can't be cached). A PM1643 (or any other enterprise SSD with...
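To see whether a drive can sustain this workload, the classic check is a 4k sync-write fio run at queue depth 1, which is roughly what the ZFS/Ceph journal generates. The device path below is a placeholder; this writes to the target, so only point it at a disk or scratch file whose data you can destroy:

```shell
# 4k synchronous writes, queue depth 1: non-PLP consumer SSDs typically
# report a few hundred IOPS here, PLP enterprise SSDs tens of thousands.
fio --name=journal-test \
    --filename=/dev/sdX \
    --direct=1 --sync=1 \
    --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based
```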
Why do you want to use ZFS RAIDZ? With your small number of disks, a mirror setup (RAID10-like) is faster.
IMHO, if you do not need to migrate your VMs between the nodes, ZFS is fine. IO delay will be lower as it is local.
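As a sketch, a RAID10-like ZFS pool from four disks is two mirrored pairs striped together (the pool name and device paths are placeholders; use /dev/disk/by-id paths in practice):

```shell
# Stripe of two mirrors: better IOPS than RAIDZ with the same disk count,
# at the cost of 50% usable capacity.
zpool create tank \
    mirror /dev/sda /dev/sdb \
    mirror /dev/sdc /dev/sdd
```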
If you value...