If you want to max out your NVMe drives with Ceph, you need to create multiple OSDs per NVMe disk:
https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments#NVMe-SSD-partitioning
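For example, ceph-volume can split a single NVMe into several OSDs in one command (a minimal sketch; the device path is just an example, use your own):

```bash
# Sketch: provision 4 OSDs on one NVMe device.
# ceph-volume slices the disk into 4 logical volumes and creates one OSD on each.
ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1
```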
The current OSD performance is around 30-40k 4k IOPS per OSD (4 OSDs per NVMe is a good value). Be careful that you need around 4GB of memory per OSD.
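That 4GB figure is the default per-OSD memory target; you can check or adjust it if RAM is tight (a sketch, the value is in bytes):

```bash
# Sketch: inspect / set the per-OSD memory target (default 4GiB = 4294967296 bytes).
# With 4 OSDs per NVMe, budget roughly 16GB of RAM per NVMe just for the OSD daemons.
ceph config get osd osd_memory_target
ceph config set osd osd_memory_target 4294967296
```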
And if you need to do a lot of small IOPS, you need a lot of cores (try to use the highest frequencies possible for lower latency).
If your workload is more big-block (video streaming for example: fewer IOPS but higher throughput), CPU is less critical.
The CPU bottleneck depends on the number of IOPS (not the size of each IO), because the CRUSH algorithm needs to be computed for every request (both on the client side, in qemu, and on the OSD side, for OSD->OSD replication).
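On the frequency/latency side, one common tweak is to pin the cores to the performance cpufreq governor so they don't downclock between small IOs (a sketch, assuming the cpupower utility is installed on each node):

```bash
# Sketch: keep cores at their highest frequency to reduce per-IO latency.
cpupower frequency-set -g performance
# Verify the active governor on every core:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```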
Personally, I'm running my Ceph clusters with 2x25Gb or 2x40Gb, with web/database traffic, and I'm far from reaching the full bandwidth, because I'm doing small IOPS. Only recovery/rebalancing can really push the network throughput to the max.
Reaching 2x100Gb is really not so easy without fine tuning, core pinning, etc. (even more difficult with dual-socket && NUMA),
as you can hit the internal memory bus limit with copies from NVMe->CPU->NIC.
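To give an order of magnitude (back-of-envelope numbers, not measured results from my clusters):

```bash
# Back-of-envelope: client bandwidth generated by a small-block workload.
# Illustrative assumption: 100k IOPS of 4KiB each.
iops=100000
block_bytes=4096
echo "$((iops * block_bytes * 8 / 1000000)) Mbit/s"   # ~3.3 Gbit/s
# Even a heavy small-block workload stays far below a 2x25Gb link;
# only large sequential IO or recovery/rebalance traffic gets near line rate.
```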