I'm still working on fully understanding NUMA, but I feel I have a decent grasp of how Proxmox handles it, having run many configuration tests over the last few days trying to solve my performance issues.
I have an AMD EPYC 7551P (Zen 1, 7001 series), a 32-core/64-thread CPU, and 256 GB of RAM (8 DIMMs, 2 memory channels per NUMA node, 64 GB per node).
I've noticed that when I enable the NUMA option on my two VMs in Proxmox, both VMs usually latch onto the same NUMA node, even though I've configured each VM to use all of a NUMA node's cores (defined as sockets in Proxmox). I don't want both VMs sharing the same node's resources. I've also manually set CPU affinity on both VMs, but there is no guarantee at startup that a VM will land on the NUMA node whose cores are listed in its affinity setting.
I'd like to assign each VM to a specific NUMA node and predefine its affinity to that node's cores, so that the other VM never touches those cores, and so that each VM is guaranteed to come up on the correct node. Otherwise a VM can end up on a different node than its affinity defines, which I presume is also causing performance problems.
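For illustration, here's a minimal sketch of the kind of per-VM config I have in mind, based on the numaN and affinity options in qm.conf (the memory size and the exact host thread IDs are placeholders until I confirm my host's numbering):

cores: 16
sockets: 1
memory: 65536
numa: 1
numa0: cpus=0-15,hostnodes=0,memory=65536,policy=bind
affinity: 0-7,32-39

The idea being that numa0 backs all 16 guest vCPUs with host node 0's memory (policy=bind), while affinity restricts the QEMU threads to the 16 host threads belonging to that node.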
Am I missing something about how NUMA works with Proxmox?
Currently I feel like I need to disable the NUMA option in Proxmox, set affinity in Proxmox, and then write a static hook script for each VM that uses something like numactl to enforce the NUMA configuration and statically assign the node I want for each VM.
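If I do go the hookscript route, this is roughly the sketch I'm imagining, in Python (everything here is an assumption on my part: the VMIDs and host thread lists are placeholders, and it may be redundant if the policy=bind config above already does the job):

#!/usr/bin/env python3
# Hypothetical Proxmox hookscript sketch: after a VM starts, pin every QEMU
# thread to one host NUMA node and migrate its memory pages there.
# Proxmox invokes a hookscript with two arguments: <vmid> <phase>.
import subprocess
import sys

# Assumed mapping for my host: VMID -> (target node, that node's host thread list).
# Both the VMIDs and the thread lists are placeholders.
PINNING = {
    100: (0, "0-7,32-39"),
    101: (1, "8-15,40-47"),
}

def main():
    vmid, phase = int(sys.argv[1]), sys.argv[2]
    if phase != "post-start" or vmid not in PINNING:
        return
    node, cpus = PINNING[vmid]
    with open(f"/var/run/qemu-server/{vmid}.pid") as f:
        pid = f.read().strip()
    # Pin all threads of the QEMU process to the chosen node's CPUs.
    subprocess.run(["taskset", "-a", "-p", "-c", cpus, pid], check=True)
    # Move pages that already landed on the other three nodes over to the target node.
    other_nodes = ",".join(str(n) for n in range(4) if n != node)
    subprocess.run(["migratepages", pid, other_nodes, str(node)], check=True)

if __name__ == "__main__":
    main()

It would have to be executable, placed on snippets-enabled storage, and attached with qm set <vmid> --hookscript, and it relies on taskset (util-linux) and migratepages (numactl package) being available on the host.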
I also have a hard time understanding how Proxmox handles SMT in relation to cores (not vCPUs, but cores). With SMT, one core has two threads, so a NUMA node with 8 cores has 16 threads in total, yet inside the VM I only see 8 cores. I've therefore been assigning 16 cores, assuming Proxmox simply treats each host thread as a core, meaning that to max out a NUMA node I must define 16 cores in the configuration. But I noticed performance problems when doing so. I'd like clarification on how best to define the configuration for this CPU, and whether I should use a custom hookscript to statically assign the NUMA nodes.
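To at least get the host-side mapping straight, I've been thinking of dumping the topology with a small sketch like this (it only reads standard Linux sysfs files; the only open question is what my particular numbering turns out to be):

#!/usr/bin/env python3
"""Sketch: print which host thread IDs belong to each NUMA node and which
threads are SMT siblings of the same physical core, from standard sysfs."""
import glob
import re

def num(path):
    # first number in the path is the node/cpu index
    return int(re.search(r"(\d+)", path).group(1))

# host threads per NUMA node
for node in sorted(glob.glob("/sys/devices/system/node/node*/cpulist"), key=num):
    with open(node) as f:
        print(f"node{num(node)}: host threads {f.read().strip()}")

# SMT sibling pairs (each physical core printed once)
seen = set()
for sib in sorted(glob.glob("/sys/devices/system/cpu/cpu*/topology/thread_siblings_list"), key=num):
    with open(sib) as f:
        pair = f.read().strip()
    if pair not in seen:
        seen.add(pair)
        print(f"physical core: host threads {pair}")

That would tell me exactly which 16 host thread IDs correspond to one node's 8 physical cores, which is what I'd put in the affinity line.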
thoughts?