
Same company. That's good enough to get everything ASUS added to the shit list.
> And additionally, the Server Section has nothing to do with the Customer Section
What are you talking about? I didn't reject your complete solution, just one component of it, because the vendor has become dodgy.
> I find it really stupid that you provide an absolutely unrelated video without first checking the alternatives.
A lying, cheating, and stealing company is the same no matter the product line. ASUS has different support on the enterprise side, but it's the same execs and VPs running the show on all sides. I'm glad you have good working servers from ASUS, but that's anecdotal.
> And additionally, the Server Section has nothing to do with the Customer Section; completely different department/support etc.
Yes, sure! I had pointed out the difference between PVE and Ceph a few posts above, here in this thread: https://forum.proxmox.com/threads/h...mox-ceph-cluster-3-4-nodes.147718/post-668241 ;-)
> It's worth noting exactly what may fail in this context.
> I'm in Thailand with limited access to equipment. I'm not scared to build my own servers... most of my servers are hand-built from Supermicro parts.. however I don't do ASUS =) I can look at Supermicro
I bet this is around 30k USD/EUR?
For that money you already get 3x Genoa servers, each with:
ASUS RS520A-E12-RS12U
9274F + 12x 32GB DDR5 memory (384GB) + Intel E810 + 12x Micron 7450/7500 Pro 4TB + 2x PM9A3 2TB for the OS, in M.2 22110 form factor.
I would go with the 9374F and 64GB DIMMs, but that's out of budget for you, and since you have 3 servers the 9274F is more than enough.
And you don't need a ton of memory anyway if you go with Ceph and such fast NVMe drives.
But sure, go ahead and buy that old crap from Dell instead.
Cheers
> Would it be more straightforward to go with Intel-based servers?
AMD's cores are clustered into L3-cache NUMA domains called CCDs. On EPYC 7002 each CCD has two CCX sub-domains, while EPYC 7003+ has no CCX sub-domains. This is how AMD was able to reach high core counts without breaking the 280W per-socket power limit (staying eco-friendly) while maintaining ~3.2GHz per core under max load. Meanwhile, the memory channels are unified in a central IOD, making memory physically uniform.
The main problem with this is virtualization, because vCPUs are treated as normal threads on the host. That means smaller VMs are limited to the resources inside one CCD and the single path to the IOD (roughly 96GB/s reads, 26GB/s writes) across the Infinity Fabric. On top of that, memory mapping happens local to the CCD in the IOD, and you don't actually get much performance above dual-channel DDR4-3200/DDR5-5600 when running virtualized loads. Add the fact that a single 7002 EPYC core is capable of pushing 9.2GB/s, and you can quickly saturate an 8-core CCD with a very compute-heavy load.
To combat this, one might want to spread those vCPU threads as evenly across the socket as possible, but without creating memory NUMA, as that does create latency. That's where these two BIOS options come in. "CCX as NUMA" puts the server into multiple NUMA domains based on the CCX (7002) or CCD (7003+) count, while keeping the IOD unified. The other option is MADT enumeration = Round Robin, which reorders the CPU core enumeration table so that the first core of each CCX/CCD is enumerated before coming back around to the same CCX/CCD. This lets VMs spread their compute across the socket, so they all benefit from the Infinity Fabric pathing into the IOD, get the same bandwidth into the PCIe bus for direct-I/O pathing, and have access to multiple L3 cache domains.
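If you want to sanity-check what those BIOS options actually produced, the resulting layout is visible from Linux. A minimal sketch (plain sysfs reads, nothing vendor-specific assumed): list every NUMA node the firmware exposes and the CPUs it contains.

```python
#!/usr/bin/env python3
# Minimal sketch: list the NUMA nodes exposed by firmware and the CPUs in each.
# With "CCX/CCD as NUMA" (or NPS4) you should see one node per L3 domain; with
# round-robin core enumeration the CPU IDs inside each node come out interleaved
# instead of contiguous.
import glob

nodes = glob.glob("/sys/devices/system/node/node[0-9]*")
for node in sorted(nodes, key=lambda p: int(p.rsplit("node", 1)[1])):
    with open(f"{node}/cpulist") as f:
        print(f"{node.rsplit('/', 1)[1]}: CPUs {f.read().strip()}")
```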
Of course, when you create multiple physical NUMA domains you have to map them up through to your VM too. The recommended VM deployment is 1 vCPU per vSocket so that the guest OS knows about the L3 cache domains. If you need to run 16 vCPUs, then it would be 2 vCPUs per vSocket to map the NUMA domains correctly. For example, an EPYC 7532 has 8 CCDs (2c+2c CCXs) and the central IOD with 8 memory channels. The above BIOS settings will put a dual-socket system with two of these CPUs into 32 NUMA domains, bound by the L3 cache topology down to the CCX layer.
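To make that "match the vSocket size to the host L3 domain" rule concrete, here is a tiny hypothetical helper; the 8-core and 2-core domain sizes in the usage lines are just the examples from this post, not something detected from hardware.

```python
def vm_topology(vcpus: int, cores_per_domain: int) -> tuple[int, int]:
    """Split a VM into (vsockets, cores_per_vsocket) so that no virtual socket
    is larger than one host L3 cache domain (CCX/CCD)."""
    if vcpus <= cores_per_domain:
        return 1, vcpus
    if vcpus % cores_per_domain:
        raise ValueError("use a vCPU count that is a multiple of the L3 domain size")
    return vcpus // cores_per_domain, cores_per_domain

print(vm_topology(16, 8))  # Milan/Genoa 8-core CCD -> (2, 8)
print(vm_topology(16, 2))  # EPYC 7532 2-core CCX   -> (8, 2)
```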
Now, if you find that you do not need the hardware spread for your systems, then you do not need to do any of the above. But I have seen a 25-user finance DB system push 140GB/s in memory while maintaining 1,500,000 IOPS because of how poorly those queries/stored procedures were written. Coming off Intel, where everything is monolithic in socket design and truly UMA, this is knowledge everyone deploying on AMD needs to have.
No, because even with the complexity behind AMD's socket design, Intel is still a lot slower overall. Look at benchmarks and reviews at Phoronix.
> Would it be more straightforward to go with Intel-based servers?
When buying SMC AMD motherboards, make sure you spec out H11 V2s vs. H12s: the H11 V2s support 7001/7002 CPUs, while the H11 V1s only support 7001; that's due to the smaller SPI flash for the BIOS on the V1s. The H12 supports 7002/7003 CPUs. But the H11 costs 30%-60% less than the H12, due to the memory bandwidth support (DDR4-2666 vs. 3200).
> I'm in Thailand with limited access to equipment. I'm not scared to build my own servers... most of my servers are hand-built from Supermicro parts.. however I don't do ASUS =) I can look at Supermicro
3 cases here gonna run me 3K
I can pick up on eBay: 3x 7002 + mono + Ram for around 6K
3.9TB 7450 x 48 = 32K
I'm already pushing 40K and haven't even added in NIC and other details.
I have bought off-lease Dells that have been running nicely for 4 years now.. it really depends on your source. My guy tests everything and gives me a 2-year warranty.
That's not correct.
> No, because even with the complexity behind AMD's socket design, Intel is still a lot slower overall. Look at benchmarks and reviews at Phoronix.
That is an awful typo due to lack of sleep and sloppy fingers ... lol. Mobo, like motherboard.
> What's a "mono"?
Just FYI, the ASUS consumer and enterprise divisions are effectively different companies. Not to discount the linked behavior; on the enterprise side they have a host of different issues.
> ASUS? Just say no: https://www.youtube.com/watch?v=7pMrssIrKcY
It doesn't matter, it's the same leadership on both sides of that same house: https://www.asus.com/about-asus-leadership/
> Just FYI, the ASUS consumer and enterprise divisions are effectively different companies. Not to discount the linked behavior; on the enterprise side they have a host of different issues.
That's not correct.
The Phoronix test suite is very optimized, edge-case benchmarking.
Almost none of it has anything to do with the real world.
Proxmox doesn't support NUMA, or it's completely broken with AMD CPUs; as long as that is the case, you cannot get 100% of the performance in any multi-threaded application.
I have another thread here about that; people simply don't have a clue about NUMA on this forum, they just think that enabling NUMA in the VM settings changes something.
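One way to check this yourself, instead of trusting the checkbox, is to look at which host NUMA node each vCPU thread of a running VM actually sits on. A diagnostic sketch follows; it assumes the usual Proxmox PID file location (/var/run/qemu-server/&lt;vmid&gt;.pid) and KVM's "CPU n/KVM" thread naming, so adjust for your setup.

```python
#!/usr/bin/env python3
# Sketch: print the host NUMA node each vCPU thread of a VM last ran on.
# Assumptions: Proxmox-style PID file at /var/run/qemu-server/<vmid>.pid and
# QEMU/KVM vCPU threads named "CPU n/KVM". Usage: ./vcpu_nodes.py <vmid>
import glob
import sys

def parse_cpulist(text):
    """Expand a kernel cpulist like '0-7,64-71' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

def cpu_to_node():
    """Map host CPU number -> NUMA node, read from sysfs."""
    mapping = {}
    for path in glob.glob("/sys/devices/system/node/node[0-9]*/cpulist"):
        node = int(path.split("/node/node")[1].split("/")[0])
        with open(path) as f:
            for cpu in parse_cpulist(f.read()):
                mapping[cpu] = node
    return mapping

vmid = sys.argv[1]
with open(f"/var/run/qemu-server/{vmid}.pid") as f:
    pid = int(f.read().strip())

node_of = cpu_to_node()
for stat_path in glob.glob(f"/proc/{pid}/task/*/stat"):
    with open(stat_path) as f:
        raw = f.read()
    comm = raw[raw.index("(") + 1 : raw.rindex(")")]
    if "KVM" not in comm:                       # keep only the vCPU threads
        continue
    fields = raw[raw.rindex(")") + 2 :].split()
    cpu = int(fields[36])                       # stat field 39: CPU last run on
    print(f"{comm}: host CPU {cpu}, NUMA node {node_of.get(cpu, '?')}")
```

If the vCPU threads of one VM keep turning up on CPUs from several different nodes, the guest's threads are not being kept inside one L3/NUMA domain, whatever the VM settings say.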
However, my Genoa servers constantly get outperformed, especially in everything that does multi-threading (because Intel has an ultra-fast interconnect).
Even consumer Ryzen crap like 5800X CPUs is a huge amount faster, 2-3x, on multi-threaded apps than any Genoa/Milan/Rome.
Because they don't span multiple CCDs.
At the moment, with a normal use case for a company, like ERP systems etc...
Especially on Proxmox, Genoa/Milan/Rome is a lot slower than almost any Xeon of the same generation, by a factor of 2.
Cheers
It's called "split tables" or something like that, and it is available on any Genoa/Milan, but you'll usually get 8 or even 16 NUMA domains per CPU, depending on the core count.
One domain per L3 cache.
Each domain has only 8 cores.
Managing this with CPU pinning, without Proxmox trying to align the cores of one VM to a single NUMA node, is insane.
It's not even possible if you have a cluster and migrate VMs around.
If the multi-threaded application inside your VM uses, for example, 6 cores and those aren't on the same NUMA node, the whole L3 cache is basically not working.
So the application's tasks cannot share data with each other over the L3 cache; it goes over memory instead, which is insanely slower.
In the end your multi-threaded app runs at around 33% of the speed it could run at.
So we can safely say that on AMD EPYC (Rome/Milan/Genoa) platforms, if people don't pin CPU cores, every multi-threaded application will run around 3x slower.
On Ryzen it's a completely different story: it's one chiplet, and even if your Ryzen has 2x L3 cache, they are shared.
On Genoa/Milan the L3 cache is NOT shared across CCDs, and that's the issue.
On Intel it works somewhat differently; I'd have to dig into that, but none of my Intel servers needed any sort of NUMA tuning, except of course the dual/quad-socket ones.
I think that Intel's interconnect between the chiplets on the CPU is simply insanely much faster than on the AMD side.
And earlier Intel CPUs didn't have the issue anyway, because they were monolithic.
So in conclusion, Genoa/Milan is definitely slower with multi-threaded apps, up to 3x.
And there is no way around that other than CPU pinning.
Ryzen/Intel will be 2-3x faster on multi-threaded apps for every normal user.
Because no one will pin CPUs here; it's not even manageable in clusters with migration.
You are right, I have to dig into it further. I'm not saying there is no way around it, just no easy one.
The easiest way is CPU pinning, which is possible on Proxmox without a lot of knowledge, and you don't even need to change BIOS settings for that, like enabling NPS4 or NUMA per L3 cache etc...
You just need to know which CPUs are on which CCD and that's it.
That's something a normal user could understand.
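For reference, "which CPUs are on which CCD" can be read straight from the cache topology in sysfs, with no BIOS change. A minimal sketch (index3 is the L3 cache on EPYC/Ryzen):

```python
#!/usr/bin/env python3
# Minimal sketch: group host CPUs by shared L3 cache, i.e. one group per CCX/CCD.
# Reads the cache topology directly, so it works even without NPS4 or
# "L3 as NUMA" BIOS settings.
import glob

domains = set()
for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cache/index3/shared_cpu_list"):
    with open(path) as f:
        domains.add(f.read().strip())

for i, cpus in enumerate(sorted(domains, key=lambda s: int(s.split(",")[0].split("-")[0]))):
    print(f"L3 domain {i}: CPUs {cpus}")
```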
Core topology maps and cron-scheduled pinning are just an automated way of doing that. But you would need to restart the VM to apply the QEMU config, even with a script.
But that gives you somewhat of a dynamic way around the issue, which is actually great; I hadn't even thought of that, TBH.
You could make a script that looks at the assigned core count of a VM, takes that as an "assumed utilization" variable, and calculates the best balance for pinning each VM's CPUs to a NUMA node, as long as the cores assigned to the VM do not exceed the NUMA node's core count.
I could write such a script with ease, and surely you could too. But the normal user cannot; a normal user usually doesn't even know what NUMA is, he only knows that if he has 2 CPUs in his server, he needs to enable NUMA xD
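A rough sketch of that idea, with everything hedged: the config path and the "cores:" key are the usual Proxmox qemu-server layout, `qm set <vmid> --affinity <cpulist>` exists only in recent Proxmox VE versions, and the "largest VM first, emptiest L3 domain wins" policy is just one possible heuristic. It only prints the commands instead of running them.

```python
#!/usr/bin/env python3
# Sketch: propose a CPU affinity (one L3 domain per VM) for every VM on the host.
# Assumptions: Proxmox configs in /etc/pve/qemu-server/<vmid>.conf, vCPU count in
# the "cores:" line, and a Proxmox VE version that supports "qm set --affinity".
# Prints the qm commands; review them before running anything.
import glob
import re

def parse_cpulist(text):
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus

# 1) Host L3 domains (one per CCX/CCD), read from the cache topology.
domains = {}
for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cache/index3/shared_cpu_list"):
    with open(path) as f:
        cpulist = f.read().strip()
    domains[cpulist] = parse_cpulist(cpulist)
free = {cpulist: len(cpus) for cpulist, cpus in domains.items()}  # naive capacity counter

# 2) VMs and their assigned vCPU counts.
vms = {}
for conf in glob.glob("/etc/pve/qemu-server/*.conf"):
    vmid = conf.split("/")[-1].removesuffix(".conf")
    with open(conf) as f:
        match = re.search(r"^cores:\s*(\d+)", f.read(), re.MULTILINE)
    vms[vmid] = int(match.group(1)) if match else 1

# 3) Greedy placement: biggest VMs first, each into the emptiest domain with room left.
for vmid, cores in sorted(vms.items(), key=lambda kv: -kv[1]):
    candidates = [c for c in free if free[c] >= cores]
    if not candidates:
        print(f"# VM {vmid}: no single L3 domain has {cores} free cores, skipping")
        continue
    best = max(candidates, key=lambda c: free[c])
    free[best] -= cores
    print(f"qm set {vmid} --affinity {best}")
```

On older Proxmox versions you could replace the qm call with taskset on the running QEMU process instead, and as noted above, any of this has to be re-run or re-balanced after migrations.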
And that's the reason there really should be an option in Proxmox, like "Assign vCPUs to same NUMA node" in the VM settings, to make that easy. Such an option would give every normal user a way to boost performance a lot (on EPYC systems at least).
However, you definitely have more of a clue about EPYC systems than I initially thought.
Cheers
That sounds like something that needs automating, thereby improving the quality of life of all EPYC (NUMA) users.
> Could you manually map the table out and build a core localization building block that KVM uses under core pinning?
Nothing official or public; most of this is of my own design, which I rarely share because I do not want to openly support it.
> That sounds like something that needs automating, thereby improving the quality of life of all EPYC (NUMA) users.
> Any ideas if the process has been documented somewhere to a reasonable depth so someone could get it done?