hugepages or anon hugepages

nomizfs

Renowned Member
Jan 7, 2015
Hello, I wanted to quickly ask: should I still enable hugepages by adding
Code:
hugepagesz=1G default_hugepagesz=1G
to the kernel command line (KCL) in the latest PVE?

I can't find a lot of information about anonymous hugepages, or whether it's still recommended to add the above kernel command line. Thanks.
 
That depends on what you want.

I'm talking only about "real" hugepages, not anonymous ones:
Hugepages can only be used with KVM, and the VMs need to have (in your case) a multiple of 1 GiB of RAM. You also need to enable hugepages on a per-VM basis. Hugepages are faster for memory allocations (around 10% in my tests), but you need to reserve the memory (by configuring hugepages) up front, and you cannot increase it on the fly once there is memory fragmentation. This makes for a very "undynamic" experience if you create and destroy a lot of VMs. Unused hugepages are NOT available to the rest of your system, so be aware of the implications. They are also non-swappable. For clusters, you need to size the settings so that a node failure can be compensated, which means even more wasted (= unused) hugepages.

For my systems, I disabled them after testing. The benefit of a faster VM is not worth the hassle of having predefined memory usage that ends up not being used at all.
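
In case you do want to test them anyway, a minimal sketch of what the setup looks like: reserve the pages on the kernel command line and enable them per VM (the page count of 48 and the VMID are just placeholders, size them to your VMs):
Code:
# kernel command line: reserve the 1 GiB pages at boot (48 is only an example)
hugepagesz=1G default_hugepagesz=1G hugepages=48

# per VM, in /etc/pve/qemu-server/<vmid>.conf (or via qm set <vmid> --hugepages 1024):
hugepages: 1024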
 
Thank you for this. Based on your response, I also find it not worth implementing on my setup. I try to implement all reasonable optimisations, but with a focus on usability and stability; 10% extra performance at the price of a highly 'undynamic' system is not worth it.

My case is a PVE server with 2x Xeon E5-2680 v4 and 128 GB DDR4 RAM, with 2x Nvidia Quadro P620s in PCIe passthrough mode for two desktop VMs.

I use PVE-helpers to pin the desktop VM cores to NUMA node 0, which all the PCIe lanes are connected to, and to set IRQ affinity.

Desktop VM1 has:
Code:
cpu_taskset 1-12
assign_interrupts --sleep=10s 1-12 --all

Desktop VM2 has:
Code:
cpu_taskset 29-40
assign_interrupts --sleep=10s 29-40 --all
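
To double-check that the pinning actually applied after a VM starts, the affinity of the QEMU process can be inspected (101 below is just a placeholder VMID):
Code:
# print the CPU affinity of the VM's QEMU process (101 is a placeholder VMID)
taskset -cp $(cat /run/qemu-server/101.pid)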

Each VM has:
Processors: 12 (1 sockets, 12 cores) [host] [numa=1][vcpus=12]
Memory: 24.00 GiB [balloon=0]
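
In /etc/pve/qemu-server/<vmid>.conf terms, that corresponds roughly to the following (plus the hostpci line for the passed-through P620, which depends on your slot):
Code:
balloon: 0
cores: 12
cpu: host
memory: 24576
numa: 1
sockets: 1
vcpus: 12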

My KCL has: intel_iommu=on iommu=pt initcall_blacklist=sysfb_init cpufreq.default_governor=schedutil
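
For anyone reproducing this, a note on where that line goes (the commands below assume a stock PVE install):
Code:
# GRUB-booted systems: append the options to GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, then run:
update-grub

# systemd-boot installs (e.g. ZFS root): append them to /etc/kernel/cmdline,
# then run:
proxmox-boot-tool refresh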

So far everything is working fine. I have installed basic Debian + Plasma on both desktop VMs and am still working on installing the proprietary Nvidia drivers from the Debian non-free repos.

Any comments and/or recommendations about my setup are greatly welcome!

Edit: I'm not completely sure about my CPU pinning choices. Basically, I'm trying to emulate a 6-core, 12-thread CPU for each desktop VM. Currently, I'm pinning 12 'real' cores to desktop VM1, and the 12 corresponding 'HT' sibling threads to desktop VM2.

I'm thinking that maybe I should pin 1-6,29-34 to VM1, and 7-12,35-40 to VM2 instead?

Please, any recommendations on this?
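
A quick way to see which logical CPU is the HT sibling of which physical core (node-0 cores are CPUs 0-13 here):
Code:
# print the "first thread,HT sibling" pair for each node-0 core
for c in $(seq 0 13); do
  cat /sys/devices/system/cpu/cpu${c}/topology/thread_siblings_list
done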


Apart from the 2 desktop VMs, I plan to pin any other normal VMs (without passthrough) to all cores on NUMA node 1, and let the kernel schedule among those cores as appropriate.
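
Assuming cpu_taskset simply forwards its argument to taskset -c (which accepts comma-separated ranges), that would look something like this for a node-1 VM:
Code:
cpu_taskset 14-27,42-55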

My lscpu -e, numactl -H, and lstopo outputs, to help explain my CPU pinning choices:

Code:
# lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0          yes 3300.0000 1200.0000
  1    0      0    1 1:1:1:0          yes 3300.0000 1200.0000
  2    0      0    2 2:2:2:0          yes 3300.0000 1200.0000
  3    0      0    3 3:3:3:0          yes 3300.0000 1200.0000
  4    0      0    4 4:4:4:0          yes 3300.0000 1200.0000
  5    0      0    5 5:5:5:0          yes 3300.0000 1200.0000
  6    0      0    6 6:6:6:0          yes 3300.0000 1200.0000
  7    0      0    7 7:7:7:0          yes 3300.0000 1200.0000
  8    0      0    8 8:8:8:0          yes 3300.0000 1200.0000
  9    0      0    9 9:9:9:0          yes 3300.0000 1200.0000
 10    0      0   10 10:10:10:0       yes 3300.0000 1200.0000
 11    0      0   11 11:11:11:0       yes 3300.0000 1200.0000
 12    0      0   12 12:12:12:0       yes 3300.0000 1200.0000
 13    0      0   13 13:13:13:0       yes 3300.0000 1200.0000
 14    1      1   14 14:14:14:1       yes 3300.0000 1200.0000
 15    1      1   15 15:15:15:1       yes 3300.0000 1200.0000
 16    1      1   16 16:16:16:1       yes 3300.0000 1200.0000
 17    1      1   17 17:17:17:1       yes 3300.0000 1200.0000
 18    1      1   18 18:18:18:1       yes 3300.0000 1200.0000
 19    1      1   19 19:19:19:1       yes 3300.0000 1200.0000
 20    1      1   20 20:20:20:1       yes 3300.0000 1200.0000
 21    1      1   21 21:21:21:1       yes 3300.0000 1200.0000
 22    1      1   22 22:22:22:1       yes 3300.0000 1200.0000
 23    1      1   23 23:23:23:1       yes 3300.0000 1200.0000
 24    1      1   24 24:24:24:1       yes 3300.0000 1200.0000
 25    1      1   25 25:25:25:1       yes 3300.0000 1200.0000
 26    1      1   26 26:26:26:1       yes 3300.0000 1200.0000
 27    1      1   27 27:27:27:1       yes 3300.0000 1200.0000
 28    0      0    0 0:0:0:0          yes 3300.0000 1200.0000
 29    0      0    1 1:1:1:0          yes 3300.0000 1200.0000
 30    0      0    2 2:2:2:0          yes 3300.0000 1200.0000
 31    0      0    3 3:3:3:0          yes 3300.0000 1200.0000
 32    0      0    4 4:4:4:0          yes 3300.0000 1200.0000
 33    0      0    5 5:5:5:0          yes 3300.0000 1200.0000
 34    0      0    6 6:6:6:0          yes 3300.0000 1200.0000
 35    0      0    7 7:7:7:0          yes 3300.0000 1200.0000
 36    0      0    8 8:8:8:0          yes 3300.0000 1200.0000
 37    0      0    9 9:9:9:0          yes 3300.0000 1200.0000
 38    0      0   10 10:10:10:0       yes 3300.0000 1200.0000
 39    0      0   11 11:11:11:0       yes 3300.0000 1200.0000
 40    0      0   12 12:12:12:0       yes 3300.0000 1200.0000
 41    0      0   13 13:13:13:0       yes 3300.0000 1200.0000
 42    1      1   14 14:14:14:1       yes 3300.0000 1200.0000
 43    1      1   15 15:15:15:1       yes 3300.0000 1200.0000
 44    1      1   16 16:16:16:1       yes 3300.0000 1200.0000
 45    1      1   17 17:17:17:1       yes 3300.0000 1200.0000
 46    1      1   18 18:18:18:1       yes 3300.0000 1200.0000
 47    1      1   19 19:19:19:1       yes 3300.0000 1200.0000
 48    1      1   20 20:20:20:1       yes 3300.0000 1200.0000
 49    1      1   21 21:21:21:1       yes 3300.0000 1200.0000
 50    1      1   22 22:22:22:1       yes 3300.0000 1200.0000
 51    1      1   23 23:23:23:1       yes 3300.0000 1200.0000
 52    1      1   24 24:24:24:1       yes 3300.0000 1200.0000
 53    1      1   25 25:25:25:1       yes 3300.0000 1200.0000
 54    1      1   26 26:26:26:1       yes 3300.0000 1200.0000
 55    1      1   27 27:27:27:1       yes 3300.0000 1200.0000

Code:
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 28 29 30 31 32 33 34 35 36 37 38 39 40 41
node 0 size: 64339 MB
node 0 free: 58178 MB
node 1 cpus: 14 15 16 17 18 19 20 21 22 23 24 25 26 27 42 43 44 45 46 47 48 49 50 51 52 53 54 55
node 1 size: 64467 MB
node 1 free: 59557 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

Code:
# lstopo
Machine (126GB total)
  Package L#0
    NUMANode L#0 (P#0 63GB)
    L3 L#0 (35MB)
      L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#28)
      L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
        PU L#2 (P#1)
        PU L#3 (P#29)
      L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
        PU L#4 (P#2)
        PU L#5 (P#30)
      L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
        PU L#6 (P#3)
        PU L#7 (P#31)
      L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
        PU L#8 (P#4)
        PU L#9 (P#32)
      L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
        PU L#10 (P#5)
        PU L#11 (P#33)
      L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
        PU L#12 (P#6)
        PU L#13 (P#34)
      L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
        PU L#14 (P#7)
        PU L#15 (P#35)
      L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
        PU L#16 (P#8)
        PU L#17 (P#36)
      L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
        PU L#18 (P#9)
        PU L#19 (P#37)
      L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
        PU L#20 (P#10)
        PU L#21 (P#38)
      L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
        PU L#22 (P#11)
        PU L#23 (P#39)
      L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
        PU L#24 (P#12)
        PU L#25 (P#40)
      L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
        PU L#26 (P#13)
        PU L#27 (P#41)
    HostBridge
      PCIBridge
        PCI 01:00.0 (Ethernet)
          Net "ens2f0np0"
        PCI 01:00.1 (Ethernet)
          Net "ens2f1np1"
      PCIBridge
        PCI 02:00.0 (VGA)
      PCIBridge
        PCI 03:00.0 (NVMExp)
          Block(Disk) "nvme0n1"
      PCIBridge
        PCI 04:00.0 (VGA)
      PCI 00:11.4 (SATA)
        Block(Disk) "sda"
      PCIBridge
        PCI 06:00.0 (Ethernet)
          Net "enp6s0"
      PCIBridge
        PCI 07:00.0 (Ethernet)
          Net "enp7s0"
      PCIBridge
        PCIBridge
          PCI 09:00.0 (VGA)
      PCI 00:1f.2 (SATA)
        Block(Disk) "sdd"
        Block(Disk) "sdb"
        Block(Disk) "sde"
        Block(Disk) "sdc"
  Package L#1
    NUMANode L#1 (P#1 63GB)
    L3 L#1 (35MB)
      L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
        PU L#28 (P#14)
        PU L#29 (P#42)
      L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
        PU L#30 (P#15)
        PU L#31 (P#43)
      L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
        PU L#32 (P#16)
        PU L#33 (P#44)
      L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17
        PU L#34 (P#17)
        PU L#35 (P#45)
      L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
        PU L#36 (P#18)
        PU L#37 (P#46)
      L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
        PU L#38 (P#19)
        PU L#39 (P#47)
      L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20
        PU L#40 (P#20)
        PU L#41 (P#48)
      L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21
        PU L#42 (P#21)
        PU L#43 (P#49)
      L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22
        PU L#44 (P#22)
        PU L#45 (P#50)
      L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23
        PU L#46 (P#23)
        PU L#47 (P#51)
      L2 L#24 (256KB) + L1d L#24 (32KB) + L1i L#24 (32KB) + Core L#24
        PU L#48 (P#24)
        PU L#49 (P#52)
      L2 L#25 (256KB) + L1d L#25 (32KB) + L1i L#25 (32KB) + Core L#25
        PU L#50 (P#25)
        PU L#51 (P#53)
      L2 L#26 (256KB) + L1d L#26 (32KB) + L1i L#26 (32KB) + Core L#26
        PU L#52 (P#26)
        PU L#53 (P#54)
      L2 L#27 (256KB) + L1d L#27 (32KB) + L1i L#27 (32KB) + Core L#27
        PU L#54 (P#27)
        PU L#55 (P#55)
 