Can't assign more than 16384 MB RAM to Windows 10 VM

tr1on1x
Sep 10, 2020
Hi there,

I am running Proxmox on an AMD EPYC 7402P with 128 GB RAM. The mainboard is an ASRock ROMED8-2T.
On it I have a Windows 10 Pro VM with GPU PCIe passthrough and Ethernet card PCIe passthrough.

Everything runs great and stable, and I am super happy with it.

But now I found a weird issue with this VM:
Recently I wanted to assign more RAM to the machine, and there is plenty of unused RAM available.
So far I tried these memory sizes:
18432 MiB
20480 MiB
24576 MiB
28672 MiB
32768 MiB

ALL of the above values produce the same error on boot and the VM won't start. Funny thing: if I revert to 16384 MiB, it starts up just fine:

Code:
TASK ERROR: start failed: command '/usr/bin/kvm -id 101 -name PlexServer -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/101.pid -daemonize -smbios 'type=1,uuid=0182286f-163e-4524-809d-2af63315c053' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/zvol/rpool/data/vm-101-disk-0' -smp '8,sockets=1,cores=8,maxcpus=8' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/101.vnc,password -no-hpet -cpu 'kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vendor_id=proxmox,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 32768 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=bd3c0dbc-c772-4bfc-a9c5-4848979aef56' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:81:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0' -device 'vfio-pci,host=0000:c1:00.0,id=hostpci1.0,bus=ich9-pcie-port-2,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:c1:00.1,id=hostpci1.1,bus=ich9-pcie-port-2,addr=0x0.1' -device 'VGA,id=vga,bus=pcie.0,addr=0x1' -chardev 'socket,path=/var/run/qemu-server/101.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e743e7127f1' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/rpool/data/vm-101-disk-1,if=none,id=drive-scsi0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -rtc 'driftfix=slew,base=localtime' -machine 'type=q35+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout

I have no idea where to start, as technically it works great and stable with 16 GB RAM (lower values work too).

Is there a BIOS setting for EPYC that limits how much RAM can be assigned to a VM?

Has anyone had this issue before?

Thank you very much for all your suggestions!

Best regards!
 
UPDATE:
I found a temporary workaround by bypassing the start timeout on boot.

So instead of launching the VM as usual from the web interface, I am using this command over SSH:
Code:
qm showcmd 101 | bash

This just starts my VM with all the memory I want. It also survives restarts from within the OS, but after a shutdown I have to start it via the command line again.

Question: Is there an option to set the start timeout for a VM to a higher value, so that I can boot it normally again?

Or is there another solution for this obviously slow memory allocation?
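
One more idea I have not tested yet (and I am not sure which qemu-server versions have it): qm start seems to accept an explicit timeout, which could avoid the showcmd detour:
Code:
# sketch, untested: allow up to 10 minutes for startup/memory allocation
qm start 101 --timeout 600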
 
What is your pveversion -v?

Did you already try to enable hugepages? This may help allocate the memory faster.
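
For a first check, something like this shows what the host kernel offers (a minimal sketch):
Code:
# which hugepage sizes does the host kernel support?
ls /sys/kernel/mm/hugepages/
# current state of the hugepage pools
grep Huge /proc/meminfo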
 
Thank you for your answers!

My pveversion -v is:
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-6
pve-kernel-helper: 6.2-6
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-12
pve-cluster: 6.1-8
pve-container: 3.2-1
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-1
pve-qemu-kvm: 5.1.0-1
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1

I did not try hugepages yet. I read about them before, but since I had no idea what hugepages are and what they do, I tried to avoid them. Maybe you can give a quick description of what they do?

I tried adding this to the config:
hugepages: 2
this results in this new error:
Code:
TASK ERROR: start failed: hugepage allocation failed at /usr/share/perl5/PVE/QemuServer/Memory.pm line 544.

Also tried:
hugepages: 1
this results in this new error:
Code:
TASK ERROR: your system doesn't support hugepages of 1 MB
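
From what I can tell, the hugepages option takes the page size in MiB, so on x86 the only usable values should be 2 (2 MiB pages) and 1024 (1 GiB pages); 1 MB pages don't exist. Something like this in the VM config (sketch):
Code:
# page size in MiB: 2 = 2 MiB pages, 1024 = 1 GiB pages
hugepages: 1024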


My current working config, without hugepages (but it has to be started via the terminal to bypass the timeout), is:

Code:
agent: 1
bios: ovmf
bootdisk: scsi0
cores: 8
cpu: kvm64,flags=+pdpe1gb;+hv-tlbflush
efidisk0: local-zfs:vm-101-disk-0,size=1M
hostpci0: 81:00.0,pcie=1,x-vga=on
hostpci1: c1:00,pcie=1
machine: q35
memory: 32768
name: server
numa: 1
onboot: 1
ostype: win10
scsi0: local-zfs:vm-101-disk-1,cache=writeback,size=512G
scsihw: virtio-scsi-pci
smbios1: uuid=0182286f-163e-4524-809d-2af63315c053
sockets: 1
vmgenid: 45ea7f3d-6067-456f-bd38-583128a5fe6d

Do I need to install anything else for hugepages? Thanks for your help!

Honestly, I would be fine with just making the machine wait longer before timing out... that would already be all I need, I guess.
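
One thing I can check on my side (just a thought, since my CPU line passes +pdpe1gb): whether the host CPU actually exposes 1 GiB pages:
Code:
# prints "pdpe1gb" if the CPU supports 1 GiB pages (EPYC should)
grep -m1 -o pdpe1gb /proc/cpuinfo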
 
https://wiki.debian.org/Hugepages

It also describes how to set up hugepages so that you can use them for VMs.
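
For example, a 32 GiB guest on 2 MiB pages needs 16384 pages. A minimal sketch (numbers illustrative, adjust to your guest size):
Code:
# reserve 16384 x 2 MiB = 32 GiB of hugepages at runtime
sysctl -w vm.nr_hugepages=16384
# make the reservation persistent across reboots
echo 'vm.nr_hugepages = 16384' > /etc/sysctl.d/hugepages.conf
# verify the pool
grep HugePages_ /proc/meminfo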

Thank you! I will read up on this :) Always good to learn new things!

Meanwhile, the error when starting with 32 GB (or in general more than 16 GB) of RAM went away while I was configuring Proxmox further.

I was reading into fine-tuning Proxmox with a sysctl-proxmox-tune.conf, and while adding and playing with the settings, the error went away at one point. I still need to find out exactly which parameter is responsible, but in general the write-up from Wendell at Level1Techs was super helpful:
https://forum.level1techs.com/t/how-to-get-the-most-out-of-your-new-gigabyte-epyc-server/151602

I also paired it with this awesome, very well explained and formatted sysctl-proxmox-tune.conf file from Sergey Dryabzhinsky:
https://gist.github.com/sergey-dryabzhinsky/bcc1a15cb7d06f3d4606823fcc834824

These two posts helped a lot in understanding the fine-tuning, which in my case had some great effects. One of them is that the memory error is gone :)

I carefully put my /etc/sysctl.d/sysctl-proxmox-tune.conf together from the two sources above and only took the things that I found interesting or that made sense for my setup:

Code:
# https://tweaked.io/guide/kernel/
# Don't migrate processes between CPU cores too often
kernel.sched_migration_cost_ns = 5000000
# Kernel >= 2.6.38 (i.e. Proxmox 4+)
kernel.sched_autogroup_enabled = 0

### NETWORK ###
# Don't slow network - save congestion window after idle
# https://github.com/ton31337/tools/wiki/tcp_slow_start_after_idle---tcp_no_metrics_save-performance
net.ipv4.tcp_slow_start_after_idle = 0

# max queued connections per listening socket (listen backlog)
net.core.somaxconn = 512000

net.ipv6.conf.all.disable_ipv6 = 1

# https://www.serveradminblog.com/2011/02/neighbour-table-overflow-sysctl-conf-tunning/
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096

# close TIME_WAIT connections faster
net.ipv4.tcp_fin_timeout = 10
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 15

# more ephemeral ports
net.ipv4.ip_local_port_range = 10240    61000

# Keepalive optimizations
# By default, the keepalive routines wait for two hours (7200 secs) before sending the first keepalive probe,
# and then resend it every 75 seconds. If no ACK response is received for 9 consecutive times, the connection is marked as broken.
# The default values are: tcp_keepalive_time = 7200, tcp_keepalive_intvl = 75, tcp_keepalive_probes = 9
# We would decrease the default values for tcp_keepalive_* params as follow:
# Disconnect dead TCP connections after 10 minutes
net.ipv4.tcp_keepalive_time = 600
# Determines the wait time between keepalive probes (reduced from 75 to 15 seconds)
net.ipv4.tcp_keepalive_intvl = 15
# Determines the number of probes before timing out (reduced from 9 to 5)
net.ipv4.tcp_keepalive_probes = 5

# Protection from SYN flood attack.
net.ipv4.tcp_syncookies = 1


### MEMORY ###
# try not to swap
vm.swappiness = 1

# https://major.io/2008/12/03/reducing-inode-and-dentry-caches-to-keep-oom-killer-at-bay/
vm.vfs_cache_pressure = 10000

# allow application request allocation of virtual memory
# more than real RAM size (or OpenVZ/LXC limits)
vm.overcommit_memory = 1

# times are in centiseconds, i.e. 100 = 1 second
# delayed writeback of dirty data
vm.dirty_writeback_centisecs = 3000
# flush from memory old dirty data
vm.dirty_expire_centisecs = 18000

##
# Adjust vfs cache
# https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
# Decrease dirty cache for faster flushing to disk
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

So I would call this solved. But for other people following along, it would be great to know exactly which of the above parameters removed the timeout error; my guess is that it was vm.swappiness = 1 or vm.overcommit_memory = 1.
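
If someone wants to narrow it down, both candidates can be toggled at runtime without a reboot, roughly like this (sketch; assumes the usual kernel defaults of swappiness=60 and overcommit_memory=0):
Code:
# reset both to kernel defaults, then flip one candidate at a time
sysctl -w vm.swappiness=60 vm.overcommit_memory=0
sysctl -w vm.overcommit_memory=1
qm start 101   # timeout gone? then it was overcommit_memory
sysctl -w vm.overcommit_memory=0 vm.swappiness=1
qm start 101   # or was it swappiness?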
But maybe some of the experts here can help :)

Anyway thanks for all your help and best regards!
 
