Proxmox 6.1 Memory Management - Booting from ZFS

Mar 4, 2020
Hi, I have a bit of a problem with a test system I am running.

Version details are below, but the key factor is that I am booting from a single conventional hard disk, set up as a ZFS pool (called rpool). My reason for using ZFS is that I plan to replicate the partition so I have a complete backup.

The 'problem' is that I have only 8GB RAM in the host. So when I (live) migrated a VM onto this host, it didn't have enough RAM to support the additional VM. It couldn't fall back on swap because, being a clean ZFS build, it didn't have any!

root@pve1:~# free -h
              total        used        free      shared  buff/cache   available
Mem:          7.7Gi       4.5Gi       129Mi        71Mi       3.2Gi       3.0Gi
Swap:            0B          0B          0B
root@pve1:~#

So I got this:

Mar 04 16:12:24 pve1 kernel: Out of memory: Killed process 11295 (kvm) total-vm:4906856kB, anon-rss:4227176kB, file-rss:1560kB, shmem-rss:4kB
Mar 04 16:12:24 pve1 kernel: oom_reaper: reaped process 11295 (kvm), now anon-rss:0kB, file-rss:36kB, shmem-rss:4kB
Mar 04 16:12:24 pve1 systemd[1]: 104.scope: Succeeded.
Mar 04 16:12:24 pve1 qmeventd[772]: Starting cleanup for 104

I.e. it shut down one of the VMs already running on the host. And it wasn't even the one I had just started!

Obviously, this is going to be a problem in a production scenario.

Can anyone offer any advice?

Many thanks
Tom

VERSIONS
proxmox-ve: 6.1-2 (running kernel: 5.3.10-1-pve)
pve-manager: 6.1-3 (running version: 6.1-3/37248ce6)
pve-kernel-5.3: 6.0-12
pve-kernel-helper: 6.0-12
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 1.2.5-1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1


SYSLOG
Mar 04 16:12:16 pve1 pvedaemon[13362]: start VM 105: UPID:pve1:00003432:0007BE93:5E5FD360:qmstart:105:root@pam:
Mar 04 16:12:16 pve1 pvedaemon[1753]: <root@pam> starting task UPID:pve1:00003432:0007BE93:5E5FD360:qmstart:105:root@pam:
Mar 04 16:12:16 pve1 systemd[1]: Started 105.scope.
Mar 04 16:12:16 pve1 systemd-udevd[13417]: Using default interface naming scheme 'v240'.
Mar 04 16:12:16 pve1 systemd-udevd[13417]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Mar 04 16:12:16 pve1 systemd-udevd[13417]: Could not generate persistent MAC address for tap105i0: No such file or directory
Mar 04 16:12:16 pve1 kernel: device tap105i0 entered promiscuous mode
Mar 04 16:12:16 pve1 ovs-vsctl[13430]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap105i0
Mar 04 16:12:16 pve1 ovs-vsctl[13430]: ovs|00002|db_ctl_base|ERR|no port named tap105i0
Mar 04 16:12:16 pve1 ovs-vsctl[13431]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln105i0
Mar 04 16:12:16 pve1 ovs-vsctl[13431]: ovs|00002|db_ctl_base|ERR|no port named fwln105i0
Mar 04 16:12:16 pve1 ovs-vsctl[13432]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl add-port vmbr0 tap105i0 tag=1201
Mar 04 16:12:16 pve1 pvedaemon[1753]: <root@pam> end task UPID:pve1:00003432:0007BE93:5E5FD360:qmstart:105:root@pam: OK
Mar 04 16:12:21 pve1 pvedaemon[7663]: VM 105 qmp command failed - VM 105 qmp command 'guest-ping' failed - got timeout
Mar 04 16:12:24 pve1 kernel: kvm invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
Mar 04 16:12:24 pve1 kernel: CPU: 1 PID: 13446 Comm: kvm Tainted: P O 5.3.10-1-pve #1
Mar 04 16:12:24 pve1 kernel: Hardware name: Dell Inc. PowerEdge R210 II/03X6X0, BIOS 2.7.0 11/15/2013

Mar 04 16:12:24 pve1 kernel: 0 pages hwpoisoned
Mar 04 16:12:24 pve1 kernel: Tasks state (memory values in pages):

Mar 04 16:12:24 pve1 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/qemu.slice/104.scope,task=kvm,pid=11295,uid=0
Mar 04 16:12:24 pve1 kernel: Out of memory: Killed process 11295 (kvm) total-vm:4906856kB, anon-rss:4227176kB, file-rss:1560kB, shmem-rss:4kB
Mar 04 16:12:24 pve1 kernel: oom_reaper: reaped process 11295 (kvm), now anon-rss:0kB, file-rss:36kB, shmem-rss:4kB
Mar 04 16:12:24 pve1 systemd[1]: 104.scope: Succeeded.
Mar 04 16:12:24 pve1 qmeventd[772]: Starting cleanup for 104
Mar 04 16:12:24 pve1 ovs-vsctl[13543]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln104i0
Mar 04 16:12:24 pve1 ovs-vsctl[13543]: ovs|00002|db_ctl_base|ERR|no port named fwln104i0
Mar 04 16:12:24 pve1 ovs-vsctl[13544]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap104i0
Mar 04 16:12:25 pve1 qmeventd[772]: Finished cleanup for 104
 
"Obviously, this is going to be a problem in a production scenario."
In a production scenario you should consider adding a serious amount of memory.
8GB of memory is just not enough these days.
You can limit ZFS memory usage, but in the end, if you are honest, it is just not enough RAM. Get more RAM. Everything else is likely to get you into more trouble down the line ...
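
If you want to see how much RAM the ZFS cache (ARC) is holding right now before you decide anything, something along these lines should do. It is a rough sketch only: it assumes the usual ZFS-on-Linux kstat file /proc/spl/kstat/zfs/arcstats, and the function name arc_size_mib is made up for illustration.

```shell
#!/bin/sh
# Print the current ARC size in MiB.
# Reads the ZFS kstat file by default; a different file can be passed for testing.
arc_size_mib() {
    awk '$1 == "size" { printf "%d\n", $3 / 1048576 }' "${1:-/proc/spl/kstat/zfs/arcstats}"
}

# Only report if the ZFS module is loaded and exposes its statistics.
if [ -r /proc/spl/kstat/zfs/arcstats ]; then
    echo "ARC currently holds $(arc_size_mib) MiB"
fi
```

Compare that number with the "used" column of free -h to get a feel for how much of the host's memory is cache rather than VMs.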
 
Thanks tburger, I agree 8GB is not enough. The problem, though, is that there is no such thing as enough!

We run a medium-sized hosting environment on VMware at the moment and I am looking to move it all over to Proxmox. With VMware, though, performance just drops off as you over-contend the RAM, and you get plenty of time to spot that and take action. It looks like with PVE, we will just get someone's VM(s) shut down without warning if we are silly enough to add too many VMs for the available RAM. We are that silly sometimes, that's life!

To use a motoring analogy, it's rather like getting into a taxi to find there are no seatbelts and having the driver tell you it's fine because he will drive slowly! Not everything is down to the driver in the real world!

To be fair, we have never pushed VMware very hard, so it may not behave that brilliantly, but that's not really the point here. I would like a 'soft fail' if I can manage one.

Is zram a useful option? We could perhaps monitor its use and that could be a warning that things were getting tight.
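
Whatever the answer on zram, the sort of early warning I have in mind could be as simple as the rough sketch below, run from cron. It only reads MemAvailable from /proc/meminfo; the 15% threshold and the function name mem_available_pct are my own placeholders, not anything Proxmox provides.

```shell
#!/bin/sh
# Percentage of RAM the kernel reports as available.
# Reads /proc/meminfo by default; a different file can be passed for testing.
mem_available_pct() {
    awk '/^MemTotal:/ { t = $2 }
         /^MemAvailable:/ { a = $2 }
         END { if (t > 0) printf "%d\n", a * 100 / t }' "${1:-/proc/meminfo}"
}

THRESHOLD=15  # hypothetical warning level, in percent

pct=$(mem_available_pct)
# Warn only when a value could be read and it is below the threshold.
if [ -n "$pct" ] && [ "$pct" -lt "$THRESHOLD" ]; then
    echo "WARNING: only ${pct}% of RAM available on $(hostname)"
fi
```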

Any opinions?

Many thanks for the help. It's much appreciated.

Tom
 
Brutally honest opinion? Here you go:

If you are doing memory oversubscription you are playing with fire.
ESXi has plenty of techniques to run oversubscription, even those you would NEVER want to happen (for instance running into VMkernel swap all the time). ESXi lets people do a lot of bad (not to use the word stupid) things.

KVM is not "just another ESXi". It works differently. And this is a GOOD thing.
If Linux runs out of memory it will kill processes to protect itself from running into a worse issue (e.g. a kernel panic).
BTW: So does Windows and eventually so does ESXi (I have never tried to push it that far).
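
As for why it was not the VM you had just started: the kernel picks its victim by a badness score (exposed as /proc/<pid>/oom_score), which mostly tracks how much memory a process is holding, not who asked for memory last. If you want to see which VM would be next in line, a rough sketch follows. It assumes the VM processes are named kvm, as in your log; the function names are made up for illustration.

```shell
#!/bin/sh
# Sort "pid<TAB>score" lines so the most likely OOM victim comes first.
rank_victims() {
    sort -t "$(printf '\t')" -k2,2nr
}

# Print "pid<TAB>oom_score" for every running kvm process.
list_kvm_scores() {
    for pid in $(pgrep -x kvm); do
        if [ -r "/proc/$pid/oom_score" ]; then
            printf '%s\t%s\n' "$pid" "$(cat "/proc/$pid/oom_score")"
        fi
    done
}

list_kvm_scores | rank_victims
```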

But that is only a symptom of someone (the user) having pushed the system too far.
We can argue about what is enough. 8GB isn't (anymore). Full stop. Even for an ESXi. 8GB on a hypervisor (ESXi) was running out in my private environment roughly 8 years ago. Memory is dirt cheap. Get some memory if you are serious.

To me it seems you need some resource management in your hosting platform. Get some gear that can do the job. Everything else is a band-aid.

No offense meant ;).

PS: You can limit ZFS RAM usage with the following parameters, but I would be very careful, as ZFS needs some memory. Depending on what you are doing, it needs a lot more. I reserve 32GB out of 128GB of memory in my home server for ZFS...

Code:
# Limit ZFS ARC memory usage
sudo vi /etc/modprobe.d/zfs.conf

### START FILE CONTENT >>>>>
# Uncomment exactly one min/max pair below; values are in bytes.
# This is for 32 GB / 32768 MB
#options zfs zfs_arc_min=34359738368
#options zfs zfs_arc_max=34359738368
# This is for 16 GB / 16384 MB
#options zfs zfs_arc_min=17179869184
#options zfs zfs_arc_max=17179869184
# This is for 8 GB / 8192 MB
#options zfs zfs_arc_min=8589934592
#options zfs zfs_arc_max=8589934592
# This is for 4 GB / 4096 MB
#options zfs zfs_arc_min=4294967296
#options zfs zfs_arc_max=4294967296
# This is for 2 GB / 2048 MB
#options zfs zfs_arc_min=2147483648
#options zfs zfs_arc_max=2147483648
# This is for 1 GB / 1024 MB
#options zfs zfs_arc_min=1073741824
#options zfs zfs_arc_max=1073741824
# This is for 512 MB
#options zfs zfs_arc_min=536870912
#options zfs zfs_arc_max=536870912
# This is for 256 MB
#options zfs zfs_arc_min=268435456
#options zfs zfs_arc_max=268435456
###END FILE CONTENT >>>>>
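
If none of the sizes above fits, the byte values are just MiB × 1024 × 1024. A small sketch for generating the two lines follows; the function names arc_bytes_from_mib and arc_options are made up for illustration. The commented steps at the end are what I would expect to need on a root-on-ZFS install like yours, where the module options are read from the initramfs, so please treat them as an assumption and check against the Proxmox documentation.

```shell
#!/bin/sh
# Convert a size in MiB to the byte value the ZFS module options expect.
arc_bytes_from_mib() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print the min/max option lines for a given ARC size in MiB.
arc_options() {
    bytes=$(arc_bytes_from_mib "$1")
    printf 'options zfs zfs_arc_min=%s\noptions zfs zfs_arc_max=%s\n' "$bytes" "$bytes"
}

# Example: lines for a 2 GiB ARC, ready to paste into /etc/modprobe.d/zfs.conf
arc_options 2048

# Afterwards (as root), assuming the root file system is on ZFS:
#   update-initramfs -u
#   reboot
# Then verify the value the module actually uses:
#   cat /sys/module/zfs/parameters/zfs_arc_max
```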
 
Thanks tburger, I see what you are saying. And I agree that memory over-contention can be a problem.

I will try that ZFS tweak as well. It was on my list, so you have saved me some time.

When you mention 'resource management', do you have a particular package in mind?

All best
Tom
 
When I talk about "resource management" I have no specific package in mind.
I mean it more generically: you and your team need to take care that you are not overwhelming the gear you are using.
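
Even a crude check before starting or migrating a VM would go a long way. As a rough sketch: add up the memory assigned to running VMs and compare it with the host's RAM. This assumes the qm list output has the status in the third column and MEM(MB) in the fourth; the function name sum_running_mem_mb is made up for illustration.

```shell
#!/bin/sh
# Sum the MEM(MB) column for running VMs from `qm list`-style output on stdin.
# Assumed column layout: VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
sum_running_mem_mb() {
    awk 'NR > 1 && $3 == "running" { sum += $4 } END { print sum + 0 }'
}

# Only run the comparison on a host that actually has the qm tool.
if command -v qm >/dev/null 2>&1; then
    assigned=$(qm list | sum_running_mem_mb)
    host=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    echo "assigned to running VMs: ${assigned} MB of ${host} MB host RAM"
fi
```

Keep in mind the host itself and ZFS need their share too, so the assigned total should stay well below the host figure.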
 
