We currently have 20 nodes in production with no swap.
The nodes range from 32GB to 256GB of RAM, with the majority having 128GB.
Most of the VMs are configured to be NUMA-aware. I usually do not enable it if the VM uses little RAM and very few cores.
I cannot recall ever having a VM get...
I have wrestled with this problem for years and never found a great solution, most of my nodes are NUMA too.
Changing swappiness never prevented it.
Any process that is idle will end up having its RAM swapped to disk if the kernel thinks that the RAM would be better used for buffer/cache.
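For anyone who wants to experiment anyway, this is how swappiness is checked and tuned; the value 10 below is a hypothetical starting point, not something from my setup, and in my experience it never fully prevented idle pages from being swapped out:

```shell
# Check the current value (the kernel default is usually 60)
cat /proc/sys/vm/swappiness

# Lower it at runtime (requires root); smaller values make the
# kernel less eager to swap idle pages out in favor of cache:
#   sysctl vm.swappiness=10
#
# Persist across reboots via /etc/sysctl.d/99-swappiness.conf:
#   vm.swappiness = 10
```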
While running swapoff on a couple of nodes, the swapoff task would hang, unable to disable swap on the zram devices, and the kernel kept logging hung-task messages.
I believe these systems are still running.
Could we get any diagnostic data from these systems that might help discover the source of...
The server has 128GB RAM; all virtual servers combined are assigned just under 60GB.
We have zfs_arc_max set to 20GB.
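For reference, the ARC cap is set through the zfs module parameter; a sketch of how a 20GB limit like ours is typically configured (the value is in bytes):

```
# /etc/modprobe.d/zfs.conf -- cap the ZFS ARC at 20 GiB
# (20 * 1024^3 = 21474836480 bytes)
options zfs zfs_arc_max=21474836480

# It can also be changed at runtime without a reboot:
#   echo 21474836480 > /sys/module/zfs/parameters/zfs_arc_max
```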
We have not had any issues since turning off zram on the 15th.
It needs to run stably for at least a month before I will have confidence that turning off zram fixed anything.
I am considering...
If I am not mistaken, the zfs module was upgraded recently, and I have already run zpool upgrade.
I do not think it would be OK to boot a kernel with the older zfs module, right?
I went digging in the logs, these are attached as text files.
All of these occurred when we had zfs swap and zram enabled...
Hello again everyone, been too long since my last post here.
I have had one server randomly locking up for over a month now, and now a 2nd server is having the same problem.
Unfortunately I've not captured all of the kernel messages that would help diagnose this, but I have a couple of screenshots from...
I'm curious to know if this helps you or not: https://forum.proxmox.com/threads/increase-performance-with-sched_autogroup_enabled-0.41729/
For me it made a huge difference in IO performance on numerous servers.
Changing sched_autogroup_enabled from 1 to 0 makes a HUGE difference in performance on busy Proxmox hosts
It also helps to modify sched_migration_cost_ns.
I've tested this on Proxmox 4.x and 5.x:
echo 5000000 > /proc/sys/kernel/sched_migration_cost_ns
echo 0 >...
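The echo commands are lost on reboot; to make the tunables stick, they can go in a sysctl drop-in. A sketch, using the values quoted above (paths are valid on the 4.x/5.x kernels I tested):

```
# /etc/sysctl.d/99-sched.conf
kernel.sched_migration_cost_ns = 5000000
kernel.sched_autogroup_enabled = 0
```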
What is the IO Wait on the Proxmox host during the backup?
If it's high, then you are starving the VM of disk IO, causing the VM to think its disks are bad because they are not responding.
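A quick way to watch this while the backup runs (iostat is from the sysstat package, which may not be installed):

```shell
# Sample CPU and I/O stats once per second, five times; the 'wa'
# column is the percentage of time the CPUs spend waiting on disk
vmstat 1 5

# Per-device detail, if sysstat is available; %util near 100
# means the device is saturated
command -v iostat >/dev/null && iostat -x 1 3 || true
```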
I've only started using ZFS a few months ago so I am far from an expert.
It seems that ZFS has its own...
Not enough contiguous free RAM to allocate the RAM requested.
This will display how many contiguous allocations of each 'order' are available:
From left to right, each column represents the count of allocations available for each order, starting with order 0.
The size of each...
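The command the description above refers to appears to have been cut off; my assumption is that it is /proc/buddyinfo, since the column layout described matches it:

```shell
# One row per memory zone; the columns after the zone name are
# counts of free blocks at each order, order 0 first.  A block
# at order N is (4 KiB << N), so order 10 = 4 MiB of contiguous RAM.
cat /proc/buddyinfo

# Highest order with any free blocks in the Normal zone; zeros in
# the high orders mean large contiguous allocations will fail
awk '/Normal/ { for (i = NF; i >= 5; i--) if ($i > 0) { print "highest free order:", i - 5; exit } }' /proc/buddyinfo
```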
I've got DRBD setup on some 5.x servers using a setup similar to the old wiki article.
@fwf DRBD will end up diskless on reboot when it cannot find the disk you specified in the configuration.
How did you reference the disks in the DRBD config?
I've found that using /dev/sdX is a bad idea because...
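What I do instead is reference a stable identifier from /dev/disk/by-id, which survives reboots even when the kernel enumerates disks in a different order. A sketch of the relevant part of a resource stanza (hostname, device serial, and address are made up for illustration):

```
resource r0 {
    on nodeA {
        device    /dev/drbd0;
        # stable symlink, unlike /dev/sdX
        disk      /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL123-part1;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
}
```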
This is reported upstream already by someone else, I added my info there too.
I setup DRBD on top of a ZVOL.
When making heavy sequential writes on the primary, the secondary node throws a General Protection Fault from zfs.
The IO was from a...
I just ran into this problem myself.
Installing pve-kernel-4.13.8-3-pve_4.13.8-30_amd64.deb from pve-test seems to have resolved the issue.
I would be happy to give Proxmox one of these cards to put into the test servers.
Would you like me to ship it to you?