Why would I be happy that you spam my thread with off-topic nonsense? Why do you keep posting instead of reading the original question? How can you be so ignorant as to tell me what to do after hijacking my thread?
- no one asked your advice about how to prevent reboots
- no one was blaming...
Occasionally we experience unplanned, spontaneous reboots on our Proxmox nodes installed on ZFS. The problem is related to vzdump backups: if a reboot happens during an active vzdump backup (which locks the VM being backed up), the locked guest will not start after the reboot and needs to be manually...
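For anyone hitting the same thing: clearing the stale lock by hand usually amounts to something like this (VMID 102 below is just an example, use your own):
root@proxmox:~# qm unlock 102
root@proxmox:~# qm start 102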
It does not give any output (the PVE storage is named the same as the RBD pool):
root@proxmox:~# pvesm list pool2
root@proxmox:~#
I have even tried to remove and re-add the RBD storage with different monitor IPs, to the same effect: Proxmox does not see the content of the RBD pools.
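For reference, the storage definition in /etc/pve/storage.cfg looks roughly like this on our cluster (the monitor IPs below are placeholders, not the real ones):
rbd: pool2
        monhost 10.0.0.1 10.0.0.2 10.0.0.3
        pool pool2
        content images
        username admin
        krbd 0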
This took about 1 second:
root@proxmox:~# rbd ls -l pool2
NAME             SIZE PARENT FMT PROT LOCK
vm-102-disk-1  43008M          2
vm-112-disk-1 102400M          2
vm-126-disk-1   6144G          2
The GUI list is empty since the spontaneous node reboot Saturday night, and neither OSD repairs...
Also, here is my ceph.conf; I remember disabling the "rbd cache writethrough until flush" setting (probably not connected to this issue). This is clearly a Proxmox problem: the data is there on the RBD pools, only Proxmox does not seem to see it.
root@proxmox:~# cat /etc/pve/ceph.conf...
root@proxmox:~# rbd ls pool2
vm-102-disk-1
vm-112-disk-1
vm-126-disk-1
Isn't that interesting? The virtual disks are there, yet Proxmox does not see them.
I have since updated and restarted all cluster nodes, but the problem persists: Proxmox does not see the contents of the Ceph pools.
I have a small 5-node Ceph (Hammer) test cluster. Every node runs Proxmox, a Ceph MON and 1 or 2 OSDs. There are two pools defined: one with 2 copies of the data (pool2) and one with 3 copies (pool3). Ceph has a dedicated 1 Gbps network. There are a few raw disks stored on pool2 at the moment...
Are you on ZFS? Is your system low on free memory? If your answer to both questions is yes, then your spontaneous reboots can be prevented by the following:
1. ZFS ARC size
We aggressively limit the ZFS ARC size, as it has led to several spontaneous reboots in the past when left unlimited...
Yes, create the file with the two lines. Include the values in bytes, so 5GB equals 5x1024x1024x1024 bytes.
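As a sketch of what that file looks like (assuming a 5 GB maximum and, say, a 1 GB minimum; adjust both to your own calculation), this goes into /etc/modprobe.d/zfs.conf:
root@proxmox:~# cat /etc/modprobe.d/zfs.conf
# ZFS ARC limits in bytes (example: 1 GB minimum, 5 GB maximum)
options zfs zfs_arc_min=1073741824
options zfs zfs_arc_max=5368709120
If you boot from ZFS, don't forget to run update-initramfs -u afterwards and reboot so the module parameters are picked up.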
Limit the ARC and set the values for rpool/swap as soon as you can, and your server will not reset anymore. You can also upgrade Proxmox to 4.4 right now; it will work fine. You should...
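On the rpool/swap values: I assume what is meant here are the usual ZFS-properties-on-a-swap-zvol recommendations, something along these lines (check what fits your pool before applying):
root@proxmox:~# zfs set sync=always rpool/swap
root@proxmox:~# zfs set primarycache=metadata rpool/swap
root@proxmox:~# zfs set secondarycache=none rpool/swap
root@proxmox:~# zfs set com.sun:auto-snapshot=false rpool/swap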
1. ZFS ARC size
We aggressively limit the ZFS ARC size, as it has led to several spontaneous reboots in the past when unlimited. Basically, we add up all the memory the system uses without caches and buffers (like all the KVM maximum RAM combined), subtract that from total host RAM, and set the...
These tweaks are for the host mainly, but I'm using vm.swappiness=1 in all my VMs as well to avoid unnecessary swapping.
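To make the ARC calculation above concrete, here is a purely illustrative example with made-up numbers:
  total host RAM                      64 GB
  sum of VM maximum RAM              -48 GB
  OS, services, buffers (estimate)   - 8 GB
  ------------------------------------------
  ARC maximum                          8 GB  (zfs_arc_max=8589934592)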
Just as I expected: this CPU stall issue is happening when the host is using ZFS, and IO load is high.
@e100 were you having problems / applying this solution on the host, in the guests or both?
I have come to apply the same solution to a similar set of problems as well. Unfortunately I have no idea why it helps, but IO buffers and KVM memory allocation together with the heavy memory use of ZFS in...
I think it's fine. I have a couple of other sysctl tweaks that are all supposed to decrease IO load through better use of memory and thus prevent stalls and OOM kills:
vm.min_free_kbytes = 262144
vm.dirty_background_ratio = 5
vm.swappiness = 1
This is about why dirty_background_ratio needs to be...
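In case it is useful, one way to make these settings persistent across reboots is a drop-in file (the file name below is arbitrary), then reloading with sysctl:
root@proxmox:~# cat /etc/sysctl.d/90-io-tuning.conf
vm.min_free_kbytes = 262144
vm.dirty_background_ratio = 5
vm.swappiness = 1
root@proxmox:~# sysctl --system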
Thank you for this report. In our case, almost all of our VMs were created under 4.3 (and their contents synced from OpenVZ), so I doubt that the version of the node that created the VM would have any effect.
One possible solution
On the other hand, we have found one thing that probably helps...
We already did test it on Proxmox, but I look forward to seeing your benchmarks.
They do recommend deadline, but not for VMs (Debian 8 disables the IO scheduler inside VMs and on NVMe SSDs). Also they are not talking about HW RAID cards, for which noop is the only sensible choice.
Yes, because SSDs are usually much faster than what the schedulers were designed for.
The above means you have the deadline scheduler set for your disk. You will be better off with the noop scheduler when inside a VM.
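For reference, checking and switching the scheduler inside a guest looks roughly like this (sda is just an example device; to make it permanent you can add elevator=noop to the kernel command line in /etc/default/grub and run update-grub):
root@vm:~# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
root@vm:~# echo noop > /sys/block/sda/queue/scheduler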