No D-state processes, except that sometimes a ceph process is in D state for a few seconds.
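For anyone wanting to check the same thing: a quick, generic way to list processes in uninterruptible sleep; the wchan column shows the kernel function they are blocked in (nothing Proxmox-specific, just procps):

```shell
# List processes in D state (uninterruptible sleep, usually blocked on I/O),
# together with the kernel wait channel (wchan) they are stuck in.
ps -eo state,pid,wchan:32,comm | awk 'NR==1 || $1 ~ /^D/'
```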
Our cluster has 32 nodes.
Most of them are HP BL460c G8 blades with the newest BIOS from 05/2018 (I31) and the newest RAID controller firmware (we also tested HBA mode to bypass the RAID controller, no luck). 2x E5-2650v2...
The moment a VM freezes (http://prntscr.com/nykjl6, see 21:18 - 21:42), all other VMs show high disk usage:
http://prntscr.com/nykk4d
http://prntscr.com/nykk8g
What does the Disk I/O graph on a VM mean? Does it show I/O for that VM only, or for the disk overall?
These are VMs used by customers as well as test VMs without any custom services running. All KVM VMs are affected, Linux and Windows. We are seeing this on all nodes in our cluster, with Ceph RBD storage as well as directory storage. LXC containers on the same hosts are not affected.
Update: If we don't kill the VM and just wait until the process disappears (takes 5-15 minutes), the VM starts working again:
http://prntscr.com/nyco0x
Scope is running when the VM freezes:
Also, please take a look at the resource usage of a freezing VM:
http://prntscr.com/nychsb
CPU jumps to 15-20%, memory jumps to 78% and network drops to 0%.
We are now using a Ceph cluster (RBD) with raw VM disks. Unfortunately, the issue still persists.
Windows VMs keep freezing after a while, CPU usage looks like this:
http://prntscr.com/ny3jkp (this is where they freeze); stopping the VM and starting it again leads to:
There is still a qemu process of...
Hi,
I set up CephFS through the Proxmox GUI successfully.
Everything looks good, but I can't enable disk images on that storage:
http://prntscr.com/nv9v87
Isn't it possible to store VMs on CephFS?
I played around with it for some hours and have now successfully moved 6 nodes to a USB drive with these steps:
- check how much disk space is used by the root partition. My USB stick has 32 GB, so I made sure the root partition used less than 20 GB.
- start rescue (I use sysrcd)
- vgrename pve pve-old ## rename...
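For the first step, this is roughly how the space check can be done (plain df/du, nothing Proxmox-specific):

```shell
# How much of the root filesystem is actually used? It has to fit on the stick.
df -h /
# Per-directory breakdown, staying on the root filesystem only (-x).
du -xsh /* 2>/dev/null | sort -h
```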
nvm
https://pve-ffm.zap-hosting.com:8006/pve-docs/chapter-pveceph.html#chapter_pveceph
ceph fs rm cephfs --yes-i-really-mean-it
+ removing the two pools via GUI helped
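For anyone finding this later: the pool removal also works from the CLI. This is a sketch assuming the default pool names the Proxmox wizard creates (cephfs_data, cephfs_metadata) and that mon_allow_pool_delete is enabled on the monitors — adjust the names to your setup:

```shell
# DESTRUCTIVE: removes the CephFS and deletes its backing pools.
ceph fs rm cephfs --yes-i-really-mean-it
ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
```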
Hi,
I am currently playing around with Ceph. I configured it via the Proxmox GUI, added some OSDs, added a metadata server and finally created "CephFS" in the "CephFS" tab:
http://prntscr.com/nuzcyd
Under "Storages" (Proxmox cluster) I accidentally deleted the CephFS storage. When I try to create it...
Hi,
we have a cluster with some nodes. All nodes have a 2 TB local disk where Proxmox is installed (with its LVs). The remaining space is allocated as an ext4 directory storage. I would like to move the OS to a new 60 GB SSD without losing any VPS data. That means the VPSes stay on their ext4 directory...
Hi,
we are facing issues on all KVM VMs (qcow2, VirtIO SCSI, DIR storage (ext4)) for months now.
These VMs freeze suddenly, meaning we are unable to access them via RDP or even via the console (VNC).
Syslog shows
after the crash.
Stopping a frozen VM and trying to start it again leads to:
There is still a...
Hi,
we have one node in our cluster that turns "red" every 1-3 days. We can't open "Summary" or "System" then. We see a pmxcfs process at 100% CPU and need to restart the whole node to fix the issue temporarily.
Syslog:
Any ideas?
no
not yet; we are setting up a dedicated internal 10G SFP+ network today.
We don't see any retransmits or high latencies, so I don't think that causes the issue.
less than 1ms
Day by day, more nodes have that issue. Currently 18 nodes are affected :(
All nodes with that issue do have a...
this is the output of systemctl status pvedaemon:
It seems to be stuck at "lxc-info -n 1199234 -p".
Restarting pvedaemon helps for some time, then it occurs again.
Some LXC containers are running on that node. Trying to start LXC containers that are stopped leads to:
Rebooting the node helps...
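To see where a hanging pvedaemon worker is actually blocked (before resorting to a reboot), attaching strace can help; this assumes strace is installed and pvedaemon is running:

```shell
# Attach to pvedaemon and follow its forked workers (-f); a worker hanging
# on "lxc-info" shows up as a wait/read syscall that never returns.
strace -f -tt -p "$(pidof -s pvedaemon)" 2>&1 | head -n 20
```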
The problem has worsened. Currently 6 nodes are affected.
I see:
on all affected nodes.
and also
Restarting pvedaemon fixes the problem for 5-15 minutes, then it starts again, node by node, until ~6 nodes are affected.
Also, after restarting pvedaemon, the node remains "grey" in the node list. "System"...
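A less invasive workaround than rebooting the whole node might be to restart the related services together (these are the standard Proxmox service names; this is only a stopgap, not a fix):

```shell
# Temporary workaround only, does not fix the root cause: restart the
# Proxmox API, status and proxy daemons instead of rebooting the node.
systemctl restart pvedaemon pvestatd pveproxy
```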