Hi,
I have some issues with my cluster. This morning it was completely down due to a hardware failure of two nodes.
It took some time to get everything sorted.
Now, everything is green (27 nodes) except 4 nodes. They are grey.
pvecm status
looks fine.
When I click on "Summary", everything looks...
Today a node hung again and the whole cluster went red. A technician fixed the node before we could check the syslog or pvecm.
Here is the content of corosync.conf:
When I try omping, I get:
Any ideas?
This happens about once every 1-3 months. We have a big cluster with 40 nodes now, and there are different reasons why it happens: a broken disk, a failing network card, etc. So isn't there a way to find the failing node without going through all nodes?
Hi,
from time to time our whole cluster goes grey and "df -h" hangs right before /etc/pve.
It takes us hours to figure out which node is causing it, mostly by connecting to every node and stopping corosync / pve-cluster there.
Is there a way to figure out the hanging node through a log file?
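The manual per-node check described above could be partly automated. What follows is only a minimal sketch and not a Proxmox tool: the function name probe and the 5-second default are assumptions made for illustration. It does not read any log file, and when the whole cluster is grey it may report every node as hung, so it shows which nodes are affected rather than which node is the cause.

```python
import subprocess


def probe(cmd, timeout_s=5.0):
    """Run cmd and classify the result as 'ok', 'failed' or 'hung'.

    'hung' means the command did not finish within timeout_s seconds,
    which is the behaviour described for "df -h" when /etc/pve blocks.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except OSError:
        # The command could not be started at all (e.g. binary missing).
        return "failed"
    return "ok" if result.returncode == 0 else "failed"
```

One possible use is to call probe(["ssh", "-o", "BatchMode=yes", node, "ls", "/etc/pve"]) for each node name in a list you maintain yourself. Running the check over ssh keeps the local process killable even if the remote command blocks.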
Still no idea what's going on here. I think these qmp timeout messages are the issue, but I have no clue why they occur. There are some posts on the web where people have the same issue:
https://www.reddit.com/r/Proxmox/comments/am3168/windows_10_vm_fails_to_start_got_timeout/...
When they freeze, they look like this:
http://prntscr.com/ne7eu3
Host looks fine:
http://prntscr.com/ne7f1l
When I stop this VM now and start it again, I get the scope unit error.
Btw:
This happens if the guest agent is enabled in Proxmox, but the guest agent is not running inside the VM (e.g. still...
This is why I am here ;-) Because the VMs keep freezing and I have no clue why. When that happens, the console (VNC) does not work, the guest agent does not work anymore, and the VM is not accessible by RDP/SSH. Then I try to STOP and START the server and the scope unit message occurs.
What do you mean by...
Retransmit List caused by three nodes I rebooted with ksmsharing disabled.
Yes, the guest agent is installed. When the VM hangs, it is not running anymore.
Our panel does a QEMU agent ping; if it succeeds we trigger a "shutdown", if not, we trigger a "stop" via the Proxmox API.
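As a rough illustration of the panel logic described above (not the panel's actual code), the decision can be sketched like this. The function name choose_power_action and the agent_ping callable are hypothetical; the callable stands in for whatever request the panel sends to check the guest agent, and treating any error as "agent not reachable" is an assumption.

```python
from typing import Callable


def choose_power_action(agent_ping: Callable[[], bool]) -> str:
    """Return "shutdown" if the guest agent answers, otherwise "stop".

    agent_ping is a hypothetical callable that returns True when the
    guest agent responds. Any exception (timeout, connection error)
    is treated as "agent not reachable" in this sketch.
    """
    try:
        alive = bool(agent_ping())
    except Exception:
        alive = False
    return "shutdown" if alive else "stop"
```

With this split, a frozen VM whose agent no longer answers always ends up on the "stop" path, which matches the situation in which the scope unit message was reported.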
We have been facing that issue for months now, but it has become much worse as our cluster grows (currently we are moving all VMs from SolusVM to Proxmox, but we stopped that process due to the issues).
I rebooted one node with only 5 VMs. I disabled ksmtuning beforehand because of strange ballooning issues...
Global storage.cfg:
Node Bondsir003:
VM config:
This is only one node and VM.
Currently I can reproduce the issue on 8 different nodes and on 20 VMs total.
LXC containers are not affected btw.
These "stuck VMs" don't respond to guest tools and have a memory usage of >90%.
Last night I had some test servers running with absolutely no operations or load. Some of them have the same issue this morning.
Some of them had Windows installed, some Linux.
We use a separate storage (just an SSD with ext4 as directory storage) on each node. The storages look fine. The issue happens on ALL nodes, seemingly at random. It also happens on nodes with 5 VMs and on nodes with 30 VMs, so it is not overprovisioning.
VMs are stored on VirtIO SCSI / qcow2.
Any idea on...
Hi,
we have a big problem with our Proxmox cluster. The cluster consists of 25 nodes with 10-50 servers each (LXC & KVM).
KVM servers freeze 20-30 times a day. The console is then unreachable, and so is the server itself.
When we stop the server, Proxmox displays...
Hi,
we have a cluster of 20 nodes. VNC is working fine on all nodes, except on one.
When we try to access the "console" of any VM on that problematic node FROM a different node, we keep getting "Failed to connect to server".
But when we open the GUI of the problematic node and try to open...
I figured out that MariaDB does not work if I have
at the end of the specific container config. Once I remove them, MariaDB works.
But those lines are needed to avoid the Docker container "permission denied" start error.
Does anyone know how to make an LXC Debian 9 container work for Docker and...
Hi,
when I try to install MariaDB in a Debian 9 LXC container (template downloaded from Proxmox) I keep getting:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2 "No such file or directory")
Check this:
https://prnt.sc/n8d21e...
nvm, I can confirm using this image:
https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
with the tutorial from Proxmox:
https://pve.proxmox.com/wiki/Cloud-Init_Support "Preparing Cloud-Init Templates"
doing this step "# finally attach the new disk to the VM as scsi...