I already have an influxdb database installed on a VM that I use for other data collecting. That's not the problem.
The issue, though, is what the best way is to install the ceph influx module. My understanding is that Proxmox "slightly" modifies Ceph and distributes its own Ceph packages...
What you are pointing to enables the Proxmox InfluxDB integration, which gathers Proxmox-related statistics. I am interested in Ceph-specific stats that are not gathered by the Proxmox metric server.
I would like to enable InfluxDB data gathering for Ceph. The Ceph documentation that I found has instructions on how to enable it, but not how to install it; I can only assume that it is installed by default.
However, when I try to enable the influx ceph module in a Proxmox cluster, I get...
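For reference, the enabling side from the Ceph documentation is just an mgr module switch plus a few config keys. A sketch, where the InfluxDB host and credentials are placeholders; note that on some distributions the module also needs the `python3-influxdb` Debian package present on the manager nodes:

```shell
# enable the influx manager module (ships with ceph-mgr upstream)
ceph mgr module enable influx
# point it at an existing InfluxDB instance -- host/db/user/password are placeholders
ceph config set mgr mgr/influx/hostname influxdb.example.com
ceph config set mgr mgr/influx/database ceph
ceph config set mgr mgr/influx/username ceph_user
ceph config set mgr mgr/influx/password secret
# verify the module can reach the database
ceph influx self-test
```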
If I remove a failed node, are the containers and VMs taken care of automatically? Is there some other process I need to do to let Proxmox know that they are not around anymore?
How about backups? What are the implications for existing backups of VMs/LXCs after I re-create the node (with new HW)...
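In case it helps others, my understanding of the usual procedure (the node and guest paths below use hypothetical node names, and this assumes the guest disks live on shared storage): removing the dead node does not re-home the guest definitions, you move the config files by hand.

```shell
# on a remaining, quorate node: drop the dead node from the cluster
pvecm delnode failednode
# guest configs are NOT migrated automatically; with disks on shared
# storage, move the .conf files to a surviving node by hand:
mv /etc/pve/nodes/failednode/qemu-server/*.conf /etc/pve/nodes/survivor/qemu-server/
mv /etc/pve/nodes/failednode/lxc/*.conf /etc/pve/nodes/survivor/lxc/
```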
(c) and (d) helped. It turned out that some other nodes had corosync frozen, and by restarting corosync on those nodes the cluster became alive again.
I went ahead and did updates on the rest of the nodes without problems, so I am back in business now.
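For anyone searching later, the restart-and-verify sequence on each affected node was roughly:

```shell
# restart the cluster stack on the node with frozen corosync
systemctl restart corosync pve-cluster
# confirm the node rejoined and the cluster is quorate
pvecm status
# per-link view of corosync connectivity
corosync-cfgtool -s
```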
Thank you for the help...
Restarting pve-cluster on 2 nodes (one with the update and one without) did not help.
Attaching journals from both nodes.
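(The journals were collected on each node with something along these lines; the exact time window is arbitrary:)

```shell
# dump cluster-stack logs since the current boot into a file to attach
journalctl -b -u pve-cluster -u corosync > node1-journal.txt
```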
As a note, the screenshot was from the console, not any logs.
We have 2 Proxmox clusters. I updated packages on one cluster with no problems. Then I started updating packages on the 2nd cluster, and after I upgraded the 1st node, the whole cluster went berserk.
Initially, some of the nodes appeared up, some down, and some with a "?", even...
I did a package upgrade yesterday, 2/6/22 (including a kernel update to 5.13.19-4-pve), and the node consistently hangs about 20-30 minutes after boot. Symptoms include:
- unresponsive console, does not respond to Ctrl-Alt-Del or ACPI Shutdown.
- "qm list" hanging. Some VMs hanging. LXC...
Thanks. I couldn't find what you pointed to, but I noticed that in the Search tab (in any view) you can sort by various columns and search, so it's good enough for my use.
It's still inconsistent, though, to sort (by default) the nodes alphabetically and the VM/CTs by ID. :-)
Thanks for the pointers!
I would like to request a feature which is very useful in large clusters. We have a cluster with 33 nodes and way too many containers to count by hand (about 100).
In the Server View, nodes are sorted by hostname, while in the Pool View containers and VMs are listed by VM/CT...
I had trouble following the recipe provided, even with a reboot. I always end up with a "split brain" scenario where I can connect to the host whose IP address I changed, but corosync keeps that node out of quorum.
I've ended up re-installing the node from scratch (and moving VMs to another node).
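For the record, the procedure I was attempting, as I understand it: /etc/pve/corosync.conf is replicated by pmxcfs, so you edit a working copy, bump config_version, and move it back, plus update the address anywhere else it appears. A sketch:

```shell
# edit a working copy, not the live replicated file
cp /etc/pve/corosync.conf /root/corosync.conf.new
# in the copy: change the node's ring0_addr to the new IP and
# increment config_version in the totem section, then activate it:
mv /root/corosync.conf.new /etc/pve/corosync.conf
# also update the node's own address in /etc/hosts and
# /etc/network/interfaces, then restart the stack
systemctl restart corosync pve-cluster
```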
I have some LXC containers with docker running inside them. Those work well with root-on-ext4, with the required changes in the .conf files.
When I do the same on root-on-zfs, I get the following messages in the console:
overlayfs: upper fs does not support RENAME_WHITEOUT.
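The workaround I know of: Docker's overlay2 driver needs RENAME_WHITEOUT support from the backing filesystem, which ZFS datasets lacked before OpenZFS 2.2, so inside the container you can switch Docker to fuse-overlayfs instead. A sketch, assuming a Debian/Ubuntu-based container:

```shell
# inside the LXC container: install the FUSE-based overlay implementation
apt install -y fuse-overlayfs
# tell Docker to use it instead of overlay2 (file created if absent)
echo '{ "storage-driver": "fuse-overlayfs" }' > /etc/docker/daemon.json
systemctl restart docker
# verify the active storage driver
docker info --format '{{.Driver}}'
```

This also assumes the container has FUSE enabled in its Proxmox config (`features: nesting=1,fuse=1` in the .conf file).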