I have a cluster with relatively heavy IO, and consequently free space on the Ceph storage is constantly constrained. I'm finding myself running fstrim at more and more frequent intervals. Is there a way to auto-trim a disk for an LXC container?
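For reference, the approach I'm currently experimenting with is a cron job wrapping `pct fstrim`, which trims a container's underlying storage from the host. This is just a sketch; the schedule and the assumption that every CT on the node should be trimmed are mine:

```
# /etc/cron.d/pct-fstrim -- illustrative cron job, runs weekly at 03:00.
# Trims every container on this node; pct list prints a header row,
# hence the NR>1 to skip it and keep only the VMID column.
0 3 * * 0 root for ct in $(pct list | awk 'NR>1 {print $1}'); do pct fstrim "$ct"; done
```

I'd still prefer a built-in per-container option, if one exists.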
I am encountering a problem on busy servers where the nodes "inexplicably" lose connectivity with their cluster partners and fence themselves off. Some investigation shows that when this happens, pve-firewall is enabled and the conntrack table is full.
A quick look at the "virgin" iptables rules shows entries...
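As a stopgap I've been raising the conntrack ceiling via sysctl; the value below is purely illustrative and should be sized to the node's RAM and connection volume:

```
# /etc/sysctl.d/99-conntrack.conf -- example value, not a recommendation.
# Each conntrack entry costs a few hundred bytes of kernel memory.
net.netfilter.nf_conntrack_max = 1048576
```

Applied with `sysctl --system`, this at least stops the drops, but I'd like to understand why the table fills in the first place.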
Just noticed this on one of my clusters; disk resize is failing with the following error message:
Resizing image: 100% complete...done.
mount.nfs: Failed to resolve server rbd: Name or service not known
Failed to update the container's filesystem: command 'unshare -m -- sh -c 'mount...
I recently began deploying nested containers following an orderly upgrade to 5.3, when I noticed that only the root user may actually set these feature flags. What is the rationale for this limitation? What are the implications of setting these flags that I'm not considering?
I have an intermittent problem with storage returning 0 values for a specific RBD pool. It's only happening on one cluster, and there doesn't seem to be a correlation to which node context is being called...
I need a specific container to access NFS mounts present on the hypervisor (they're attached via IB, which is not bridgeable to the container.)
I can mount them as individual bind mounts and that works, but since I need to attach approximately 20 of them, this will exceed the 10-mount limit...
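The workaround I'm currently testing is appending raw `lxc.mount.entry` lines to the container config, which (as I understand it) are passed straight to LXC and are not counted against the PVE-managed mpN slots. Paths and the CT ID below are examples:

```
# /etc/pve/lxc/100.conf -- raw LXC entries appended after the PVE options.
# Targets are relative to the CT rootfs; create=dir makes the mountpoint.
lxc.mount.entry: /mnt/nfs/share01 mnt/share01 none bind,create=dir 0 0
lxc.mount.entry: /mnt/nfs/share02 mnt/share02 none bind,create=dir 0 0
```

The obvious downside is that these entries are invisible to snapshots/backups and to permission mapping on unprivileged containers, so I'd welcome a supported alternative.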
I have an odd problem with one of my clusters. All of a sudden, all but one of the nodes turned grey in the GUI:
What's odd is that pvecm status shows normal; pvestatd and pveproxy are functioning normally on all nodes, there are no hung mounts, pct/qm commands process without interruption, and...
Within the last month or so, pvesh has changed from outputting JSON to "pretty" output by default. I'm trying to obtain JSON output again but can't figure out how; pvesh refers to format options:
USAGE: pvesh set <api_path> [OPTIONS] [FORMAT_OPTIONS]
Call API PUT on <api_path>...
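If memory serves, recent pvesh builds take an explicit `--output-format` option; the node path below is just an example:

```
# Ask pvesh for machine-readable output explicitly
pvesh get /nodes/sky12/status --output-format json

# there is also a human-readable JSON variant
pvesh get /nodes/sky12/status --output-format json-pretty
```

But this doesn't appear anywhere in the USAGE text above, which is why I'm asking whether it's the intended mechanism.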
I have what appears to be an intermittent problem with container shutdowns taking a LONG time. For example:
As you can see, there is a NEARLY 7 MINUTE delay from the stop request's end time to the shutdown command. What is causing this delay, and how can it be mitigated?
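As a mitigation while I investigate, I've been capping the graceful-shutdown wait from the CLI; the CT ID and timeout are examples:

```
# Wait at most 60s for a clean shutdown, then hard-stop the container
pct shutdown 16980 --timeout 60 --forceStop 1
```

That bounds the damage but obviously doesn't explain the delay itself.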
I have a new problem (well, it could be old and I just noticed it.) I have a number of containers that show any number of snapshots, but when I look at the disk, those snapshots don't exist.
pvesh get /nodes/sky12/lxc/16980/snapshot
"description" : "Automatic snapshot...
Every once in a while I have a node that no longer responds to API calls; it's usually because a container is not responding to pvestatd and has to be killed in order to release pveproxy.
How can I have my monitoring identify this condition? Since both pveproxy and pvestatd are technically...
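...still running as services, a process check won't catch it. The best probe I've come up with so far is below: a wedged pveproxy typically still accepts the TCP connection but never answers, so I treat any HTTP response (even a 401 Unauthorized) within a deadline as "alive". Host name and timeout are placeholders:

```shell
# Liveness probe for the PVE API: any HTTP response (even a 401) within
# the deadline means pveproxy is answering requests; a timeout or a
# connection refusal does not.
# $1 = host to probe, $2 = timeout in seconds (default 10).
api_alive() {
    curl -ksS -m "${2:-10}" -o /dev/null "https://${1}:8006/api2/json/version"
}

# example invocation; "pve-node1" is a placeholder hostname
if api_alive pve-node1 10; then
    echo "pveproxy responsive"
else
    echo "pveproxy NOT responding" >&2
fi
```

Is there a more direct signal that distinguishes "pveproxy blocked on pvestatd" from a genuinely dead service?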
I have a node that was shut down properly but will no longer boot. It will be a while before I can get to it to fix it, but in the meantime, how do I take over its assets (VMs and CTs), which are now showing a ? status on the defunct node?
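The manual route I'm aware of is moving the guest config files between node directories in /etc/pve on a surviving, quorate node; node names and IDs below are examples. This is only safe if the dead node is guaranteed to stay powered off, since two nodes owning the same guest invites split-brain:

```
# Run on a surviving, quorate node. "deadnode" and "sky12" are examples.
# Claim a container:
mv /etc/pve/nodes/deadnode/lxc/16980.conf /etc/pve/nodes/sky12/lxc/

# Claim a VM:
mv /etc/pve/nodes/deadnode/qemu-server/101.conf /etc/pve/nodes/sky12/qemu-server/
```

Is this still the sanctioned procedure, or is there a cleaner way when HA isn't configured?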
I have a cluster of 6 nodes, each containing 8x Intel SSDSC2BB016T7R for a total of 48 OSDs. Each node has 384GB of RAM and 40 logical CPUs. For some reason, this cluster's performance is really low in comparison to other deployments: deploying the GitLab template took well over 5 minutes...
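To rule out the PVE layer, I've been benchmarking the pool directly with rados bench; pool name and durations are examples:

```
# Write for 30s, keeping the objects so the read test has data to fetch
rados bench -p rbd 30 write --no-cleanup

# Sequential read against the objects written above
rados bench -p rbd 30 seq

# Remove the benchmark objects afterwards
rados -p rbd cleanup
```

Even at this level the throughput is a fraction of what comparable clusters deliver, so I suspect something below RBD. What should I be looking at next?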
Today I had an interesting failure. I have a node that was misbehaving, and corosync was not able to synchronize (it was getting [TOTEM ] Received message has invalid digest... ignoring.)
I proceeded to move containers off of it, but partway through the process the node crashed. Here is the first...
I have a situation where a snapshot reversion for a container (RBD-backed) is failing with the error "unable to read tail (got 0 byte)" in the task log. Doing a manual reversion using rbd snap works fine.
proxmox-ve: 5.1-42 (running kernel: 4.15.3-1-pve)
pve-manager: 5.1-46 (running version...
I am having a hard time adding SOME nodes to an existing cluster. All nodes are freshly updated and running the same version of proxmox-ve, as follows:
# pveversion -v
proxmox-ve: 5.1-42 (running kernel: 4.15.15-1-pve)
pve-manager: 5.1-51 (running version: 5.1-51/96be5354)
While running backups, vzdump got stuck on a specific container; there is no outward indication of a fault, but the task isn't moving, and the syslog is getting spammed with
rbd: rbd54: write 1000 at 0 result -30
the vzdump processes are not in a D state, so all appears normal, but the file size of...
Is there a way to designate a cluster node as a non-compute node, disallowing it from accepting QM/CT? I know this can be sort-of accomplished using HA groups, but that requires tight adherence to HA-group management; I'm looking to make this behavior the default on Ceph OSD nodes, etc.
I have a container that has failed to start and hung pveproxy, denying any new activity to the node, which is showing a grey question mark in the GUI. pveproxy itself appears to be running:
# service pveproxy status
● pveproxy.service - PVE API Proxy Server
I have a weird problem with specific containers failing to snapshot; the end result is that scheduled vzdump tasks effectively fail and block any further vzdump jobs until manually killed.
The thing is, if I MANUALLY take a snapshot using rbd snap create it works fine; it's only via the API that...