I can do that (I have backups, or I can regenerate everything, but it will take around a week of work).
But will it fix the cluster? Everything is marked gray and all LXCs/VMs are down.
Perhaps upgrading to the latest version might help (7.1-8 -> 7.2)? But it's a minor update, so I don't think it will change anything...
This is one of the first things I did. I also rebooted all the servers in the cluster.
What should I do from the link? The delete does not look like a safe option.
I deleted and recreated them, because they failed to start.
How can I fix the PG warning? I have backups of everything, but I don't know what was deleted/corrupted.
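If it helps anyone else: a rough way to see which PGs the warning is about (a sketch, assuming the warning reports inconsistent or damaged PGs; the PG id 2.1a below is just a placeholder):

# show the exact health warnings and the PG ids they reference
ceph health detail
# for an inconsistent PG, list the objects it complains about (replace 2.1a with a reported PG id)
rados list-inconsistent-obj 2.1a --format=json-pretty
# once the cause is clear, ask Ceph to repair that PG
ceph pg repair 2.1a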
What can cause the cluster instability? All nodes appear grayed out.
proxmox 7.1-8
Yesterday I executed a large delete operation on the ceph-fs pool (around 2 TB of data).
The operation finished successfully within a few seconds (without any noticeable errors),
and then the following problem occurred:
7 out of 32 OSDs went down and out.
Trying to set them in and...
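For reference, what I mean by "set them in" is roughly this (osd.7 is just an example id, not one of my real OSDs):

# check which OSDs are down/out and the overall cluster state
ceph osd tree
ceph -s
# mark an OSD back in and restart its daemon on the node that hosts it
ceph osd in osd.7
systemctl restart ceph-osd@7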
I have the same problem,
the lowest OSD at 50% utilization and the highest at 89%.
Running "ceph osd reweight-by-utilization" initiates some rebalancing; I ran it a few more times until it looked better.
Can it be automated?
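In case it is useful: rather than rerunning it by hand, the built-in balancer module can keep utilization even on its own; a minimal sketch (assumes a Ceph release recent enough to have the upmap balancer):

# dry run first: show what reweight-by-utilization would change
ceph osd test-reweight-by-utilization
# or let Ceph rebalance continuously by itself
ceph balancer mode upmap
ceph balancer on
ceph balancer status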
do-release-upgrade -d.
and do-release-upgrade -d -c returns:
Checking for a new Ubuntu release
New release '22.04' available.
Run 'do-release-upgrade' to upgrade to it.
From 20.04 I did the upgrade, and this is the output:
Reading cache
Checking package manager
Reading package lists... Done
Building dependency tree
Reading state information... Done
Hit http://archive.ubuntu.com/ubuntu focal InRelease
Hit...
I know this is a bit early and the version is not final,
but I would like to start integrating our system and migrating from an older Ubuntu to this one.
I have a working Ubuntu 20.04 based on the standard 20.04 template, but the upgrade did not work.
Any idea what the best practice would be?
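For what it's worth, the usual way to make do-release-upgrade offer the new release is roughly this (a sketch; it assumes an LTS-to-LTS upgrade, and -d is only needed while 22.04 is not yet offered on the normal channel):

# bring 20.04 fully up to date first
apt update && apt full-upgrade -y
# Prompt=lts tells the upgrader to look for the next LTS release
sed -i 's/^Prompt=.*/Prompt=lts/' /etc/update-manager/release-upgrades
# -d allows upgrading to a release that is not final yet
do-release-upgrade -d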
I just bought some UPSes to protect against power failures and electrical spikes.
The UPSes are PowerWalker units; any tips or best practices on how to integrate them?
The UPSes should have enough capacity to keep the servers up under full load for around 10 minutes, and under low load for at least double that.
I am...
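If tips are welcome: the common route on Proxmox/Debian is NUT (Network UPS Tools); a minimal single-server sketch, assuming the UPS is connected over USB and works with the generic usbhid-ups driver (some PowerWalker models need blazer_usb instead):

apt install nut
# /etc/nut/nut.conf
MODE=standalone
# /etc/nut/ups.conf  (driver depends on the exact PowerWalker model)
[powerwalker]
    driver = usbhid-ups
    port = auto
# restart the services and check that the UPS answers
systemctl restart nut-server nut-monitor
upsc powerwalker

upsd.users and upsmon.conf still need a monitor user and a MONITOR line so upsmon can actually shut the nodes down before the battery runs out.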
Sure, I'll post it.
To be clear, you need:
journalctl -u corosync -u pve-cluster --since "XXXXX" --until "YYYYYYY" > log_$(hostname)
for all servers? Do I need to add anything else?
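If it makes it easier, the same command can be collected from all nodes in one loop; a sketch (the node names are placeholders, and the XXXXX/YYYYYYY time range still needs to be filled in):

# run the journalctl command on each node over ssh and collect the output locally
for node in node1 node2 node3; do
    ssh root@$node "journalctl -u corosync -u pve-cluster --since 'XXXXX' --until 'YYYYYYY'" > log_$node
done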
I have tried changing the switch and the cable and could not see any improvement. It happens around once an hour, usually at 50 minutes past the hour (01:50, 02:50, etc.).
Dec 26 04:25:57 pve-blade-102 corosync[2238]: [KNET ] link: host: 1 link: 0 is down
Dec 26 04:50:56 pve-blade-102...
How can I enable debug logging?
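In case anyone else needs it: corosync's log level is set in the logging section of /etc/pve/corosync.conf; a sketch of the relevant snippet (edit the file on one node only and increase config_version so the change propagates to the other nodes):

logging {
  debug: on
  to_syslog: yes
}

After the change is applied, journalctl -u corosync becomes much more verbose; set debug back to off once the issue is found.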
I think the problematic host is pve-srv-102. I'll try to inspect the network cable and card on Sunday and replace them.
Looks like it, host 1 all the time.
The logs you asked for yesterday: journalctl -u corosync -u pve-cluster --since yesterday > /mnt/pve/nfs_home/pve_logs/log_$(hostname)
I see that host 1 has errors; sometimes it recovers, and when it doesn't, the cluster crash starts.
Host 1 had issues in...