I have a cluster of around 20 nodes running Proxmox 8.0.4, with Ceph on 4 of them.
All nodes have the same chronyd configuration, and most of the time everything works.
Recently (in the past month) I started to have issues with time sync. It usually happens after a power failure, some servers are out of...
Flow:
1. Servers rebooted due to power maintenance.
2. After the reboot, I noticed one server had bad clock sync. Fixing the issue and another reboot solved it.
3. After time sync was fixed, the cluster started to load and rebalance.
4. It hung in an error state (data looks OK and everything is stable and...
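A minimal sketch for checking clock offset on a node before letting Ceph rebalance, assuming the standard `chronyc tracking` output format. The sample text, the function names and the 0.05 s limit (believed to be Ceph's default `mon_clock_drift_allowed`) are assumptions, not taken from the post.

```python
import re
import subprocess

# Assumption: 0.05 s matches Ceph's default mon_clock_drift_allowed; adjust if tuned.
MAX_SKEW_SECONDS = 0.05

# Illustrative sample of `chronyc tracking` output (not from the affected cluster).
SAMPLE_TRACKING = """\
Reference ID    : CB00710F (foo.example.net)
Stratum         : 3
System time     : 0.000006523 seconds slow of NTP time
Last offset     : -0.000006747 seconds
"""


def parse_system_offset(tracking_output: str) -> float:
    """Return the signed offset in seconds; positive means the local clock is fast."""
    match = re.search(
        r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time",
        tracking_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'System time' line found in chronyc output")
    offset = float(match.group(1))
    return offset if match.group(2) == "fast" else -offset


def skew_too_large(tracking_output: str, limit: float = MAX_SKEW_SECONDS) -> bool:
    """True when the absolute offset exceeds the allowed limit."""
    return abs(parse_system_offset(tracking_output)) > limit


def read_tracking() -> str:
    """Run `chronyc tracking` on the local node and return its text output."""
    result = subprocess.run(
        ["chronyc", "tracking"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a node, `skew_too_large(read_tracking())` could be run after a power-failure reboot and before the Ceph services on that node are started.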
I am trying to reboot an LXC host as part of an Ansible script, but I cannot make it work using Ansible.
The default reboot module did not work: https://docs.ansible.com/ansible/latest/collections/ansible/builtin/reboot_module.html
It printed Socket exception: Connection reset by peer (104) and...
I have stability issues on nodes with high CPU load and I would like to move back to the kernel that was on 7.4.
What is the best approach?
I am on PVE 8.0.4, kernel 6.2.16-14.
I use around 80% of the nodes in the cluster as a compute grid.
I can freely disable the mitigations on them and gain some performance. What is the best approach?
Can it be done for a group of nodes (e.g. based on a name pattern), or must it be done one by one in GRUB?
Do I need to do something...
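The kernel command line is per host, so a group change comes down to scripting the same edit on every matching node. A rough sketch of the two pieces, assuming `mitigations=off` is the intended kernel option. The node names and helper names below are hypothetical.

```python
from fnmatch import fnmatchcase


def select_nodes(nodes, pattern):
    """Return the node names matching a shell-style pattern (case-sensitive)."""
    return [name for name in nodes if fnmatchcase(name, pattern)]


def add_kernel_option(cmdline: str, option: str = "mitigations=off") -> str:
    """Return the command line with `option` set exactly once.

    Any existing argument with the same key (e.g. mitigations=auto) is
    dropped first, so repeated runs do not stack duplicates.
    """
    key = option.split("=", 1)[0]
    kept = [arg for arg in cmdline.split() if arg.split("=", 1)[0] != key]
    return " ".join(kept + [option])


# Hypothetical node names, for illustration only.
all_nodes = ["grid-01", "grid-02", "ceph-01"]
grid_nodes = select_nodes(all_nodes, "grid-*")
new_cmdline = add_kernel_option("quiet")
```

The resulting string would go into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` followed by `update-grub`, or into `/etc/kernel/cmdline` followed by `proxmox-boot-tool refresh` on hosts booting via systemd-boot. A reboot is needed for it to take effect.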
I have an existing cluster, and I want to add many more nodes (grid executor nodes will be stored on them).
Can I exclude those nodes from the quorum, in a case like this:
a node can join an active quorum, but will not be part of the voting (it is not needed to create the quorum).
Does something...
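To make the voting arithmetic concrete, here is a small sketch assuming majority quorum (more than half of all votes) and that the new nodes would be given zero votes, for example via corosync's per-node `quorum_votes` setting. Whether that setup is supported or advisable on Proxmox is not established here. The node names and counts are made up.

```python
def quorum_threshold(votes: dict) -> int:
    """Votes needed for quorum: a strict majority of all configured votes."""
    return sum(votes.values()) // 2 + 1


def is_quorate(votes: dict, online: set) -> bool:
    """True when the online nodes together hold at least the threshold."""
    return sum(votes[name] for name in online) >= quorum_threshold(votes)


# Hypothetical layout: 5 voting nodes plus 15 zero-vote grid nodes.
votes = {f"core-{i}": 1 for i in range(5)}
votes.update({f"grid-{i}": 0 for i in range(15)})

# All grid nodes down, 3 of 5 core nodes up: still quorate.
three_core_up = is_quorate(votes, {"core-0", "core-1", "core-2"})

# All grid nodes up, only 2 core nodes up: not quorate.
only_two_core = is_quorate(
    votes, {"core-0", "core-1"} | {f"grid-{i}" for i in range(15)}
)
```

With this layout the threshold stays at 3 however many zero-vote nodes are added, so losing grid nodes cannot cost the cluster its quorum, and grid nodes cannot form one on their own.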
I am trying to upgrade to Proxmox 8.
After updating all nodes to 7.4-16 (and rebooting each node after the install)
and updating Ceph from Pacific to Quincy,
I just noticed that in the Ceph Performance tab I don't see traffic (I usually have around 300-6000MBS with 1000+ IOPS).
systems are...
I am trying to upgrade to Proxmox 8.
After updating all nodes to 7.4-16 (and rebooting each node after the install)
and updating Ceph from Pacific to Quincy,
the update went smoothly without issues.
After the update this issue occurred.
Chrony is set up and running on all nodes across the...
Can PBS back up network storage?
For example, I have a Proxmox cluster with Ceph (Proxmox-based) and some other network storage, mostly NFS-based (external to Proxmox).
Can PBS be used to back up network storage?
Proxmox 7.4
PBS 2.4-1
I have a cluster of around 20 hosts, with a daily schedule for backing up many VMs/LXCs to PBS.
I had an external (non-Proxmox) storage crash, and many VMs/LXCs had NFS mounts to it.
The backup caused the GUI on most of the servers to hang, leaving many LXCs/VMs locked.
Only...
What is the best approach that will be easy to install and maintain for future upgrades?
Currently on PVE 7.4 and Ceph 16.2.11.
I now have 4 VMs (for testing, PoC) but am planning to grow to around 50
(I prefer to do it once, then clone the node if possible).
For perspective I have the...
I'm reaching out here again for insights and recommendations, as I am currently in the process of reassessing our compute grid system. We've been relying on Hadoop/Spark/Yarn in isolated environments without the need for additional security measures. Unfortunately, they are no longer free and the cost...
I am trying to play around with Kubernetes based on the following tutorial:
https://towardsdatascience.com/deploy-a-production-ready-on-premise-kubernetes-cluster-36a5d62a2109
I received the following error as part of the installation:
An exception occurred during task execution. To see the...
Proxmox 7.1-8
Yesterday I executed a large delete operation on the CephFS pool (around 2 TB of data).
The operation ended successfully within a few seconds (without any noticeable errors),
and then the following problem occurred:
7 out of 32 OSDs went down and out.
Trying to set them in and...
I know this is a bit early and the version is not final,
but I would like to start integrating our system and migrating from the older Ubuntu to this one.
I have a working Ubuntu 20.04 based on the standard 20.04 template, but the upgrade did not work.
Any idea what would be the best practice?
Just bought some UPS units to protect against power failures and electrical spikes.
The UPS supports PowerWalker. Any tips or best practices on how to integrate it?
The UPS should have enough capacity to keep the servers running under full load for around 10 minutes, and under low load for at least double that.
I am...
Our system was stable over the last few months, but after upgrading from 7 to 7.1-8 we have 3-4 random crashes every day.
(We had an issue two months ago with corosync stability, but after replacing the switch the cluster worked very well under high load, without any issues.)
This Monday I took advantage of...
For some reason most of the cluster crashed (servers rebooted). It became stable after the reboot, but there was a small downtime.
I tried to find the reason in the logs but could not understand what caused it.
Here are the logs of the cluster from one of the nodes that was not rebooted (on...