Thanks a lot for your feedback. I'll keep the cluster in its current mixed state (it seems to be running smoothly) and wait for the patched kernel through the repositories.
Hauke
Hi there,
after installing the patched kernel on both affected nodes, they both behave perfectly fine again :) Thanks a lot!
One question remains: What do I do with my funny cluster now?
Upgrade the remaining two nodes (still on 7.4) and use the patched kernel
Just a quick reply while still testing: With your kernel, I cannot reproduce the error using fio on a single node :)
Next steps: I will install the same kernel on the second affected node and will bulk-migrate VMs between those two nodes (which always triggered the unwanted aborted...
Hi there,
I am in the middle of upgrading my 4-node cluster from the latest 7.4 to the current 8.1. While the upgrade procedure itself worked like a charm, I am running into RAID trouble with the current kernel (same as here https://bugzilla.kernel.org/show_bug.cgi?id=217599#c30, really...
Hi Cookiefamily,
I would expect a very simple config like:

frontend pbs
    mode http
    bind :8007 ssl crt /etc/haproxy/my.crt alpn h2,http/1.1
    default_backend servers

backend servers
    mode http
    server s1 y.y.y.y:8007 ssl alpn h2,http/1.1

Correct?
When I do so, the pbsclient's...
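Should HTTP mode keep failing because HAProxy terminates the HTTP/2 stream, a plain TCP passthrough is a variant worth sketching out (just an assumption on my side, not a tested setup): HAProxy then forwards the raw TLS connection and PBS terminates TLS itself, so the crt line disappears entirely.

```
frontend pbs
    mode tcp
    bind :8007
    default_backend servers

backend servers
    mode tcp
    # TLS and HTTP/2 are passed through to PBS untouched
    server s1 y.y.y.y:8007
```

The trade-off is that HAProxy can no longer inspect or route on HTTP, but for a single PBS backend that should not matter.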
Hi ph0x,
For me ... simply isolation. I do not want my VMs to have a route into the network where the backup server resides. So I decided to run a reverse proxy on each of my cluster nodes and provide an isolated OVS bridge for my VMs to access it.
|vm| <-> |storage transfer-network| <->...
Hi Sinos,
Thanks for your reply and your help.
That alone was not yet the solution; I still get broken pipes. But further reading suggests that nginx still does not support HTTP/2 reverse proxying. I also gave it a try with the grpc module, but with more or less the same results. Since...
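Since the HTTP/2 termination itself seems to be the problem, a TCP passthrough via nginx's stream module would sidestep HTTP proxying entirely. A minimal sketch (the backend address is a placeholder, not our actual setup):

```
# goes in nginx.conf at the top level, outside the http {} block
stream {
    server {
        listen 8007;
        # TLS and HTTP/2 are passed through to PBS untouched
        proxy_pass 192.0.2.10:8007;  # placeholder PBS address
    }
}
```

With this, nginx never speaks HTTP to either side, so the HTTP/2 limitation should not apply; the downside is losing HTTP-level features like path routing or access logs.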
Hi there,
we are just integrating PBS 1.1 as our primary backup solution (love it!). We would like to make it available to backup clients via an nginx reverse proxy. The proxy works perfectly fine with the web UI, but the backup clients fail with broken pipes:
Starting backup...
Hi,
no, I tcpdump-ed my way through it and then searched online. I found an old thread here that pointed me in the right direction.
Kind regards,
Hauke
Hello there,
this post is more for information purposes, should someone else stumble over the same problem.
We just upgraded our 5.4 environment to 6.0. To provide internal connectivity for some of our VMs, we have another OVS bridge (vmbrX, with no physical device connected) on each node, all of...
Short update: Upgraded to PVE 6 (hoping for a newer version of the ixgbe driver) and limited the vzdump bandwidth. The latter seems to be a temporary fix, but of course, we will only see over the next week whether it actually works.
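For reference, the bandwidth limit mentioned above can be set cluster-wide in /etc/vzdump.conf; the value is in KiB/s, and the figure below is purely illustrative, not our actual setting.

```
# /etc/vzdump.conf
# bwlimit is in KiB/s; 100000 is roughly 100 MB/s (illustrative value)
bwlimit: 100000
```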
Kind regards,
Hauke
Hi Manu,
thanks for your answer. No, we do not use a dedicated corosync network. But all backup/snapshotting traffic goes through a dedicated storage network, so there is no high load on the network that corosync uses. I can see that in the Grafana instance we use to monitor the cluster...
Hi,
thanks for the hint. The CPU temperature does not seem to be the problem; it stays at a stable level below 45 °C at all times. The overheat protection would only trigger at 89 °C.
As for the airflow/temperature of the NIC, I will have to check with our hardware provider, the machines are in...
Hi Chris,
good point. We updated the BIOS a few weeks ago exactly because of this issue. We do not observe much CPU load (the machines are a little oversized regarding CPU). During the time of the last failure, the CPU load was <10%.
We did/do have, however, heavy traffic situations during...
Hi Chris,
I am confident that it is not primarily a hardware problem. All nodes are affected, randomly. Of course, it can still be hardware-related, because all nodes have the same hardware (except the HDDs/SSDs on node 4).
Kind regards,
Hauke
Hi forum,
we have a 4-node setup based on Supermicro SuperServers with the latest PVE 5.4 and Ceph. For about two months now, we have been observing sudden reboots of single nodes roughly once per week, without any hints in the logs (messages, syslog, kern.log). It seems like the node is running without any...
Oh dear, oh dear, oh dear.
I refuse to accept that I have been this stupid :eek:
I really thought the host firewall did not need to be activated. Shame on me; problem solved. It was a layer-8 problem.
Kind regards,
Hauke