I have to correct myself - I did not have the latest packages, only what was on my apt-proxy.
Apologies for that.
I have now updated to the latest as of this morning and now experience the same issues.
However: for me this is NOT restricted to FreeBSD 14 machines; it affects 13.2 as well.
I run FreeBSD 14.0 based VMs on the latest PVE versions without problems whatsoever.
What storage backend do you use?
My VMs are using VirtIO SCSI Single with either zfs or ceph as backend.
That all depends on who your ceph clients are. If you use ceph exclusively as a storage backend for VMs, then your cluster nodes will be the only clients, and you can decide whether the network that traffic runs on actually needs the security.
First recommendation: do benchmarks. Identify whether you lack IOPS or bandwidth, and whether a particular storage medium is the problem (and if so, which one) - then you have a much better chance of addressing the specific cause.
Also keep in mind that the limiting factor in ceph may not be the storage...
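If it helps, this is roughly how I would start (the pool name 'testpool' and the device are just placeholders, and rados bench / fio need to be available on the node):

# Raw Ceph write, sequential read and random read performance from a node:
rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 seq
rados bench -p testpool 60 rand
rados -p testpool cleanup

# 4k random-read IOPS of a single disk (read-only, but double-check the device name):
fio --name=iops-test --filename=/dev/sdX --rw=randread --bs=4k --iodepth=32 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based

Comparing the per-disk numbers with what the cluster delivers usually tells you quickly whether the disks or something else (network, CPU) is the bottleneck.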
correct. You can do that on the fly to change the current scheduler.
I actually doubt there will be any performance benefit with regards to I/O. However, you might save some CPU cycles, because the 'none' scheduler does less work than mq-deadline.
It all obviously depends heavily on your specific hardware, but I do not...
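For reference, switching at runtime is just a sysfs write (sda is only an example device, and the change does not survive a reboot):

cat /sys/block/sda/queue/scheduler          # the entry in [brackets] is the active scheduler
echo none > /sys/block/sda/queue/scheduler  # needs root, takes effect immediately

To make it persistent you would need a udev rule or a kernel boot parameter.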
The schedulers you mentioned are what older kernels offered and are no longer available. More modern kernels offer multiqueue alternatives to these schedulers.
If you have SSDs and you see 'none' as the scheduler, or have HDDs and their scheduler is mq-deadline, you already have the desired...
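To check all disks at once, a quick loop over sysfs does the job (rotational=0 means SSD/NVMe, 1 means a spinning disk):

for d in /sys/block/sd* /sys/block/nvme*; do
  [ -e "$d" ] || continue
  echo "$(basename $d): rotational=$(cat $d/queue/rotational), scheduler=$(cat $d/queue/scheduler)"
done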
We have seen the same sort of thing happen. Upon reboot one (or more) OSDs may (or may not) crash in this manner. So far, it seems random.
There is a ceph issue for just such a crash here:
https://tracker.ceph.com/issues/56292
But there does not seem to be much (if any) activity as of now.
ceph crash...
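In case it helps anyone comparing their backtraces with the tracker issue, the crash reports can be listed and inspected with the crash tooling (the ID below is just a placeholder taken from the list output):

ceph crash ls
ceph crash info <crash-id>
ceph crash archive-all    # acknowledge them so the health warning clears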
PS:
There is no timeout issue when using proxmox-backup-client on one of the pve nodes to list all snapshots. It takes much longer than 5 seconds, though.
But as soon as I try to use pvesm to list backups, I run into the timeout (after roughly 5 seconds).
I ended up getting the right name of...
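For completeness, these are roughly the two calls I am comparing (the storage ID and repository are placeholders, and the exact proxmox-backup-client subcommand may differ between client versions):

# Via the PVE storage layer - this is what runs into the ~5 second timeout:
pvesm list <pbs-storage-id>

# Directly against the PBS server - slow, but no timeout:
proxmox-backup-client snapshot list --repository root@pam@pbs.example.com:datastore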
Fabian, is it possible that some settings changed? E.g. the length of the timeout? I ask because I have the same issue after upgrading the pve nodes to the latest packages a few days ago. It used to be more like 20 or so seconds before I got the timeout when trying to view the backups for a VM...
I have seen similar situations on my mailservers (exim + dovecot). Load gets high, queues are stuck. People can no longer fetch mail (not only does no new mail come in, but connections to dovecot just time out).
But the same issue occurs with other VMs (my build box for poudriere for instance).
I...
Thank you. I do have Ceph and PVE using the same physical 10G network link (although separated by VLAN). However, the bandwidth was not even above 25% around the time the crash occurred.
I had seen the packages this morning and followed your instructions from yesterday. I could upgrade the remaining nodes without trouble.
One question remains for me: Can you clarify what 'particular load/network' means? Bandwidth? Using the same (physical) network for PVE cluster traffic as well...
good to know.
I have done that sort of thing lots of times and never had this crash, so it might just have been 'luck'. But as long as there is a fix... :-)
Thanks!!
Indeed, they do. Lots of them. Thanks!
I guess I will wait for the corosync and libknet package updates before doing anything else?
Can these be upgraded without risking more downtime? E.g. any order / steps to watch out for?
Hi there, I just stumbled upon this thread having experienced a very similar (if not identical) situation:
I had updated one node (that is running pve-backup and has no VMs) already and had initiated its reboot.
I also started moving VMs from one node to another, in order to perform the update...
Cool. It would also fit with what happened to me - I was still on 16.2.4, when the cluster crashed. Good to know it is fixed.
Still, I will stay on Octopus for the time being. :)