Yep, same as us .. not an Intel vs AMD thing for sure .. definitely is pegged to NVMe/ZFS
We have ZFS mirrors for the OS on all Ceph nodes, and the Ceph OSDs access the NVMe drives directly via LVM volumes, as Ceph always does
Unfortunately, we get no kernel panic nor anything else in logs that...
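For anyone comparing setups: nothing custom was done for the OSDs .. creating them the standard way is what leaves them LVM-backed, roughly like this (the device name is just an example):

# wipe the example device and create an LVM-backed OSD on it
ceph-volume lvm zap /dev/nvme1n1 --destroy
pveceph osd create /dev/nvme1n1
# confirm the OSDs really are LVM-backed
ceph-volume lvm list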
So far, everything I've read in this thread makes sense and explains our crashes perfectly
The only nodes of ours that crash are the NVMe-only nodes that run our hyper-converged Ceph, and setting iommu to off doesn't help
Just as other comments have explained, there have been no dmesg nor journal...
In this thread, folks were asked to report back on whether "intel_iommu=off" fixed the crashing
For us it does not. Booting to kernel 6.8.4 still crashes the machine hard
This is on kernel 6.8.4 .. our fix is to pin our boot to 6.5.13-5-pve at this time
Our machines are Dell...
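For anyone wanting to do the same pin, something along these lines should do it (adjust the version string to whatever known-good kernel you have installed):

# list the kernels the boot tool knows about, then pin the known-good one
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.5.13-5-pve
reboot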
@sking1984 Usually, with things like this that hook into the kernel, you can purge the app, make sure all of its config files are gone, reboot, and then reinstall completely fresh to get back to the default behavior intended by the manufacturer (Dell). Be sure of what you are doing...
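Roughly, the cycle I mean looks like this (srvadmin-all is the usual OMSA metapackage name .. adjust for whatever you actually have installed):

# remove OMSA and its config files, clean out auto-installed dependencies too
apt purge srvadmin-all
apt autoremove --purge
reboot
# then reinstall fresh from the Dell repo
apt update
apt install srvadmin-all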
OMSA is for folks who have never messed around on a Linux command line .. not everyone in my company is familiar with Linux or would be open to using ipmitool or checkmk .. OMSA gives them a web-based GUI they have been familiar with since 20 years ago, when Windows was installed on the iron and OMSA was...
@thenoodle Excellent, thanks .. that's exactly why I re-wrote this in the first place, because many of the packages were out-of-date. I'll be sure to update my list with this new one you mention ..
We're very glad to hear this was helpful to you. I always feel like I have to take pieces of what many people say to get a complete answer, so I just wanted to post the above so that someone else could possibly get a more complete solution .. though I'm sure it wouldn't apply 100% across the board...
This was successful for OMSA 10.3 on Proxmox 8.0 (Bookworm) - Dell R640 server - brand-new Proxmox 8 install
echo 'deb http://linux.dell.com/repo/community/openmanage/10300/focal focal main' | tee -a /etc/apt/sources.list.d/linux.dell.com.sources.list
wget...
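In case the truncated bits above aren't clear, the remaining steps are typically along these lines .. the key URL below is from memory, so double-check it against linux.dell.com before trusting it:

# trust Dell's repo signing key (URL from memory .. verify it first)
wget -O /etc/apt/trusted.gpg.d/dell-omsa.asc https://linux.dell.com/repo/pgp_pubkeys/0x1285491434D8786F.asc
apt update
apt install srvadmin-all
# bring up the OMSA web GUI, then browse to https://<your-host>:1311
systemctl enable --now dsm_om_connsvc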
It would be great if someone with greater knowledge of the internals, and of how things have changed from Proxmox 6 to 7, could look into this more deeply and come up with a real answer. Our VMs are MUCH slower since Proxmox 7 and even more so since 7.2 .. Proxmox 6 was like a race car; now we feel like...
Yeah, in our particular case, the only 5.15 kernel that wasn't AS problematic was 5.15.35-2-pve
We are currently running 5.13.19-6-pve as that has proven to be the fastest so far .. we haven't tested the 5.15.53-1-pve kernel yet .. For our customers we MUST have the stability so I'm not...
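If it helps anyone double-check what they're actually running, something like:

uname -r                            # kernel we're actually booted into
dpkg -l 'pve-kernel-*' | grep ^ii   # kernels installed on the node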
@midsize_erp Yeah, for sure .. as far as this Ceph problem goes with the long ping times, it's gone. We are still seeing some odd VM behavior on these latest kernels (5.15.x) but Ceph has been fine.
Ok, so, I've updated all the nodes in our cluster to the latest 5.15.35-2-pve .. things seem to be better so far .. It's only been a little over 3 hours but so far no anomalies. No Ceph slow ping errors and so far, very good VM performance and responsiveness. Time will tell if this is truly as...
Ok, so, I restarted pveproxy on the node that I'm accessing Proxmox GUI through and that has done it .. a VM is now migrating to that node
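For anyone else hitting this, the restart was just:

# restart the Proxmox API/GUI proxy on the node serving the web GUI
systemctl restart pveproxy
# if that alone doesn't do it, restarting pvedaemon as well shouldn't hurt
systemctl restart pvedaemon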
Can someone please enumerate what things can go wrong during the cluster join process that could cause this sort of thing? What things need to be in order...
I added this new node's id_rsa.pub contents to /etc/pve/priv/authorized_keys
This has made it so that I can sign in via ssh to all the other nodes without a password, but I'm still getting the "Host key verification failed" error when trying to migrate
No, there's no issue with the...
Is the best way to fix this simply to use "ssh-copy-id" and copy this new host's key to each of the other nodes in the cluster, because somehow that didn't happen during the join? Or is there a much better way of making it happen?
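To make the question concrete, this is roughly what I'm considering (host names are placeholders):

# from the new node, push its key to each existing cluster node
ssh-copy-id root@othernode1
ssh-copy-id root@othernode2
# the "Host key verification failed" part is about known_hosts rather than keys,
# so ssh to each node once and accept the host key, or let Proxmox refresh the
# cluster-wide known_hosts/keys:
pvecm updatecerts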
I have this same error but .. this is different than what has been discussed here
This is Proxmox 7.2 and I just added this node to the cluster. All other cluster nodes can access this new node without a password but this new node can't access any of the other nodes in the cluster without a...
@rofo .. I believe I already tried all of this, but I will go ahead and take a look again and see if I missed anything the last time I worked on this .. from previous comments you can see that I haven't worked on this since early April .. Anyway, if I have any new breakthroughs, I'll post back here...
I really hope you are right .. hopefully your issues stay gone as a good sign that the latest kernel did indeed take care of at least some of the issues .. I am still running my Ceph nodes on 5.13.19-6-pve because I simply can't have those problems .. I'm running my VM execution nodes on...
It doesn't work for us .. so I have no idea how to help you @rofo
In our humble opinion .. the VDI solution in Proxmox using Spice is "half baked" and not ready for business consumption. Nobody seems able to give clear and concise answers as to how to make it work and frankly, we've spent way...