My whole cluster is on the following:
root@cephmon:~# ceph -v
ceph version 16.2.9 (a569859f5e07da0c4c39da81d5fb5675cd95da49) pacific (stable)
This entire cluster started on Nautilus, so I would think I should be OK.
We have a Ceph cluster with roughly 600 RBDs. Two of the 600 have randomly picked up a new feature, which is breaking our backups.
root@cephmon:~# rbd info Cloud-Ceph1/vm-134-disk-0
rbd image 'vm-134-disk-0':
size 1000 GiB in 256000 objects
order 22 (4 MiB objects)...
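For anyone who hits the same thing, this is roughly how I'd track down and strip the stray feature. The pool/image name matches my output above, but the feature name itself is a placeholder since my output is trimmed:

# Print the feature list for every image in the pool so the odd ones stand out
for img in $(rbd ls Cloud-Ceph1); do
    echo -n "$img: "
    rbd info "Cloud-Ceph1/$img" | grep features
done

# Strip the unexpected feature from an affected image
# (<feature> is a placeholder; note that not every feature can be toggled after creation)
rbd feature disable Cloud-Ceph1/vm-134-disk-0 <feature>

# Confirm it is gone
rbd info Cloud-Ceph1/vm-134-disk-0 | grep features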
I have a small 2-node cluster for some random VMs that had a storage issue last night. I was hoping to use my VM backups, but found that some VMs are being skipped during the backup job with the following.
What would make these VMs external? This was all working about 2 months ago, then it...
I just found out that it's actually happening in all situations with live migration.
I agree, start/stop is great, but we have a cluster with almost 600 VMs and depend heavily on live migration for uptime.
We have a mix of Intel 2nd/3rd Gen Xeons. I can reproduce the issue going...
Wanted to report back. Did some more testing.
Here is what I pinned down.
- Slowness only happens on VMs that have been migrated
- A VM is fine if it's freshly started on a host running the newer 6.2.x kernel (rough repro sketch below)
- I was in the process of upgrading packages and moving over to 6.2.x when I hit this bug...
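For anyone trying to reproduce, it boils down to something like this (the VM ID and node name are placeholders):

# Case 1: live-migrate a running VM onto a 6.2.x host -> the slowness shows up afterwards
qm migrate 100 pve-62-node --online

# Case 2: cold-restart the same VM on that 6.2.x host -> performance is back to normal
qm stop 100
qm start 100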
Looks like the latest 6.2 kernel still has some major performance issues in our environment, just like 6.1.
We are seeing a load increase of 30-50% on pretty much all VMs running on hosts with the 6.2.x kernel.
If we go back to 5.15.x, all is well.
Check out the screenshot; our load has...
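For anyone else rolling back: on an install managed by proxmox-boot-tool, pinning the known-good kernel looks roughly like this (the version string is just the 5.15 build we happen to run; check the list output for yours):

# See which kernels are installed and which one is currently the default
proxmox-boot-tool kernel list

# Pin the known-good 5.15 kernel so it stays the default across reboots
proxmox-boot-tool kernel pin 5.15.74-1-pve

# Once a fixed kernel ships, drop the pin again
proxmox-boot-tool kernel unpin

On a legacy GRUB-only install you'd set the default entry in /etc/default/grub instead.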
It just so happens that this server also has Intel E810-based NICs (both 25G and 100G). They are working great with the ice driver. Moving data as we speak over the E810 NICs on the 5.15.74-1-pve kernel...
root@gppctestprox:~# lspci | grep Ethernet
11:00.0 Ethernet controller: Intel...
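If anyone wants to compare notes, the driver and firmware combo can be pulled per port with ethtool (the interface name is a placeholder; `ip -br link` will show yours):

# Report the driver, driver version and NIC firmware for one of the E810 ports
ethtool -i enp17s0f0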
It's a Gen10 with 4x Intel Gold 6254s.
The BIOS is a little older but not too bad (2022); it might be one revision behind.
After lots of testing, I have narrowed it down to the NIC below.
13:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)...
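If anyone wants to check whether they have the same part, this shows the exact device IDs and which kernel driver/module is bound to it (the PCI address is from my box; yours will differ):

# Vendor/device IDs plus the driver in use for that slot
lspci -nnk -s 13:00.0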
That's it. It was a brand-new install as well. It would sit for 5-10 minutes, then it would print those hung-task messages, and that was it. I let it sit for 10+ minutes, but there was nothing else after that.
I reinstalled a second time and hit the same issue. If there is anything else I can provide...
The 5.15.102-1-pve kernel fails to boot on HP DL560s. I haven't had a chance to test other hardware, but it definitely has issues on the 560. I'm hoping this kernel doesn't get released to the enterprise repos, as we have a lot of 560s in production.
Attached a screenshot of what happens...
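If it helps narrow this down, I can try to capture more of the boot output. On a GRUB-based install that's roughly the following (the edit is only needed until a fixed kernel lands):

# One-off: at the GRUB menu, press 'e' on the 5.15.102-1-pve entry, delete 'quiet'
# from the line starting with 'linux', then boot with Ctrl-x.

# Persistent: drop 'quiet' so the console shows exactly where boot stops
sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="quiet"/GRUB_CMDLINE_LINUX_DEFAULT=""/' /etc/default/grub
update-grub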
Anyone else using any of these E810-based NICs (25G and 100G)?
They are using the ice driver.
They show up in Proxmox and look OK, but I can't for the life of me get them to light up.
There's nothing in dmesg or any of the logs about transceiver mismatches.
Figured I'd see if anyone else is...
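For reference, these are roughly the checks I'm running on my side; the interface name is a placeholder for whatever `ip -br link` reports on your box:

# Are the ports visible, and does the kernel see carrier on any of them?
ip -br link

# Negotiated speed and link-detected status for one of the E810 ports
ethtool enp17s0f0

# Dump the transceiver/module EEPROM, if the optic responds at all
ethtool -m enp17s0f0

# Anything the ice driver logged while probing the cards
dmesg | grep -i ice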
Anyone else notice some major performance changes going from 5.15.x -> 6.1.x?
Upgraded one of my heavy-hitter front ends (quad-socket DL560 Gen10), and now we are seeing a major load increase. CPU load has almost doubled.
Going back to 5.15.x has corrected the issue, but with live...
Just wanted to mention.
While getting some front ends moved to 6.1.x, I hit VM lockups on every VM migrated from a DL560 Gen10 on 5.15.83 to a DL560 Gen10 on 6.1.2-1. Both servers are identical. Once both were on 6.1.2-1, all was well.
Appears to be more than just a generation issue...