I hear you, but I've already got the default profile that prioritizes client ops, yet during a recovery a few slow OSD ops are all it takes to freeze VMs.
It seems that, regardless of my choice, I have to set the nobackfill and norecover flags and so on and wait until the evening to...
Thanks Aaron. I am considering going back to wpq until the next release. I don't want to get into an open-ended situation of adjusting tunables all the time.
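If I do end up switching back, my understanding is that it's just a cluster-wide setting plus an OSD restart, roughly like this (untested on my end, and I'd restart one host at a time):
  ceph config set osd osd_op_queue wpq   # switch the op scheduler back to wpq
  systemctl restart ceph-osd.target      # the change only takes effect after the OSDs restart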
Tonight, rebooting a single-OSD host with the OSD flags set went better. Slow OSD ops appeared briefly, but they cleared within a few seconds.
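For reference, the flags were just set before the reboot and cleared once the OSD rejoined, along the lines of:
  ceph osd set norebalance
  ceph osd set nobackfill
  ceph osd set norecover
  # ... reboot the host, wait for the OSD to come back up ...
  ceph osd unset norecover
  ceph osd unset nobackfill
  ceph osd unset norebalance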
I fear how Quincy would respond to an unplanned failure. One would need to react quickly to disable recover, balance, and backfill...
This sounds closest to what we are experiencing. What else do you know about it, and what else can be done besides using the norebalance flag?
To me, the 17.2.4 changelog suggests that these things have been fixed, when they have not.
With regards to the ceph status, don't worry about the 1 mon being down. They are on comparatively slower storage and spend a lot of time on get_health_metrics. That is one reason we have 7 mons: they are all active and running, although they come and go when they get bogged down with stats, but we...
We have 15 OSD hosts and 22 OSDs. The servers physically have 2 drive bays each. Of course the OSDs are not distributed perfectly evenly: some servers have 1 OSD and some have 2, but we are always adding drives to the system as time and availability allow.
OSD utilization according to the...
Just more testing. Read the corosync redundancy section about setting ring priorities:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_redundancy
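If I'm reading that section right, with knet the priorities go into the interface subsections of the totem block in corosync.conf; something like the sketch below, where the link numbers and priority values are just placeholders (the highest-priority link that is up gets used):
  totem {
    # ...existing settings unchanged...
    interface {
      linknumber: 0
      knet_link_priority: 20   # preferred link
    }
    interface {
      linknumber: 1
      knet_link_priority: 10   # fallback link
    }
  }
Don't forget to increment config_version in the totem section when editing /etc/pve/corosync.conf.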
I had a couple of Windows VMs get irreparably corrupted after a spontaneous reboot of all my nodes, also using Ceph. We ended up disabling HA entirely because of it.
If this is merely a matter of taking some switches offline, and you are already familiar with editing corosync.conf, then you can...
The pvecm section hasn't changed since I first read it over 4 years ago. There are still notes in there relating to PVE 3.x and 4.x.
I was hoping for some real-world accounts from admins of large clusters, but it seems large clusters are rare.
Wow I can't believe I missed that, I went through all the datacenter tabs. A cluster prefix would be nice too, and cleaner looking, but this certainly gets the job done.
Due to Corosync, there is clearly a finite and rather small number of nodes that a single Proxmox cluster can support, yet Ceph clusters have no such size limit.
The only thing preventing multiple Proxmox clusters going hog-wild on a shared Ceph cluster is the certainty of overlap and collision...
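Since every PVE cluster names disk images vm-<vmid>-disk-N, my assumption is that the only sane way to share a Ceph cluster is to give each Proxmox cluster its own pool, roughly like this (the pool name, PG count and storage ID are made up):
  ceph osd pool create pve-clusterA 128
  rbd pool init pve-clusterA
  # then, on the Proxmox side of cluster A, point its RBD storage at its own pool
  pvesm add rbd ceph-clusterA --pool pve-clusterA --content images
(An external Ceph cluster would also need the monhost and keyring options, left out here.)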
We have a 16 node cluster and are preparing to expand to 32 nodes.
Because of the hemmed-in architecture of the Dell blades and the M1000e modular chassis, we had some interesting choices to make for the physical networking. Each server has 6x 10 GbE ports and 8x 1 GbE ports.
The 10 GbE...
The cluster name appears to be fixed on the cluster tab as well as the Datacenter root object in the sidebar tree.
Just curious if it's possible to rename this. I can see that the GUI doesn't provide for such a thing, but I am open to hacky suggestions as well.
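If I had to guess, the hacky route would be editing cluster_name in /etc/pve/corosync.conf and bumping config_version, something like this (totally untested guesswork, the values are examples):
  # /etc/pve/corosync.conf, totem section
  totem {
    cluster_name: new-name    # was: old-name
    config_version: 5         # must be higher than the current value
    # ...rest unchanged...
  }
followed by restarting corosync on every node.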
FYI, this issue came up for me on a 2016 VM with Proxmox 7.2-7 and QEMU 6.2.0-11, and adding args: -machine smm=off to the VM config had no effect.
Will try the downgrade later on; this is my only environment and it's production, so unfortunately I can't afford to play with it.
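When I do get a maintenance window, the plan is just to pin the older package with apt, along these lines (the older version string is only an example; check what's actually still available in the repo):
  apt-cache policy pve-qemu-kvm         # list the versions available
  apt install pve-qemu-kvm=6.2.0-10     # example older version, not necessarily the right one
  apt-mark hold pve-qemu-kvm            # keep it from being upgraded again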
Thanks for responding. Of course it is not a true security measure; it was just meant to control what our users see and provide them with no more than the data pertinent to them.
These people can only control the power on/off/reset, console, and snapshots, nothing to do with any particular...
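For context, it's just a custom role with those VM privileges, applied per guest; roughly like this (the role name, user, and VM ID are examples):
  pveum role add LimitedVMUser --privs "VM.PowerMgmt,VM.Console,VM.Snapshot,VM.Audit"
  pveum acl modify /vms/100 --users operator1@pve --roles LimitedVMUser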