We're not ready to shut down the whole Ceph cluster, but I realised I have debugging on, which is the default with a Proxmox and Ceph install.
So I would like to turn off debugging, and would really rather not reboot a live cluster.
Is it easier to just do the following (quoting from another Google response)...
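For what it's worth, Ceph debug levels can be changed at runtime without restarting any daemons; a sketch of the commonly quoted approach (the subsystem list below is an example, not exhaustive):

```shell
# Lower debug levels on all OSDs and MONs at runtime (no restart needed).
# Injected values are lost when a daemon restarts, so also persist them
# in ceph.conf [global] or the monitor config database.
ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0'
ceph tell mon.* injectargs '--debug_mon 0/0 --debug_ms 0/0 --debug_paxos 0/0'
```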
We have Micron 5210 drives in Ceph.
I read this today:
It states we must disable write cache?
Should I do this on all our drives?
Can we do it on a live Ceph...
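If it helps: the volatile write cache on SATA drives can be toggled live with hdparm; a sketch (the device path is a placeholder, and doing one OSD at a time is the cautious approach):

```shell
# Show the current write-cache state
hdparm -W /dev/sdX
# Disable the volatile write cache (not persistent across reboots;
# a udev rule or startup script is usually used to reapply it)
hdparm -W 0 /dev/sdX
```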
I have never seen this before. Usually a disk fails completely, but this is new. Please advise whether this disk has failed or not. I have another 11 of these disks and they don't give these results, only this particular one.
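When a disk misbehaves without failing outright, a SMART report is the usual first check; a sketch, assuming smartmontools is installed and /dev/sdX is the suspect disk:

```shell
# Full SMART attributes and error log for the drive
smartctl -a /dev/sdX
# Kick off a long self-test, then re-check with -a once it completes
smartctl -t long /dev/sdX
```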
We typically only have KVM VMs in Proxmox and currently use krbd. I was informed by my colleague that librbd is better for QEMU/KVM workloads. We mainly have VMs hosting websites and SQL.
He stated there have been major improvements to librbd recently that make it better? Something about it being rewritten...
So I created a new LXC container and set cores to 2 and the CPU limit to 2.
The server itself has 64 GB memory and 24 cores (2 sockets x 12-core processors).
However, when this server is heavily tested and load goes up, we see this in top on the node:
top - 08:34:49 up 10:55, 3 users, load average...
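One thing worth knowing here: `top` inside an LXC container still shows the host's load average (lxcfs does not virtualise loadavg), so host-wide load will appear even with cores and cpulimit set. The limits themselves can be confirmed or adjusted from the node; a sketch with a hypothetical CTID:

```shell
# Show the container's current resource settings (CTID 101 is an example)
pct config 101
# cores = number of CPUs the container may use; cpulimit = hard CPU-time cap
pct set 101 --cores 2 --cpulimit 2
```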
We have cPanel CentOS 7 servers on Proxmox 6 using LXC.
We need to upgrade around 80 LXC containers, as systemd is outdated on CentOS 7.
Does anyone have experience with this, or know of any issues we should be aware of?
Can we set garbage collection and pruning to run during business hours, say from 7am to 5pm, rather than at night at the same time as the backups run?
It seems to slow the backup server somewhat.
UPDATE: Never mind. Found it.
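For anyone else searching: on Proxmox Backup Server the GC and prune schedules take systemd calendar events; a sketch with an assumed datastore name:

```shell
# Run garbage collection daily at 07:30 instead of the overnight slot
# ("mystore" is a placeholder for your datastore name)
proxmox-backup-manager datastore update mystore --gc-schedule '07:30'
```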
Trying to run the following:
ceph daemon osd.6 perf
Can't get admin socket path: unable to get conf option admin_socket for osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, client\n"
Not sure what is wrong.
ceph.conf is as per below...
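In case it's useful: `ceph daemon` resolves the admin socket locally, so it only works on the node actually running osd.6, and the subcommand is `perf dump` rather than `perf`. A sketch, with the default socket path assumed:

```shell
# On the node that hosts osd.6:
ceph daemon osd.6 perf dump
# or address the admin socket explicitly if name resolution misbehaves:
ceph daemon /var/run/ceph/ceph-osd.6.asok perf dump
```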
Is it possible to have Ceph compression work on existing pools? Since I only enabled it now, I think compression only applies to new data. How do I compress the existing data? I am using aggressive mode with lz4.
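From what I understand, pool compression only applies to data written after it is enabled; existing objects stay uncompressed until they are rewritten. A sketch of the pool-level settings (the pool name is a placeholder):

```shell
ceph osd pool set mypool compression_algorithm lz4
ceph osd pool set mypool compression_mode aggressive
# Existing data is only compressed when rewritten; for RBD images that
# means copying or migrating the image so its objects get written again.
```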
We wanted to move to 2/2 for a bit while we wait for our new SSDs to arrive, as we have limited storage space in one cluster right now. However, when moving from 3/2 to 2/2, we notice that all our VMs pause or become "read only" while Ceph is rebalancing if a disk is taken out and a...
I think I may have found something.
I had an issue with disk space and hence changed from 3x replication to 2x, knowing the possible risks.
It was meant to be temporary while I add more OSDs.
But now, when I add more OSDs to new servers, I am noticing very high IO wait and the servers "freeze".
I am reading in some posts that usable disk space is measured based on the smallest OSD disk in the cluster?
So for example if we have the below on each node on 6 nodes.
2 x 2TB
1 x 500GB
Are we saying disk space is lost due to the 500GB?
Should we rather just remove the 500GB? We just had...
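On the "smallest OSD" question: CRUSH weights OSDs by capacity, so data spreads roughly in proportion to disk size, and mixed sizes don't directly throw capacity away (though the cluster stops accepting writes once any single OSD hits the full ratio). A rough sketch of the raw/usable numbers for the layout above, assuming the current 2x replication:

```python
# Per-node OSD sizes from the post, in GB
osds_per_node = [2000, 2000, 500]
nodes = 6
replicas = 2  # current 2x replication

raw = sum(osds_per_node) * nodes   # total raw capacity across the cluster
usable = raw // replicas           # before full/nearfull ratios and overhead
print(raw, usable)                 # 27000 13500
```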
I have a question I forgot to test when we did our testing phase.
If we stop and out a disk but then realise we did the wrong disk, can we just bring it back in again without "destroying the data" on it first?
Any risk in doing so? Will Ceph just use the data on the OSD and just...
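For what it's worth, marking an OSD out doesn't touch the data on it; a sketch of bringing it straight back (the OSD id is an example):

```shell
# Undo the stop/out; Ceph re-peers and only backfills what changed
systemctl start ceph-osd@12
ceph osd in 12
```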
We have a stable good working ceph cluster with one ceph pool where all data is on. We have a few VMs running currently on that pool.
I noticed there is an option called krbd, and some forum posts state that performance can be increased by enabling krbd on the...
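If you want to try it, krbd is a per-storage flag in Proxmox and can be toggled on a live system; a sketch with an assumed storage name:

```shell
# Enable the kernel RBD client for this storage ("ceph-vm" is a placeholder)
pvesm set ceph-vm --krbd 1
# Running guests keep their current mapping; the change applies on the
# next stop/start (or migration) of each VM.
```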
For now we have been creating a monitor on every server we set up. Not sure why, we just have :)
Now let's say we have an 11-node cluster with a monitor on each; do you think that's overkill?
Also, on a 4-node Ceph cluster which we also have set up, do you think 2 monitors would suffice, as I...
If we have a currently running Ceph Cluster with the following:
7 nodes with the following setup:
Dell R610 Servers
64 GB Memory
1 x 480GB PM863a SSD for Proxmox OS
5 x 600GB Enterprise 10K SAS Disks for OSDs
10Gb Ethernet Network
Dell H200 Card
Let's say these nodes are doing OK running only...
I notice that 2 nodes in our cluster are greyed out now, after an update and restart of the server.
I see this in the logs:
Aug 4 18:10:57 pve-2 systemd: pvestatd.service: Found left-over process 21897 (vgs) in control group while starting unit. Ignoring.
Aug 4 18:10:57 pve-2 systemd: This...
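Greyed-out nodes usually mean pvestatd is stuck rather than the cluster being broken; restarting the status services is the usual first step (safe on a live node, it does not touch running guests):

```shell
# Restart the Proxmox status/API services on the affected node
systemctl restart pvestatd pvedaemon pveproxy
```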
The server hangs when I do a resize on LVM partitions, or whenever anything runs against LVM.
If I don't touch LVM it's fine, but as soon as I try to make a change to an LVM partition it dies. I tried it on a few servers; same issue on pve-kernel-5.4.44-2-pve.
I see these stuck:
root 5938 0.0 0.0 15848 8724 ? D 08:40 0:00 /sbin/vgs...