Hi @fabian,
I have finally managed to get around to running an on-site PBS that syncs off to remote. Unfortunately, we're still getting high IOWait and unresponsive services as a result, meaning I'm unable to complete the backup. I don't suppose you can offer any more insight?
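In the meantime, one thing I may try on my side (just a sketch, with a made-up VMID, and not something you've suggested) is capping the backup bandwidth so the guests keep some IO headroom:

# /etc/vzdump.conf - cluster-wide cap on backup read speed, value in KiB/s (~100 MiB/s here)
bwlimit: 102400

# or per run, on the command line
vzdump 101 --bwlimit 102400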
In terms of IO...
When I initially installed the drives I was disappointed to see we were still having issues as we started to apply load. It was mainly the IOWait of guests rather than apply/commit figures.
Turning the write cache off and setting it to write through definitely helped us. Although this was done...
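For context, this is roughly the sort of change I mean (a sketch only; /dev/sdX stands in for each OSD disk, and it may need re-applying after a reboot unless made persistent):

# disable the drive's volatile write cache
hdparm -W 0 /dev/sdX
# or via sysfs, switch the kernel's cache mode to write through
echo "write through" > /sys/block/sdX/queue/write_cache
# check the current setting
cat /sys/block/sdX/queue/write_cache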
I thought it'd be good to follow up on this off the back of @Deepen Dhulla's like...
Long story short, EVOs just don't cut it when it comes to Ceph. In the end we replaced all 12 drives with Samsung PM893 3.84TB SATA SSDs. These have power loss protection, which is what Ceph depends on for...
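If anyone wants to check their own drives for this, the usual quick test is a single 4k sync-write job, which is exactly where PLP drives pull ahead (sketch only; the device path is a placeholder and the test will destroy data on that device):

fio --name=plp-test --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting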
Thanks Stoiko,
Very useful indeed! Appreciate your help and I'll mark this as solved as it seems to be nothing to worry about. I just wanted to check it wasn't the result of a misconfiguration.
Chris.
Hi,
I've installed and configured PMG. We're relaying email both inbound and outbound via a smarthost. I noticed in the headers that I get the following -
Received: from mxserver.example.net (localhost.localdomain [127.0.0.1])
by mxserver.example.net (Proxmox) with ESMTP id 69FD7201062...
Thanks for your response. We're backing up to spinning-rust off-site storage, so it won't be anything fast. Is there any way I can work around this, i.e. back up locally and then sync remotely? We need remote backups as part of our backup strategy.
Thanks,
Chris.
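For anyone else planning the same thing, the rough shape I have in mind is a local PBS datastore with a sync job pulling it off-site afterwards; a sketch with made-up names, run on the off-site PBS (assuming a pull-style sync):

# define the on-site PBS as a remote (host, auth-id, password and fingerprint are placeholders)
proxmox-backup-manager remote create onsite --host 192.0.2.10 --auth-id sync@pbs --password 'SECRET' --fingerprint <fp>
# pull the on-site datastore into the local one every night
proxmox-backup-manager sync-job create pull-onsite --store offsite-store --remote onsite --remote-store onsite-store --schedule 'daily'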
I have a 3.6TB guest which was backing up fine (runs on Ceph), upgraded our nodes on Saturday and now I'm having to cancel the backup because it's causing the guest to become unresponsive during the process. I can see it looks like it's having to rebuild from scratch, but thought this wouldn't...
Interesting you should raise this. I upgraded yesterday and was just coming to the forum to ask about backup issues.
I have a 3.6TB guest which was backing up fine (runs on Ceph), upgraded our nodes yesterday and now I'm having to cancel the backup because it's causing the guest to become...
When I created my Proxmox cluster, I installed Ceph and did some testing.
We had a few issues with Ceph so I followed some guides on uninstalling it from Proxmox. Now we've recreated our cluster and it's working nicely. However, when I reboot one particular node (pve01), it takes 15-20 minutes...
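If it helps anyone searching later, the first thing I'd look at on pve01 (just a guess, not a confirmed cause) is whether any leftover Ceph units from the old install are still enabled and holding up shutdown:

# anything Ceph-related that systemd still knows about
systemctl list-units 'ceph*' --all
systemctl list-unit-files 'ceph*'
# what the previous shutdown was actually waiting on
journalctl -b -1 -u 'ceph*' --no-pager | tail -n 50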
Hi Aaron, I've had a breakthrough on this today.
I have 3 nodes that were originally spec'd by our provider with Samsung EVO 870s (useless for Ceph), so we're in the process of swapping them out for Samsung PM893s.
We added one PM893 to each node just to test and this is where the benchmarks...
Hi,
We've installed a 3 node Proxmox cluster running Ceph. On the cluster we have two VMs. When we reboot all PVE nodes, the IOWait of the VMs drops to pretty much nothing.
However, over time, the IOWait creeps up - there is no load on these VMs. Any idea why this might be? As you can see...
@spirit those are pretty good stats.
We've just replaced our cluster (3 nodes, 1 DC) with 3 x 3.84TB Samsung PM893s (with another 9 on order, so 4 per node).
We have Intel Xeon E5-2697s configured in performance mode, so 2.3GHz base / 3.6GHz turbo per core.
Running the same FIO benchmark within a VM...
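For reference, the sort of job I mean inside the VM is roughly this (a sketch, not necessarily spirit's exact parameters; the test file path is arbitrary):

fio --name=randwrite --filename=/root/fio-test --size=4G --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting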
@hepo have you got any further with this?
Have you tried running a rados benchmark against a newly created pool?
We're also finding writes are heavily restricted when running from within a VM itself.
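Something along these lines against a fresh pool takes the VM layer out of the picture entirely (pool name is a placeholder):

# 60s of 4M writes with 16 threads, keeping the objects for a read test
rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup
# sequential read-back of the same objects
rados bench -p testpool 60 seq -t 16
# remove the benchmark objects afterwards
rados -p testpool cleanup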
Just a secondary thought, maybe try setting up a crush map that only includes the nodes in the one DC, create a pool with that map and see how it performs...
I would have expected them to run like lightning, all enterprise with PLP.
Have you tried without spreading across the DCs? I believe the writes are synchronous and won't be acknowledged until writes to the replica OSDs have been acknowledged too, even if it's across to the secondary DC...
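To expand on the crush map idea above, a sketch only (rule, bucket and pool names are made up, and it assumes the DCs already exist as datacenter buckets in the CRUSH tree):

# rule that only places replicas on hosts under the dc1 bucket
ceph osd crush rule create-replicated dc1-only dc1 host
# test pool using that rule
ceph osd pool create bench-dc1 64 64 replicated dc1-only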
We've started using Ceph ourselves and I have to say how good it is. The particular benefit is in the case of failover it'll pick up right where it left off, the cluster is self healing and VM migrations are wickedly fast as there's no disk to migrate. In our test environment I tried my best to...
From my testing of Proxmox, one frustration I had was that, unlike my previous Xen environment, Proxmox does not detect if a VM has panicked/crashed/frozen and as such won't reboot the VM, potentially ending up in hours of downtime until the issue is realised and resolved.
After a bit of digging...
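I won't pretend the snippet below is exactly what I ended up with, but for anyone else hitting this, one common approach is an emulated watchdog device the guest has to keep feeding (VMID is made up):

# add an i6300esb watchdog that resets the VM if the guest stops feeding it
qm set 100 --watchdog model=i6300esb,action=reset
# inside the guest, a watchdog daemon (e.g. the Debian 'watchdog' package) then needs to be running against /dev/watchdog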
The CephFS I was testing is mounted on the node itself.
I've just given writeback mode a go and it didn't make much difference. At least now I have some benchmarks to work from, so when we get the new disks in I'll have a good baseline.
Thanks,
Chris.