Hi Dominik,
so this is the output directly on the pbs01
This is from a local PVE cluster connected via 10G. I have not yet experienced the problems mentioned above on it.
And this is the output of a node of the 5-node cluster. As mentioned, the traffic goes via "the Internet". This...
Hi,
I have an older installation of a 3-node Proxmox Ceph cluster - probably 5.3 or older. The OS disks look like this:
So there is no 512M partition I could use for the EFI boot partition.
I would like to switch to UEFI boot, but how do I carve out that 512M EFI partition?
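For reference, this is roughly how I would check the current layout before touching anything - the device name /dev/sda is just a placeholder for one of the OS disks:

# Overview of the partitions, sizes and filesystems on the OS disk
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/sda
# Print the GPT partition table, including the total free space
sgdisk -p /dev/sda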
Hi,
we are trying to replace our current backup solution with PBS. So I installed PBS on an older ThomasKrenn 4HE Intel Dual-CPU RI2436 with some disks...
One Proxmox cluster has already been running backups to this pbs01 for some time without problems. Now I am trying to run backups with our...
Might want to have a look here:
https://forum.proxmox.com/threads/ha-nfs-service-for-kvm-vms-on-a-proxmox-cluster-with-ceph.80967/post-365321
But I cannot recommend running an nfs-kernel-server within an LXC container on Proxmox - as soon as the nfs-kernel-server hangs, it might also hang your...
I cannot recommend using an nfs-kernel-server on top of the CephFS kernel client when you are using CephFS snapshots.
As soon as the MDS service fails over from the active to a standby MDS, the NFS clients die with a kernel panic, or the services running on them just die. It seems to be a caching problem. This problem...
nfs-ganesha runs in user space, so if it hangs or crashes in the CT (LXC) container, it will not take the Proxmox host down with it.
If the nfs-kernel-server in a CT container has problems, it might hang the Proxmox host.
It is better to run an nfs-kernel-server within a Proxmox VM (KVM).
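For the VM variant, a minimal sketch of what I mean (assuming a Debian VM with CephFS mounted at /mnt/cephfs; the export path and the subnet are placeholders):

# Install the kernel NFS server inside the VM, not on the Proxmox host
apt install nfs-kernel-server
# Export the CephFS mountpoint to the client subnet (adjust path and network)
echo '/mnt/cephfs 192.168.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
# Re-read /etc/exports and verify the export is visible
exportfs -ra
showmount -e localhost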
Deep scrubbing is not happening at all at the moment - and probably will not until this is finished...
9 pgs not deep-scrubbed in time
pg 4.76f not deep-scrubbed since 2021-01-31 13:12:33.368818
pg 4.72f not deep-scrubbed since 2021-01-31 16:48:10.470599
pg 4.649 not deep-scrubbed since...
Yes, I can check them, but what should I be looking for?
root@proxmox07:~# ceph pg ls | head -n1
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP...
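My guess is that the interesting columns are SCRUB_STAMP and DEEP_SCRUB_STAMP. Something like this is what I would try (4.76f is just the first PG from the warning above):

# List only the PGs that Ceph itself flags as overdue
ceph health detail | grep 'not deep-scrubbed'
# Look at the scrub-related timestamps of a single PG
ceph pg 4.76f query | grep -i scrub
# Manually kick off a deep scrub on that PG
ceph pg deep-scrub 4.76f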
No, the pg_num and pgp_num are now at 128 and I changed the autoscaler to "warn" for these pools.
root@proxmox07:~# ceph osd pool ls detail
pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 15199 lfor...
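For completeness, this is roughly how I set the autoscaler to "warn" and double-checked it (cephfs_data as the example pool):

# Switch the autoscaler to "warn" for one pool
ceph osd pool set cephfs_data pg_autoscale_mode warn
# Verify the current pg_num and the autoscaler state
ceph osd pool get cephfs_data pg_num
ceph osd pool autoscale-status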
So,
Monday was horrible, our customers started to hate us again...
Carefully (!!!) restarting the OSD processes and using the recoveries to keep the backfilling from starting got us through the day, and late in the evening we found the bluefs_buffered_io setting. That shifted the read IO...
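Roughly what we did, as far as I can reconstruct it - whether bluefs_buffered_io takes effect without an OSD restart may depend on the Ceph version, so treat this as a sketch rather than a recipe:

# Persist the setting in the cluster config database (Nautilus and later)
ceph config set osd bluefs_buffered_io true
# Try to inject it into the running OSDs; some versions still need a restart
ceph tell osd.* injectargs '--bluefs_buffered_io=true'
# Throttle backfill/recovery so the VMs stay usable
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1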
Hi,
the autoscaler increased the number of PGs on our Ceph storage (hardware like this, but 5 nodes).
As soon as the backfill starts, the VMs become unusable, and we started killing OSD processes that cause high read IO load. As in this picture, we would kill the ceph-osd process working on...
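Instead of killing OSD processes, the cluster flags should pause the backfill in a more controlled way - this is what I would try next (a sketch, not yet tested on this cluster):

# Temporarily stop backfill and rebalancing cluster-wide
ceph osd set nobackfill
ceph osd set norebalance
# ...and let it continue later, e.g. outside business hours
ceph osd unset nobackfill
ceph osd unset norebalance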
So, here comes the next uncertainty from my side:
I generally disable atime on ZFS pools, and this is then inherited by the volumes.
If you rely on atime, does this still work?
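To illustrate what I mean (the pool and dataset names are just examples):

# Disable atime on the pool; child datasets inherit it unless set locally
zfs set atime=off backup
# The SOURCE column shows whether a child dataset inherits the setting
zfs get -r atime,relatime backup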
So I let the log output continue to run, and after about 2 hours and about one GByte I stopped it again.
The content is just about the same for the whole file.
Here are parts of the log:
starting garbage collection on store pve-prod
task triggered by schedule '15:40'
Start GC phase1 (mark used chunks)
marked 1% (17 of 1651 index files)
WARN: warning: unable to access chunk 2cd5a53b5d8aa3c9d530c2ee2b89ccf7ed238ad7bf97afb1a3424666784656d0, required by...
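If I understand the on-disk layout correctly, the chunks live under .chunks/<first four hex digits>/ inside the datastore directory, so the next thing I would check is whether the file from the warning actually exists (the datastore path below is a placeholder):

# Datastore path is a placeholder - adjust to the real mountpoint
DS=/path/to/datastore
ls -l "$DS"/.chunks/2cd5/2cd5a53b5d8aa3c9d530c2ee2b89ccf7ed238ad7bf97afb1a3424666784656d0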
Still venting here...
Underlying storage is a ZFS pool, which looks ok:
There are so many warnings that the browser stops working when I try to have a look at them:
Mount looks ok:
root@pbs01:~# mount | grep pbs
rpool/ROOT/pbs-1 on / type zfs (rw,relatime,xattr,noacl)
backup/pbs on...
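What I checked on the ZFS side, for what it is worth (pool name backup as shown in the mount output):

# Pool health, including any read/write/checksum errors
zpool status -v backup
# Capacity overview of the pool and its datasets
zpool list backup
zfs list -o name,used,avail,refer,mountpoint -r backup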
No idea how that happened! But 206 GByte is definitely not the size of my source five-node PVE cluster.
Any idea how that could happen?
So on 27.01.2021 it went full and two days later it is almost empty...
I am pretty shocked right now...
Rainer
But it does not support properly exporting .snap directories...
At least a bug report was created for this missing feature in NFS-Ganesha: https://tracker.ceph.com/issues/48991