@alexskysilk is correct. I have not intentionally run the cluster a node down for an extended period, but during maintenance periods, Ceph operates without issue when a node is offline. You do need to learn the proper methods for shutting down Ceph with the applicable flags set to disable...
I manage a three-node cluster with Ceph (10GbE) in a production environment - and it works very well for us. But our Ceph is not used for general data storage - only VM disks for high availability. As others have stated, it's all about use case. Our deployment is not designed for user data storage...
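As a rough illustration of the flag handling mentioned above: the post is cut off before it names the flags, so the use of `noout` here is an assumption (it is one flag commonly set before planned node maintenance so that OSDs on the offline node are not marked out). The helper names `run`, `maintenance_start` and `maintenance_end` are hypothetical, and the sketch defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: wraps the Ceph flag commands used around planned maintenance.
# Assumption: "noout" is the flag of interest; the original post is truncated.
# DRY_RUN defaults to 1, so nothing is executed unless DRY_RUN=0 is set.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Set before taking a node offline.
maintenance_start() {
    run ceph osd set noout
}

# Clear once the node is back and its OSDs have rejoined.
maintenance_end() {
    run ceph osd unset noout
}

maintenance_start
maintenance_end
```

Run it as-is to see the commands, then with `DRY_RUN=0` on a node that has the `ceph` CLI once you are happy with them.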
Unfortunately, I am unable to replicate. I can't even trigger it if I flip freeze/thaw back on. I'm still hunting for the exact cause - though it does appear to affect my Windows guests more than Linux.
Will test tomorrow and report back
I have a support ticket open and linked this thread, specifically your test results, so I am hopeful for additional info soon and for the devs to attempt to replicate
Stop works fine for our production cluster. If a shutdown/reboot is hung, cancel that task and hit Stop. Only very rarely has this failed on a problematic guest and required CLI intervention.
I'll definitely investigate this further. But it does raise the question - if it's something specific inside the VM, why is it only showing up now after the update?
Thanks for the additional insight.
No cPanel in use in this case, so unfortunately no smoking gun.
The only changes in our case are some additional sticks of server RAM, a Ceph upgrade and a Prox upgrade. Guests unchanged.
No? Shame. Well, just in case, I deactivated the PBS jobs and left only ZFS running. I tried to replicate with freeze on/off today with one of the VMs that seemed most susceptible to the issue, but of course no luck. Maybe it's an I/O issue. I recall something about io_uring being related.
I'm in the same boat. Haven't seen this issue for a long time on our cluster.
I thought that it might have been caused by my PBS not being updated at the same time - about 2 days after the v7 to v8 upgrade - but it is still happening. We run two sets of backup jobs every night - one PBS and...
Thanks all for the fantastic work here to diagnose and troubleshoot.
I am currently in the process of upgrading our production cluster to v8 and am already missing my Ceph dashboard.
As a temporary fix - is there a working method to port Ceph data to the native Prox influx server? I've tried...
Hi. Were you able to find a solution? I lost my Ceph dashboard due to the Python issue with current release and need another method to monitor via my influx/Graf Prox dashboard.
What are some strategies that people use to back up/clone/automate/etc. OS disks and configurations?
I've read a number of threads on this topic - varying from Clonezilla backups, automated config managers, zfs send cron jobs and many more. Many are outdated and I am curious if there are more...
This is the most wholesome thread I've ever read. It's like an Ireland morning roster!
+1 Wicklow
Thanks to support for the speedy reply to my ticket. Nothing like coming into work to find problems on a Monday morning. Though I prefer a simple cup of coffee to wake me up.
Hi - I am hoping to find some more clarification about Ceph total data usage as it is represented in Prox UI.
Info on the cluster & storage:
3 servers, each w/ 2x 1TB SSD + 10x 2TB HDD
Ceph Pools: 2
#1: ceph_ssd (triple replication, host failure domain)
#2: ceph_hdd (triple replication, host...
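For reference, the raw and replicated capacity implied by the hardware listed above can be worked out with simple arithmetic. This is a back-of-the-envelope sketch, assuming nominal disk sizes in TB and a replication factor of 3 for both pools; it ignores full-ratio limits, metadata overhead and TB/TiB differences, so the Prox UI figures will not match exactly. The function names `raw_tb` and `usable_tb` are hypothetical.

```shell
#!/bin/sh
# Rough capacity arithmetic for the cluster described above (nominal TB).

# raw_tb NODES DISKS_PER_NODE DISK_SIZE_TB -> total raw capacity in TB
raw_tb() {
    echo $(( $1 * $2 * $3 ))
}

# usable_tb RAW_TB REPLICAS -> capacity after replication (integer division)
usable_tb() {
    echo $(( $1 / $2 ))
}

ssd_raw=$(raw_tb 3 2 1)     # 3 nodes x 2 SSDs x 1 TB
hdd_raw=$(raw_tb 3 10 2)    # 3 nodes x 10 HDDs x 2 TB

echo "ceph_ssd: raw ${ssd_raw} TB, usable about $(usable_tb "$ssd_raw" 3) TB"
echo "ceph_hdd: raw ${hdd_raw} TB, usable about $(usable_tb "$hdd_raw" 3) TB"
```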
Hi @fabian - is this the correct line to retrieve and apply this patch?
curl https://git.proxmox.com/git/pve-storage.git | git apply
Also, will applying this require any service restarts?
No worries -
I've successfully migrated (from ESXi) Win10, WinServ and Ubuntu.
The only issue I still have concerns CentOS. I can mount via SATA/IDE but the modification to dracut to include scsi drivers doesn't always work. Luckily we don't have a ton of Cent VMs.
modprobe virtio_scsi...
I have, indeed. That's been very informative. Just wondering what additional input Prox users specifically might be able to offer in terms of this metric's usefulness.