Sheepdog 1.0

seventh

New Member
Jan 28, 2016
Good day,

Sheepdog 1.0 has just been released, and I was wondering if you are going to update the pve-sheepdog package anytime soon?

From what I can read, v1.0 should be a stable release.

I have tried v0.93 and I think sheepdog is really easy to set up; it has auto-recovery and working features such as snapshots and VM disk resize.
I haven't tested performance yet, as I was running inside VirtualBox.

Thanks!
 

seventh

New Member
Jan 28, 2016
Ok nice!

So it's still in development and not released on the repository?

Or is there a package I can download and test out?

Thanks for the swift reply!
 

seventh

New Member
Jan 28, 2016
I understand!
So will this release be considered a stable one on your side?
 

seventh

New Member
Jan 28, 2016
I followed the development mailing-list (pipermail) archive, and I think sheepdog 1.0 should now be complete? It's a little bit hard to follow, though :)

Do you have any schedule for when you will release the new version, and will it be considered a stable release?

Thank you!
 

athompso

Member
Sep 13, 2013
Dietmar, what sort of delay or timeline is typical between:
1. a commit like this going into the repo,
2. a package showing up in pvetest,
3. a package showing up in no-subscription,
4. a package showing up in pve-enterprise, and finally
5. the package being included in the latest ISO?

I'm running some systems with subscriptions, some without, and I have absolutely no idea when I can look at switching from NFS / local storage back to Sheepdog.
(With 9 nodes, 1Gbps and only HDDs, CEPH just isn't usable for me and I had to pull it out completely and switch to QCOW2 on NFS instead. CEPH is OK on the 3-node 10Gbps all-SSD cluster, though.)

Thank you,
-Adam
 

dietmar

Proxmox Staff Member
Staff member
Apr 28, 2005

athompso

Member
Sep 13, 2013
Perhaps I've misunderstood... do I not also need the "sheepdog" package, which is still at 0.8.3-2 (in no-subscription)?
(Never mind - testing shows that I do not. That is what was confusing me.)

However, going back to my previous question, is there any approximate guideline I can use for planning at, say, the quarterly (3-month) level? I don't have any sense of what the development team's pace is for promoting new code to public availability at various points.

Thank you for the quick response,
-Adam
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
However, going back to my previous question, is there any approximate guideline I can use for planning at, say, the quarterly (3-month) level? I don't have any sense of what the development team's pace is for promoting new code to public availability at various points.
depending on the type of changes and time needed for testing, the time from commit to our git repositories to package release to pve-no-subscription ranges from a couple of days to a couple of weeks. the same applies for the transition from pve-no-subscription to pve-enterprise.

sometimes blocking issues cause longer delays, but this is rather rare.
 

athompso

Member
Sep 13, 2013
depending on the type of changes and time needed for testing, the time from commit to our git repositories to package release to pve-no-subscription ranges from a couple of days to a couple of weeks. the same applies for the transition from pve-no-subscription to pve-enterprise.

sometimes blocking issues cause longer delays, but this is rather rare.

Thank you! That makes it easier to plan and to provide meaningful status updates to stakeholders. (It doesn't help for the bugzilla entries that stay open for many months because they're hard to fix, but at least it gives me some guidance.)

Sadly, we're running into a whole bunch of GUI problems right now, which we'll be documenting and filing bugs for. But that's another thread entirely...
 

athompso

Member
Sep 13, 2013
Oh, and I can confirm that pve-sheepdog works well so far on one cluster. There's an anomaly where restarting any given node causes a disproportionate amount of recovery to occur even if cluster-rebuild is disabled while the node gets restarted, but that may be me not understanding the protocol well.
And performance on a 3-node gigabit-only cluster with 3x replication is noticeably better than CEPH, especially since it's safe for me to use "--no-sync" in that particular environment!
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
Thank you! That makes it easier to plan and to provide meaningful status updates to stakeholders. (It doesn't help for the bugzilla entries that stay open for many months because they're hard to fix, but at least it gives me some guidance.)
we also use the bug tracker to track / not forget about long-term planned or requested features, so that is sometimes to be expected. we try to update the entries to reflect progress though, and have discussions either there or on pve-devel if required. IMHO it's better to have an open bug report for a while than no bug report at all, as long as it's an actionable report.

if you have the feeling that something "fell through the cracks", feel free to ping (e.g., the last response was something like "this should be easy to fix, I'll take a look at it" and was 2 months ago ;)). unfortunately some bug reports are very hard to reproduce (e.g., there is not enough information in the original bug report, and no response to requests for more information/logs/..., or it only triggers in rare corner cases that are not yet narrowed down) - those tend to stay open for quite a while with no progress.

Sadly, we're running into a whole bunch of GUI problems right now, which we'll be documenting and filing bugs for. But that's another thread entirely...
please do! that's what the bug tracker is for :)
 

blackpaw

Member
Nov 1, 2013
And performance on a 3-node gigabit-only cluster with 3x replication is noticeably better than CEPH, especially since it's safe for me to use "--no-sync" in that particular environment!
That's my environment, except for the --nosync :)

Any idea of the implications of --nosync? Does it mean some VMs could be missing a few writes (after a server crash), or could the actual sheepdog cluster be toast?

My main reservations re sheepdog are:
- documentation
- the user mailing list is absolutely silent
- the dev mailing list has close to no activity as well
- lack of information on cluster status

But it strikes such a nice balance between ceph, gluster and lizardfs :)
 

mir

Well-Known Member
Apr 14, 2012
That's my environment, except for the --nosync :)

Any idea of the implications of --nosync? Does it mean some VMs could be missing a few writes (after a server crash), or could the actual sheepdog cluster be toast?
Another option is '-n, --nosync' for the sheep daemon, which drops O_SYNC for writes to the backend. It literally means we don't set the 'sync' flag for backend writes. This will dramatically improve write performance if you don't have the object cache enabled, at the cost of the possibility of losing some data in the case of a power failure of the whole cluster. In other words, if only some nodes in the cluster crash, there is no damage to the data at all even with '--nosync' enabled (assuming the number of failed nodes is covered by the redundancy level). If any one of the following conditions applies to your cluster,

  • your data center promises there is no power outage
  • all the disks are battery-backed
  • you don't care about the very low possibility of power outage and want the best performance
you can enable the '--nosync' option for the sheep daemon to enjoy the write boost.
https://github.com/sheepdog/sheepdog/wiki/Why-The-Performance-Of-My-Cluster-Is-Bad
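The O_SYNC behaviour described above can be illustrated outside sheepdog. The sketch below is a plain-Python illustration of the flag that --nosync drops, not sheepdog's actual I/O path, and the function name is made up for the example: with O_SYNC set, each write() only returns once the data has reached stable storage.

```python
import os

def write_payload(path, payload, sync=True):
    """Write payload to path.

    With sync=True the file is opened with O_SYNC, so each write()
    blocks until the data has reached stable storage -- analogous to
    the backend behaviour sheepdog keeps unless sheep runs with --nosync.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= os.O_SYNC  # per-write durability, at a latency cost
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)

# Both modes produce identical file contents on success; the difference
# is only whether an acknowledged write can still vanish if the whole
# machine loses power before the cache is flushed.
```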
 

blackpaw

Member
Nov 1, 2013
Thanks mir, I interpret that to mean individual VMs could lose data, but the overall cluster will remain intact (I managed to destroy a LizardFS cluster in testing; those master servers are fragile).

One thing I only just thought to check is memory usage: with just two 32 GB VMs, the sheep process is consuming 6 GB of RAM. That is rather alarming.
 

mir

Well-Known Member
Apr 14, 2012
I interpret that to mean individual VMs could lose data, but the overall cluster will remain intact (I managed to destroy a LizardFS cluster in testing; those master servers are fragile)
IMHO, nosync in sheepdog is comparable to using a writeback cache for QEMU disks, or, in DB terms, to using async writes: you can lose data from the most recent transactions (the last writes to disk can be lost), but the database is guaranteed to be consistent (the sheepdog cluster is always consistent).
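The writeback analogy can be sketched with an ordinary buffered file: data that has been flushed survives, data still sitting in the buffer at crash time is lost, and the flushed prefix is never corrupted. A minimal illustration, with user-space buffering standing in for the at-risk cache window (an analogy only, not sheepdog code):

```python
import os
import tempfile

# Create a scratch file to write into.
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "w", buffering=65536)
f.write("committed\n")
f.flush()               # like a synced write: handed off durably
f.write("in-flight\n")  # still only in the buffer -- the at-risk tail

# What a reader (or crash recovery at this instant) finds on disk:
seen_during_window = open(path).read()   # -> "committed\n"

f.close()                                # flush completes; nothing lost
seen_after_flush = open(path).read()     # -> "committed\nin-flight\n"
os.remove(path)
```

The flushed line is always intact, mirroring the claim that only the most recent unsynced writes are at risk while already-written data stays consistent.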
 
