Sheepdog 1.0

seventh

New Member
Jan 28, 2016
Good day,

Sheepdog 1.0 has just been released, and I was wondering if you are going to update the pve-sheepdog package anytime soon?

From what I can read, v1.0 should be a stable release.

I have tried v0.93 and I think sheepdog is really easy to set up; it has auto-recovery and working features such as snapshots and VM disk resize.
I haven't tested performance yet, as I was running inside VirtualBox.

Thanks!
 

seventh

New Member
Jan 28, 2016
Ok nice!

So it's still in development and not released on the repository?

Or is there a package I can download and test out?

Thanks for the swift reply!
 

seventh

New Member
Jan 28, 2016
I understand!
So will this release be considered a stable one on your side?
 

seventh

New Member
Jan 28, 2016
I followed the development mailing-list (pipermail) archive, and I think sheepdog 1.0 should now be complete? It's a little bit hard to follow, though :)

Do you have any schedule for when you will release the new version, and will it be considered a stable release?

Thank you!
 

athompso

Member
Sep 13, 2013
Dietmar, what sort of delay or timeline is typical between:
1. a commit like this going into the repo,
2. a package showing up in pvetest,
3. a package showing up in no-subscription,
4. a package showing up in pve-enterprise, and finally
5. the package being included in the latest ISO?

I'm running some systems with subscriptions, some without, and I have absolutely no idea when I can look at switching from NFS / local storage back to Sheepdog.
(With 9 nodes, 1Gbps and only HDDs, CEPH just isn't usable for me and I had to pull it out completely and switch to QCOW2 on NFS instead. CEPH is OK on the 3-node 10Gbps all-SSD cluster, though.)

Thank you,
-Adam
 

dietmar

Proxmox Staff Member
Staff member
Apr 28, 2005

athompso

Member
Sep 13, 2013
Perhaps I've misunderstood... do I not also need the "sheepdog" package, which is still at 0.8.3-2 (in no-subscription)?
(Never mind - testing shows that I do not. That is what was confusing me.)

However, going back to my previous question, is there any approximate guideline I can use for planning at, say, the quarterly (3-month) level? I don't have any sense of what the development team's pace is for promoting new code to public availability at various points.

Thank you for the quick response,
-Adam
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
However, going back to my previous question, is there any approximate guideline I can use for planning at, say, the quarterly (3-month) level? I don't have any sense of what the development team's pace is for promoting new code to public availability at various points.
depending on the type of changes and time needed for testing, the time from commit to our git repositories to package release to pve-no-subscription ranges from a couple of days to a couple of weeks. the same applies for the transition from pve-no-subscription to pve-enterprise.

sometimes blocking issues cause longer delays, but this is rather rare.
 

athompso

Member
Sep 13, 2013
depending on the type of changes and time needed for testing, the time from commit to our git repositories to package release to pve-no-subscription ranges from a couple of days to a couple of weeks. the same applies for the transition from pve-no-subscription to pve-enterprise.

sometimes blocking issues cause longer delays, but this is rather rare.

Thank you! That makes it easier to plan and to provide meaningful status updates to stakeholders. (It doesn't help for the bugzilla entries that stay open for many months because they're hard to fix, but at least it gives me some guidance.)

Sadly, we're running into a whole bunch of GUI problems right now, which we'll be documenting and filing bugs for. But that's another thread entirely...
 

athompso

Member
Sep 13, 2013
Oh, and I can confirm that pve-sheepdog works well so far on one cluster. There's an anomaly where restarting any given node causes a disproportionate amount of recovery to occur even if cluster-rebuild is disabled while the node gets restarted, but that may be me not understanding the protocol well.
And performance on a 3-node gigabit-only cluster with 3x replication is noticeably better than CEPH, especially since it's safe for me to use "--no-sync" in that particular environment!
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
Thank you! That makes it easier to plan and to provide meaningful status updates to stakeholders. (It doesn't help for the bugzilla entries that stay open for many months because they're hard to fix, but at least it gives me some guidance.)
we also use the bug tracker to track / not forget about long-term planned or requested features, so that is sometimes to be expected. we try to update the entries to reflect progress though, and have discussions either there or on pve-devel if required. IMHO it's better to have an open bug report for a while than no bug report at all, as long as it's an actionable report.

if you have the feeling that something "fell through the cracks", feel free to ping (e.g., the last response was something like "this should be easy to fix, I'll take a look at it" and was 2 months ago ;)). unfortunately some bug reports are very hard to reproduce (e.g., there is not enough information in the original bug report, and no response to requests for more information/logs/..., or it only triggers in rare corner cases that are not yet narrowed down) - those tend to stay open for quite a while with no progress.

Sadly, we're running into a whole bunch of GUI problems right now, which we'll be documenting and filing bugs for. But that's another thread entirely...
please do! that's what the bug tracker is for :)
 

blackpaw

Member
Nov 1, 2013
And performance on a 3-node gigabit-only cluster with 3x replication is noticeably better than CEPH, especially since it's safe for me to use "--no-sync" in that particular environment!
That's my environment, except for the --nosync :)

Any idea of the implications of --nosync? Does it mean some VMs could be missing a few writes (after a server crash), or could the actual sheepdog cluster be toast?

My main reservations re sheepdog are:
- documentation
- the user mailing list is absolutely silent
- the dev mailing list has close to no activity as well
- lack of information on cluster status

But it strikes such a nice balance between ceph, gluster and lizardfs :)
 

mir

Well-Known Member
Apr 14, 2012
That's my environment, except for the --nosync :)

Any idea of the implications of --nosync? Does it mean some VMs could be missing a few writes (after a server crash), or could the actual sheepdog cluster be toast?
Another option is '-n, --nosync' for the sheep daemon, which drops O_SYNC for writes to the backend. It literally means we don't set the 'sync' flag for backend writes. This will dramatically improve write performance if you don't have the object cache enabled, at the cost of the possibility of losing some data in the case of a power failure of the whole cluster. In other words, if only some nodes in the cluster crash, there is no damage to the data at all even with '--nosync' enabled (assuming the number of failed nodes is covered by the redundancy level). If any one of the following conditions applies to your cluster,

  • your data center promises there is no power outage
  • all the disks are battery-backed
  • you don't care about the very low possibility of power outage and want the best performance
you can enable the '--nosync' option for the sheep daemon to enjoy the write boost.
https://github.com/sheepdog/sheepdog/wiki/Why-The-Performance-Of-My-Cluster-Is-Bad
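The O_SYNC behaviour described above can be illustrated outside sheepdog. The sketch below is a plain-Python illustration of the flag that --nosync drops, not sheepdog's actual I/O path, and the function name is made up for the example: with O_SYNC set, each write() only returns once the data has reached stable storage.

```python
import os

def write_payload(path, payload, sync=True):
    """Write payload to path.

    With sync=True the file is opened with O_SYNC, so each write()
    blocks until the data has reached stable storage -- analogous to
    the backend behaviour sheepdog keeps unless sheep runs with --nosync.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= os.O_SYNC  # per-write durability, at a latency cost
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)

# Both modes produce identical file contents on success; the difference
# is only whether an acknowledged write can still vanish if the whole
# machine loses power before the cache is flushed.
```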
 

blackpaw

Member
Nov 1, 2013
Thanks mir, I interpret that to mean individual VMs could lose data, but the overall cluster will remain intact (I managed to destroy a LizardFS cluster in testing; those master servers are fragile).

One thing I only just thought to check is memory usage: with just two 32 GB VMs, the sheep process is consuming 6 GB of RAM. That is rather alarming.
 

mir

Well-Known Member
Apr 14, 2012
I interpret that to mean individual VMs could lose data, but the overall cluster will remain intact (I managed to destroy a LizardFS cluster in testing; those master servers are fragile)
IMHO, nosync in sheepdog is comparable to using a writeback cache for QEMU disks, or, in DB terms, to using async writes: you can lose data from the most recent transactions (the last writes to disk can be lost), but the database is guaranteed to be consistent (the sheepdog cluster is always consistent).
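The writeback analogy can be sketched with an ordinary buffered file: data that has been flushed survives, data still sitting in the buffer at crash time is lost, and the flushed prefix is never corrupted. A minimal illustration, with user-space buffering standing in for the at-risk cache window (an analogy only, not sheepdog code):

```python
import os
import tempfile

# Create a scratch file to write into.
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "w", buffering=65536)
f.write("committed\n")
f.flush()               # like a synced write: handed off durably
f.write("in-flight\n")  # still only in the buffer -- the at-risk tail

# What a reader (or crash recovery at this instant) finds on disk:
seen_during_window = open(path).read()   # -> "committed\n"

f.close()                                # flush completes; nothing lost
seen_after_flush = open(path).read()     # -> "committed\nin-flight\n"
os.remove(path)
```

The flushed line is always intact, mirroring the claim that only the most recent unsynced writes are at risk while already-written data stays consistent.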
 
