Saturday, February 28, 2009

Commit Access: It's a Social Problem

Many proponents of distributed version control systems (DVCSs) say the biggest advantage is that anybody can create a branch and begin working on a project. Whereas, for a centralized system (such as Subversion), the would-be contributor needs to have commit access before they can contribute.

Let's walk through this.

Obviously, this contributor can grab a tarball, make their change, and send a patch file to the project's mailing list. No commit access is required to do that. Why a DVCS, then? Well... the DVCS simplifies the retrieval and application of the patch (by the project's developers, or third-party users of the project). The contributor also gets to use a version control system while developing the patch, which I'll just axiomatically state as a Good Thing.

Okay. So if they don't have commit access, then a DVCS is very handy. What would the scenario be if they did have commit access? The contributor could develop their change on "trunk" or on a branch. We've already stated this is a would-be contributor -- not one of the regular developers who already has commit access. It really doesn't make sense for this person to modify trunk directly, so let's just say the work is being done on a branch.

So why would this potential contributor not have commit access? Really? All their work is happening on a branch. It isn't like they're going to mess up the project from over there. They're going to generate some commit emails, sure, but maybe the other developers could then provide pointers, assistance, and feedback earlier than if the contributor had arrived with a patch, as a fait accompli. This is source control, people. Anything changed can always be reversed. No permanent harm is possible.

So why do potential contributors not receive commit access to a branch, as soon as they ask for it? For social reasons. It certainly isn't technical. Projects have an "us versus them" attitude, and "they" don't get to commit to "our" repository.

For reference, I'll note that the Apache Software Foundation provides branches to Google Summer of Code students. These students arrive with no credentials, they get a branch, and then work on their code over the summer. When they are done, the work can be merged back to trunk, if it is acceptable. It has worked out very well for all involved.

In the Subversion project, we set up branches for developers to try out their ideas. We say these developers have "limited commit access" rather than "full commit". I'll also note that there are no technical limitations on their commit access. Those developers could commit to trunk if they tried. But social restrictions prevent them from doing so. We've never had a problem with rogue developers, since it is so easy to undo any mistakes or intentional harm, and to remove their access.

In this respect, DVCSs are simply a workaround to social barriers put into place by projects. They do not address the core problem: projects should be inclusive rather than exclusive.


Ben Collins-Sussman said...

If you're talking about a group of committers who already have access to a central repository -- then yes, I agree, DVCS adds little value beyond the occasional convenience of doing "offline" commits on a train or plane.

But as you mentioned in your post, IMO the really huge "value proposition" of DVCS is that it dramatically lowers the barrier for participation by non-committers. Emailing patches back and forth is so much more awkward than doing peer-to-peer "pulls" of changesets. By making the review process more pleasant and giving *everyone* the exact same version control tools -- committers or not -- it's dramatically easier to get outside contributions.

In theory, you'd think that lowering this barrier would make projects "accelerate" in their ability to pick up core members... but oddly there seems to be a counter-force in the git community where the convention is that absolutely every participant keeps a permanent "private" repository. They take the pyramid-model to an extreme, where no two people ever work in the same repository. This seems to create a "monarch" mentality - every project is "owned" by one person, rather than an egalitarian set of committers. It worries me.

Jakub Narebski said...

The problem with the "get tarball, send patch" approach is that it works only if your change can be expressed well as a single patch. If the change should be done (for easy review and for easy bisectability) as a series of patches (a series of commits), then access to version control is very much required. A distributed VCS, with no need to even ask for a branch, and without showing your experiments to the world at large, lowers the barrier to entry.

Greg Stein said...

@Jakub: agreed. I already gave DVCSs props for better handling of a single patch. When it gets more complicated (as you demonstrate), then a DVCS is even better.

However, I still believe it is no big deal to ask to create a branch and provide commit access. If you're going to participate in a project, then one simple email is no big deal. And hell... if people would get past the social problem, then technical assistance could be done to allow for a simple request/provide access workflow.

And I'm sure you realize that I'm not a fan of the concept "without showing your experiments to the world at large". It is that kind of offline, private development that I believe DVCS encourages to the detriment of Open Source communities and projects.

@sussman: very interesting point about the "each with their own repository". I can't help but believe that is an outgrowth of a mindset: fiefdoms, control, us vs them, etc. A project still has to have one Master repository that releases are cut from, but if all the work is pushed out to N repositories, then how does a newcomer ever find out what is going on? How do you track work? How do you do continual review/feedback before the power-plant push of a set of changes? There needs to be a mechanism to point to those N repositories, and that implies centralization. Whether that is GitHub or some other system, there is always centralization.

Jakub Narebski said...

@Greg Stein: With a centralized VCS you have (from what you wrote) the choice between "get tarball, send patch" and "ask for commit access in a branch" (and further "full commit access", but it doesn't matter here). You miss a very important workflow: use a distributed VCS to prepare a series of patches, and send them to the mailing list for review. In my opinion, a public mailing list is a much better forum for review than some branch in a centralized VCS plus perhaps some 'please review' / 'please pull' request.

@Ben Collins-Sussman: What about the suggestion in the seminal "The Mythical Man-Month" by Brooks that each group needs an integrator / maintainer? IMHO, in the DVCS world this translates to one person, the maintainer, applying patches and pulling from lieutenants' trees into the official repository.

Jakub Narebski said...

An 'unpublished branch' (possible only in a DVCS) gives the ability to incrementally improve a patch series, for example changing how it splits into commits, or correcting some error in the patch itself instead of adding a fix two patches later. This requires rewriting history, which requires that the history is not made public.

Of course it can be taken too far...

jaaron said...

@ben That's an interesting observation and thinking about the projects I've worked with on GitHub, it seems very accurate.

@greg To take your "give anyone a branch who asks for it" suggestion to the extreme there should be a button that allows anyone to create a branch. No email required. Just automate the branch creation completely.

And about the "tracking work" issue in the DVCS world? I think it's a nightmare. When I find a git repository, I'm always wondering where the latest development might be happening. This is particularly a pain when the original author has abandoned the work.

Bill Mill said...

> However, I still believe it is no big deal to ask to create a branch and provide commit access. If you're going to participate in a project, then one simple email is no big deal.

What you have to think about here is the marginal contributor; it really helps to think about this problem in economic terms.

I propose that sending an email to a mailing list when you're not well known on that list asking for commit access is a relatively high-cost action; you put yourself out there to guys you really respect, when you're not even sure that you're going to be able to help them by doing anything useful.

Let's say 1/100 people who will download your source will make useful contributions. If you make the cost to prepare to participate even a *little* high, you're much less likely to get that one than you are otherwise.

The solution? Lower the cost to get the code and hack on it, and focus on reviewing people's contributions.

It's easier to ask forgiveness than permission.

Brian P O'Rourke said...

@Greg I like how you have framed this as a social problem, but I see another situation which your analysis misses:

Many, perhaps most, open source projects are dead or at least not actively maintained. Current DVCS tools make it trivial for anyone to pick up and start hacking away at long-forgotten streams of code, and to make their changes public.

GitHub makes this process discoverable: looking at some project's mainline, you can also see any forks in the graph and follow the code's activity as it evolves away from its origin.

As the body of open source increases and more projects are orphaned and abandoned, this ability to branch without commit rights will become more important. "Not actively maintained" should not be equivalent to "RIP".

Adam Olsen said...

It's almost a social problem, but not quite.

Subversion mixes two authorities. To do any serious work (which should be done in a branch) you also need authority to unilaterally push new versions. It's akin to giving someone your full banking information simply because they offered to pick up some coffee for you. Not only is it a significant and pointless risk, but having to evaluate and monitor that risk wastes a great deal of time.

The ideal social arrangement involves a lot of patch review, signing off on patch sets, communication, etc. DVCS simply supports that sort of flow, while SVN gets in the way.

(Incidentally, I'm suffering from SVN right now. I started fixing some bugs in an open source game and it's become non-trivial, so I've had to stop.)

dstanek said...

I posted a similar idea on the python-dev list. I was a little less radical in that I wasn't thinking that anyone who wanted access should get it, but I do find the idea interesting.