Monday, November 29, 2010

Open Languages are Not Required

I just posted again to Apache Asserts on Computerworld UK: Open Languages are Not Required.

And please note that I'm speaking primarily to enterprise (internal) software developers, who are the vast majority of developers on the planet. They shouldn't really have to worry about the language that they use for their development. Having an open language is critical for us FLOSS developers, but that is an entirely separate discussion. (hat tip to webmink, to clarify my point here)

Note: the publish date is wrong (says last month); dunno what's up with that.

Update: corrected link after the publish date was fixed.

Friday, October 29, 2010

Are You An Open Source Friend?

The Apache Software Foundation was invited to find some people for Computerworld UK to write for a new blog named "Apache Asserts". A few others and I were selected to post our thoughts on open source, the enterprise, and whatever else we may find interesting.

My first post has been published... check it out!

Sunday, August 15, 2010

Android uses Java? Um... no

I've seen a lot of misinformation over the weekend about the Oracle/Google lawsuit. Many of these blog posts and articles talk about how "Android uses Java". Heh. That simply isn't true.

Android applications are written using the Java programming language. True. But those applications run on the Dalvik virtual machine. Not the Java virtual machine. Source code is owned/copyrighted by the author and is entirely unbound from any intellectual property concerns based around the syntax/grammar of that language.

Class libraries? Not Java either. Much of the core libraries come from Apache Harmony, and the rest are libraries that Google wrote. Given that Apache was never provided access to the Java Compatibility Kit, Harmony is not labeled as "Java-certified". Also note that Harmony is a clean-room implementation of the Java class libraries.

So, people: stop saying that Android "uses Java". It doesn't.

(obviously, some of these various components may trample on Oracle's patents; I have no idea, and that is an entirely separate question)

Wednesday, August 04, 2010


Nights out with a friend can be quite interesting. Especially if they are single and "looking". I've found there are generally three possible outcomes with these nights out:

  1. The Cock-Block.
    Your friend is trying to hook up or otherwise get especially friendly with somebody, but you monopolize the "target's" attention in some way to distract them from your friend's intent. Obviously, this outcome is "poor", unless you're some kind of dickhead that doesn't want your friend departing early with the target. Quite selfish, to try and keep them out with you. Of course, there are all sorts of minor rules variants here that are rather crass: e.g., if you're both interested in the target, who steps forward, who holds back? It's simply best to avoid this scenario because it never turns out well.
  2. The Wingman.
    Oh yah. We all know this one... the friend who props up the other and makes them ever more desirable. Talk up their strengths, ensure that the person-of-interest gets excited to know more about your friend. This is the ideal outcome, especially if they make some kind of lasting connection.
  3. The Bus-Tosser.
    This isn't nearly as bad as the Cock-Block, but your friend isn't going to be all that happy with you. At least for a short while. This is where you think your friend is interested in somebody, so you move into Wingman mode. Provide lots of opportunity for the two to talk and hang out, provide some good commentary, etc. Like any good Wingman would do. But afterwards, you find out your friend was not interested. At all. This is the "thanks for throwing me under the bus" maneuver, putting your friend into harm's way. Especially if the purported target is interested and giving undue attention to your friend. ... Thankfully, in the long run, this provides lots of laughable material for how you sucked as a myopic Wingman.
I think the best answer all around is to simply go out and have a great time with your friend. Anything that will involve a possible third person can fall into a poor outcome, or simply distract from an awesome evening with a friend.

Wednesday, May 19, 2010

svn stash

This is the third of (at least) four posts in my miniseries about Subversion's next-generation working copy library. See the introduction and what we're doing to fix things.

Once we have this fancy new code, it will provide a stable and robust base for building new features. The DVCS systems have done a great job exploring new areas and needs of version control users. One feature in particular is called stashing or shelving (see "git stash" and "hg shelve").

For those not familiar with the stash concept: consider the scenario where you've been doing some work, and a high-priority bug arrives, needing to be fixed right away. Classically, a Subversion user would check out a fresh working copy, fix the bug, perform the commit, and go back to their work in the original working copy. Instead, when using stash, it takes all of your current work and sets it aside, leaving you with an unchanged working copy, ready for your bug-fix work. After your commit, you retrieve the changes that were stashed. The presumption here, of course, is that stashing is a much faster and simpler operation than setting up a new working copy.

We'll be able to implement this feature quite easily using the WC-NG datastore. It will take just a few operations:

  1. preserve all metadata about local changes
  2. place a copy of each locally-modified file into pristine storage, recording its SHA-1 key with the stashed metadata
  3. revert all local changes
Since the metadata is recorded in a single SQLite database, step 1 is "simply" some copying of those changes off to a separate set of tables. The pristine storage is a generalized mapping of SHA-1 keys to file contents that we'll be using for storing more things (such as merge sources, pending conflict resolution), so it can easily hold stashed items. And step 3 has been in Subversion for a long time :-)
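
To make those three steps a bit more concrete, here is a rough sketch in Python (not Subversion's C, and not the real WC-NG schema). The table names, the stash() function, and the plain dictionaries standing in for the working copy and the pristine store are all invented for illustration. It only handles modified files, which is a big simplification.

```python
import hashlib
import sqlite3


def make_db():
    """Create an in-memory database; the table names are hypothetical."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE local_changes (path TEXT PRIMARY KEY, kind TEXT)")
    db.execute("CREATE TABLE stashed_changes (path TEXT, kind TEXT, sha1 TEXT)")
    return db


def stash(db, pristine, working, base):
    """Set all local modifications aside and restore the checked-out contents.

    pristine: dict mapping SHA-1 hex key -> file contents
    working:  dict mapping path -> current (locally modified) contents
    base:     dict mapping path -> contents as checked out
    """
    rows = db.execute("SELECT path, kind FROM local_changes").fetchall()
    for path, kind in rows:
        # Step 2: keep the modified contents in pristine storage by SHA-1 key.
        contents = working[path]
        key = hashlib.sha1(contents).hexdigest()
        pristine[key] = contents
        # Step 1: copy the change metadata, plus that key, to the stash table.
        db.execute("INSERT INTO stashed_changes VALUES (?, ?, ?)",
                   (path, kind, key))
        # Step 3: revert the file to what was checked out.
        working[path] = base[path]
    # Nothing is locally changed any more.
    db.execute("DELETE FROM local_changes")
    db.commit()
```

After calling stash(), the working dictionary matches base again, and each stashed row carries the key needed to find the set-aside contents in the pristine store. Getting the changes back is the harder half (it is a merge), and this sketch does not attempt it.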

Recovering the changes from the stash is effectively running a big "svn merge" operation. The merge is required because you may have made other changes to the working copy (your bug-fix) and/or updated to the latest revision.

Other features, such as multiple stashes, management of those changes, applying subsets, and whatnot would be added, too. The feature set has not (yet) been designed, so I have no idea what is required or how we would present this to our users. We'll definitely be looking at git and hg as we explore the needs around stashing/shelving.

"When?" is your next question, I'm sure :-) ... Well, we're releasing WC-NG in Subversion 1.7. That will probably happen this fall. We want to get those changes out the door since that will mark 18 months of development time. WC-NG is a feature in itself, and we want to get it into people's hands without further delays [waiting for additional features]. After that, I'm interested in adding stash support (and a "checkpoint" feature (described in my next post)). So let's say stashing will appear in 1.8 which should be released around this time next year.

Wednesday, May 05, 2010

Heading to Berlin!

In June, in Berlin, elego is hosting a "Subversion Day", along with workshops and a hackathon/sprint. And with great thanks to elego, I will be able to attend and contribute to the event. I'll be in Berlin from June 9th through the 14th.

As always, I'm looking forward to meeting up with my fellow Subversion developers, but there are quite a few others in Berlin that I want to spend time with. Torsten, Valerie, Erik -- I'm looking at you! Julian: road trip from Münster? Who else am I missing? Torsten says that he'll arrange for one of the regular Apache-people dinners. Want to join?

Monday, April 26, 2010

WC-NG Changes

In my last post, I described how libsvn_wc had become brittle and hard to manage. The WC-NG process is working to solve that problem, and though we're not yet done, I believe we're on the right path.

The basic question that needed answering is, "where did we go wrong?" While version control is a hard problem (especially if you version directories!), it does not inherently lead to a brittle library. Somewhere, we had gone wrong in the design, the data model, or simply the implementation.

Before I had started working on the problem (almost) two years ago, one of the Subversion developers (Erik Hülsmann, I believe) laid out his thoughts for a next-generation library. In those notes, he postulated on what I now call the Three-Tree Model:
  • the tree you checked out
  • the above tree, plus structural changes (add, delete, move, copy)
  • the above tree, plus content changes (file edits, property edits)
Any working copy operation generally affects one of these trees. svn update and svn switch work on the first tree. svn add and svn merge modify the second tree. Your editor and svn propset affect the last tree.
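
One way to picture the three trees is as three separate layers, where each operation touches exactly one of them. The Python below is only a toy model under that assumption: the WorkingCopy class and its method names are made up for this post and have nothing to do with the actual libsvn_wc API.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingCopy:
    """Hypothetical three-layer view of a working copy."""
    # Tree 1: path -> contents as checked out.
    base: dict = field(default_factory=dict)
    # Tree 2: path -> "add" or "delete" (structural changes on top of tree 1).
    structure: dict = field(default_factory=dict)
    # Tree 3: path -> locally edited contents (on top of tree 2).
    edits: dict = field(default_factory=dict)

    def update(self, path, contents):
        """Like 'svn update': only the checked-out tree changes."""
        self.base[path] = contents

    def add(self, path):
        """Like 'svn add': only the structural tree changes."""
        self.structure[path] = "add"

    def delete(self, path):
        """Like 'svn delete': only the structural tree changes."""
        self.structure[path] = "delete"

    def edit(self, path, contents):
        """Like saving from your editor: only the content tree changes."""
        self.edits[path] = contents

    def actual_paths(self):
        """Paths that exist once the structural layer is applied to the base."""
        paths = set(self.base)
        for path, op in self.structure.items():
            if op == "add":
                paths.add(path)
            else:
                paths.discard(path)
        return sorted(paths)
```

The point of keeping the layers apart is that a question like "what did I check out?" can be answered from the first layer alone, no matter what has been scheduled or edited on top of it.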

This was the key insight. In our "wc-1" implementation, the svn_wc_entry_t structure blended all three trees together. Making a change to that structure could have been operating on any of the three trees depending on its flags. Its checksum field could correspond to a checked-out file, or a locally-copied file. To determine which, you had to look at the schedule field and the copied field. And hell will rain upon you, should you mess up the flags or forget to check one.

For WC-NG, we have built a new data storage system with an API designed around this three-tree model. This has isolated our storage mechanism behind a solid encapsulation (wc-1 code had too much knowledge of the old "entries" storage model). Operations are now understandable: "copy nodes in the restructuring tree" instead of "set entry->schedule".

This new storage subsystem could produce an entire post on its own. It is radically different from the prior model: it uses a single .svn subdir at the root of the working copy, and SQLite-based storage. This is causing huge challenges in upgrades/migrations to the new format, and backwards compatibility for our classic APIs.

Another radical change was our move to using absolute paths to refer to items. The prior model used an "access baton" which implied a relative directory, along with a path relative to that baton. These relative batons and paths caused enormous problems because they led to the question, "relative to what?" In most cases, the answer was "the operating system's current working directory," which is a terrible basis for a deterministic API. Switching to absolute paths rendered the access batons obsolete. Since they were a core part of the public API for libsvn_wc (not to mention the widespread internal changes!), this has had a huge impact on the API and its users (such as Subversion's libsvn_client library and its command-line tools).
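
The "relative to what?" problem is easy to reproduce outside of Subversion. Here is a small Python illustration (the resolve_from() helper is hypothetical, written just for this post): the same relative path names a different file depending on the process's current working directory, while an absolute path does not.

```python
import os
import tempfile


def resolve_from(cwd, path):
    """Resolve `path` as the OS would if the process were running in `cwd`."""
    original = os.getcwd()
    os.chdir(cwd)
    try:
        return os.path.abspath(path)
    finally:
        # Always restore the caller's working directory.
        os.chdir(original)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as first, \
         tempfile.TemporaryDirectory() as second:
        # Same relative path, two different answers.
        print(resolve_from(first, "trunk/README"))
        print(resolve_from(second, "trunk/README"))
        # An absolute path gives one answer, wherever we happen to be.
        absolute = os.path.join(first, "trunk", "README")
        print(resolve_from(first, absolute) == resolve_from(second, absolute))
```

An API that takes only absolute paths never has to ask where the process was standing when the call was made.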

These two items (data model and absolute paths) are the core changes in WC-NG. The ripple effect from just these two items is immense. We will need to rewrite almost every one of the 40,000 lines of code in the library. And given our incremental approach, many of those will be changed multiple times. We're a solid year into this (although we saw downtime last fall due to our move to the Apache Software Foundation), and we probably have another several months of basic grunt work ahead of us. Stabilization and testing will put our 1.7 release into late summer or possibly this fall.

I could really go on and on about this stuff, but I hope this post provides some basic background on the WC-NG efforts. Please feel free to post any questions (I have no idea what aspects you may want to hear more about!), and I'll work on answering them.

Tuesday, April 13, 2010

What is Subversion's WC-NG?

When I started working on the Subversion project (again) back in August 2008, I wanted to do something that was interesting, technically challenging, and important to the project. For many years, the developers had been complaining about the "working copy" (WC) library. This library was one of the first that we worked on back in June 2000, and had grown (ahem) "organically" over the following eight years. By "organically", I mean it had become a rat's nest of brittle code. Hard to work with, not fun to modify, and difficult as hell to build new features reliably. Over such a lengthy time frame, most actively-developed code tends to end up like this, unless you work real hard against it.

In 2000, we didn't even know all the requirements for the library. Nobody had ever done versioning for directories. Just files. In fact, I think that Subversion may (still) be the only version control system (VCS) out there which treats a directory as a first-class object. It is a very difficult problem, along with being able to work with only pieces of your repository (which leads to "mixed-revision" working copies; something that distributed VCS systems like Git and Mercurial don't have to deal with, much to their enjoyment!).

So we started the library and figured things out as we went. Then it was too slow, so we added stuff to make it work faster. Then we added more features. And revamped some stuff to make it go faster again. More features. And even more.

By this time, the library had become brittle. Adding a feature usually broke something else. There were too many considerations, and internal layering/hiding was not present. Everything could, and did, manipulate a public structure (called svn_wc_entry_t). If you didn't do it right, then something broke. And there were some very deep and hard-to-understand relationships in the handling of data in that structure. Forward progress was being stifled.

The developers had been talking about fixing the WC library for years, but most of them had other priorities. I had no such baggage, and the WC problem had everything I was looking for: interesting problems to fix, challenging to accomplish, and very important to Subversion's future. Some people had already written up some thoughts on a next generation of the WC library, calling it "WC-NG". After I started digging in, and some other developers joined, the project took on the WC-NG title in earnest and in day-to-day use.

WC-NG is Subversion's name for an entirely new working copy library. We have a new design, and we're incrementally rebuilding the library towards this new design. Due to stringent backwards-compatibility requirements, and the complexity of the system, we cannot simply "rewrite from scratch". This effort is the current focus of our upcoming 1.7 release, and it will provide a Subversion client that will be vastly faster, much more robust and capable, and provide a solid foundation for new features.

In future posts, I'll provide some more detail about WC-NG's design (and how the original WC was broken). I also want to talk about a couple of these new features that will be implemented upon this new foundation. Stay tuned!

Monday, March 29, 2010

Version Control

Last week, I spent some time in NYC with friends of mine talking about Subversion. The conversation focused on the long-term vision and roadmap. I'll post more on that soon, along with some specific ideas on how I'd like to build some features that other version control (VC) systems have demonstrated as useful and demanded by users.

For this post, I wanted to share a discussion document written by Martin Fowler. This is one of the best, level-headed comparisons between Subversion, Git, and Mercurial. I believe these are the Big Three VC systems that the industry will be using over the next decade, and to see a useful discussion, absent of rhetoric, is very encouraging.

Monday, March 15, 2010

Are Insurers the real Bad Guys?

"In 2009, the largest 14 insurers had profits of roughly $9 billion; that approached 0.4 percent of total health spending of $2.472 trillion. This hardly explains high health costs." -- Robert J. Samuelson, Washington Post.

"the five largest health-insurance companies racked up combined profits of $12.2 billion, up 56 percent over 2008" -- Noam N. Levey, The Seattle Times.

Which is right? But then again... who cares? Either value is small compared to the total costs (given by Samuelson, and Wikipedia). So why ostracize and penalize insurers? They're just the middlemen. What am I missing?

My belief is that a lack of cost/value/benefit feedback is the basic problem. There is no pricing-pushback from the consumer, so the prices escalate without control. (Have you ever seen the prices for a toothbrush, toothpaste, or slippers during a hospital stay?) We've seen similar problems with higher education, where financial aid covered "any" gap between ability to pay and the amount requested, so the schools simply requested more.

So why all the rhetoric against insurers? Removing all of their profit will reduce overall health care purchasing by less than one percent. It appears they are a scapegoat, to be blamed in lieu of proper analysis and approaches at reducing health care costs.
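
For anybody who wants to check the arithmetic, here it is as a few lines of Python. This assumes both quoted profit figures are for 2009 and that Samuelson's $2.472 trillion is the right denominator for both; the function name is just mine.

```python
# Samuelson's figure for total health spending, in dollars.
TOTAL_HEALTH_SPENDING = 2.472e12


def share_of_spending(profit):
    """Return a profit figure as a percentage of total health spending."""
    return 100.0 * profit / TOTAL_HEALTH_SPENDING


samuelson = share_of_spending(9e9)   # largest 14 insurers (Samuelson)
levey = share_of_spending(12.2e9)    # five largest insurers (Levey)

print(f"Samuelson: {samuelson:.2f}%")
print(f"Levey:     {levey:.2f}%")
```

Both come out under half a percent, so whichever source you prefer, the "less than one percent" statement holds.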

Monday, January 25, 2010

Columbia Code Camp, and my Subversion talk

I'm going to be at the Columbia Code Camp this coming Saturday (January 30th). If you're in the Columbia, SC area, then come find me. Join in the Code Camp, or we can meet in the evening to share some beers.

I'll be talking about the rewrite of the working copy library in Subversion (see the "Rebuilding Subversion's Working Copy Library" on the sessions page).

Wednesday, January 06, 2010

100k apps ... so what?

Why do people keep saying that the iPhone App Store has an "advantage" over others because it has 100,000 applications?

Would it still have an advantage at 90,000? 50,000? How many does it really take?

A month or so ago, Apple wiped out all 1000 applications from a single vendor. Did anybody miss those applications? Probably not. So how many other thousands could be wiped out without taking a hit to its success?

Google stopped putting the "pages indexed" on its front page many years ago because it realized a key principle: the value is in the results, not the quantity.

The Android Market is definitely behind -- it is missing some nice applications. But not many! All the apps that I used to have on my iPhone are now available on my Android phone. Thus, Apple's App Store has zero "advantage" for me. How many other people are like me? Or conversely, how many people want and use all of those 100k applications?

I think the conversation should be rephrased into "do the apps exist, that a typical consumer wants?" rather than focusing on a mere count. That is the success of any app store.