Monday, April 26, 2010

WC-NG Changes

In my last post, I described how libsvn_wc had become brittle and hard to manage. The WC-NG process is working to solve that problem, and though we're not yet done, I believe we're on the right path.

The basic question that needed answering is, "where did we go wrong?" While version control is a hard problem (especially if you version directories!), it does not inherently lead to a brittle library. Somewhere, we had gone wrong in the design, the data model, or simply the implementation.

Before I started working on the problem (almost) two years ago, one of the Subversion developers (Erik Hülsmann, I believe) laid out his thoughts for a next-generation library. In those notes, he postulated what I now call the Three-Tree Model:
  • the tree you checked out
  • the above tree, plus structural changes (add, delete, move, copy)
  • the above tree, plus content changes (file edits, property edits)
Any working copy operation generally affects one of these trees. svn update and svn switch work on the first tree. svn add and svn merge modify the second tree. Your editor and svn propset affect the last tree.
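
To make the layering concrete, here is a rough sketch in Python (Subversion itself is written in C, and none of this is its actual code). The ThreeTrees class, its field names, and the "add"/"delete" tuples are all hypothetical; the only thing taken from the model is that each tree is the one above it plus one kind of change.

```python
from dataclasses import dataclass, field

@dataclass
class ThreeTrees:
    # Tree 1: what was checked out (path -> content). Hypothetical layout.
    base: dict = field(default_factory=dict)
    # Changes that turn tree 1 into tree 2: path -> ("add" | "delete", content).
    structural: dict = field(default_factory=dict)
    # Changes that turn tree 2 into tree 3: path -> locally edited content.
    edits: dict = field(default_factory=dict)

    def restructured(self):
        """Tree 2: the checked-out tree plus structural changes."""
        tree = dict(self.base)
        for path, (op, content) in self.structural.items():
            if op == "delete":
                tree.pop(path, None)
            else:
                tree[path] = content
        return tree

    def working(self):
        """Tree 3: tree 2 plus content changes."""
        tree = self.restructured()
        for path, content in self.edits.items():
            if path in tree:  # an edit only applies to a node that still exists
                tree[path] = content
        return tree

wc = ThreeTrees(base={"a.txt": "one", "b.txt": "two"})
wc.structural["b.txt"] = ("delete", None)   # roughly what "svn delete" would touch
wc.structural["c.txt"] = ("add", "")        # roughly what "svn add" would touch
wc.edits["a.txt"] = "one, edited"           # what your editor would touch
```

Note that neither the structural changes nor the edits ever modify the base dictionary. Each kind of operation has exactly one place to write to, which is the whole point of keeping the trees separate.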

This was the key insight. In our "wc-1" implementation, the svn_wc_entry_t structure blended all three trees together. A change to that structure could be operating on any of the three trees, depending on its flags. Its checksum field could correspond to a checked-out file, or a locally-copied file. To determine which, you had to look at the schedule field and the copied field. And hell will rain upon you, should you mess up the flags or forget to check one.
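
A simplified Python sketch of that failure mode follows. The Entry class only borrows the checksum, schedule, and copied field names from the description above; the string values and the decision logic are assumptions for illustration, not the real rules in libsvn_wc.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    # One record standing in for all three trees (hypothetical layout).
    checksum: Optional[str] = None
    schedule: str = "normal"   # assumed values: "normal", "add", "delete", "replace"
    copied: bool = False

def checksum_refers_to(entry):
    """Work out what the checksum describes by inspecting the flags."""
    if entry.checksum is None:
        return "nothing"
    if entry.copied:
        return "locally-copied file"
    if entry.schedule == "normal":
        return "checked-out file"
    return "undetermined"

def careless_checksum_refers_to(entry):
    """A caller that forgot to check the flags: wrong for copied nodes."""
    return "nothing" if entry.checksum is None else "checked-out file"

checked_out = Entry(checksum="abc123")
local_copy = Entry(checksum="abc123", schedule="add", copied=True)
```

Both entries carry the same checksum value, and only the flags distinguish them. Any caller that skips the flag check gets the wrong tree without any error being raised.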

For WC-NG, we have built a new data storage system with an API designed around this three-tree model. This has isolated our storage mechanism behind a solid encapsulation (wc-1 code had too much knowledge of the old "entries" storage model). Operations are now understandable: "copy nodes in the restructuring tree" instead of "set entry->schedule".

This new storage subsystem could produce an entire post on its own. It is radically different from the prior model: we now use a single .svn subdir at the root of the working copy, and SQLite-based storage. This is causing huge challenges in upgrades/migrations to the new format, and in backwards compatibility for our classic APIs.

Another radical change was our move to using absolute paths to refer to items. The prior model used an "access baton" which implied a relative directory, along with a path relative to that baton. These relative batons and paths caused enormous problems because they led to the question, "relative to what?" In most cases, the answer was "the operating system's current working directory," which is a terrible basis for a deterministic API. Switching to absolute paths rendered the access batons obsolete. Since they were a core part of the public API for libsvn_wc (not to mention the widespread internal changes!), this has had a huge impact on the API and its users (such as Subversion's libsvn_client library and its command-line tools).
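
The "relative to what?" problem is easy to demonstrate outside of Subversion. The Python sketch below uses only the standard library; require_abspath and resolve_in_cwd are hypothetical helpers, and "trunk/README" is a made-up path. It shows the same relative path resolving to two different places depending on the current working directory, and an entry point that simply refuses anything that is not absolute.

```python
import os
import tempfile

def require_abspath(path):
    """Reject relative paths at the API boundary (hypothetical helper)."""
    if not os.path.isabs(path):
        raise ValueError(f"expected an absolute path, got {path!r}")
    return os.path.normpath(path)

def resolve_in_cwd(relpath):
    """The old-style answer: resolve against whatever the cwd happens to be."""
    return os.path.abspath(relpath)

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    try:
        os.chdir(dir_a)
        seen_from_a = resolve_in_cwd("trunk/README")
        os.chdir(dir_b)
        seen_from_b = resolve_in_cwd("trunk/README")
    finally:
        # Always restore the cwd so the temporary directories can be removed.
        os.chdir(original_cwd)
```

Here seen_from_a and seen_from_b name different files even though the caller passed the same string both times. Once every function takes an absolute path, that hidden input goes away.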

These two items (data model and absolute paths) are the core changes in WC-NG. The ripple effect from just these two items is immense. We will need to rewrite almost every one of the 40,000 lines of code in the library. And given our incremental approach, many of those will be changed multiple times. We're a solid year into this (although we saw downtime last fall due to our move to the Apache Software Foundation), and we probably have another several months of basic grunt work ahead of us. Stabilization and testing will put our 1.7 release into late summer or possibly this fall.

I could really go on and on about this stuff, but I hope this post provides some basic background on the WC-NG efforts. Please feel free to post any questions (I have no idea what aspects you may want to hear more about!), and I'll work on answering them.

Tuesday, April 13, 2010

What is Subversion's WC-NG?

When I started working on the Subversion project (again) back in August 2008, I wanted to do something that was interesting, technically challenging, and important to the project. For many years, the developers had been complaining about the "working copy" (WC) library. This library was one of the first that we worked on back in June 2000, and had grown (ahem) "organically" over the following eight years. By "organically", I mean it had become a rat's nest of brittle code. Hard to work with, not fun to modify, and difficult as hell to build new features reliably. Over such a lengthy time frame, most actively-developed code tends to end up like this, unless you work real hard against it.

In 2000, we didn't even know all the requirements for the library. Nobody had ever done versioning for directories. Just files. In fact, I think that Subversion may (still) be the only version control system (VCS) out there which treats a directory as a first-class object. It is a very difficult problem, along with being able to work with only pieces of your repository (which leads to "mixed-revision" working copies; something that distributed VCS systems like Git and Mercurial don't have to deal with, much to their enjoyment!).

So we started the library and figured things out as we went. Then it was too slow, so we added stuff to make it work faster. Then we added more features. And revamped some stuff to make it go faster again. More features. And even more.

By this time, the library had become brittle. Adding a feature usually broke something else. There were too many considerations, and internal layering/hiding was not present. Everything could, and did, manipulate a public structure (called svn_wc_entry_t). If you didn't do it right, then something broke. And there were some very deep and hard-to-understand relationships in the handling of data in that structure. Forward progress was being stifled.

The developers had been talking about fixing the WC library for years, but most of them had other priorities. I had no such baggage, and the WC problem had everything I was looking for: interesting problems to fix, challenging to accomplish, and very important to Subversion's future. Some people had already written up some thoughts on a next generation of the WC library, calling it "WC-NG". After I started digging in, and some other developers joined, the project took on the WC-NG title in earnest and in day-to-day use.

WC-NG is Subversion's name for an entirely new working copy library. We have a new design, and we're incrementally rebuilding the library towards this new design. Due to stringent backwards-compatibility requirements, and the complexity of the system, we cannot simply "rewrite from scratch". This effort is the current focus of our upcoming 1.7 release, and it will provide a Subversion client that will be vastly faster, much more robust and capable, and provide a solid foundation for new features.

In future posts, I'll provide some more detail about WC-NG's design (and how the original WC was broken). I also want to talk about a couple of these new features that will be implemented upon this new foundation. Stay tuned!