The basic question that needed answering was, "where did we go wrong?" While version control is a hard problem (especially if you version directories!), it does not inherently lead to a brittle library. Somewhere, we had gone wrong in the design, the data model, or simply the implementation.
Before I started working on the problem (almost) two years ago, one of the Subversion developers (Erik Hülsmann, I believe) laid out his thoughts for a next-generation library. In those notes, he postulated what I now call the Three-Tree Model:
- the tree you checked out
- the above tree, plus structural changes (add, delete, move, copy)
- the above tree, plus content changes (file edits, property edits)
This was the key insight. In our "wc-1" implementation, the svn_wc_entry_t structure blended all three trees together. A change to that structure could be operating on any of the three trees, depending on its flags. Its checksum field could correspond to a checked-out file or to a locally-copied file. To determine which, you had to look at the schedule field and the copied field. And hell will rain upon you should you mess up the flags or forget to check one.
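To make the ambiguity concrete, here is a minimal sketch in Python. The `BlendedEntry` class and the `checksum_describes` rule are hypothetical stand-ins invented for illustration; they are not the real svn_wc_entry_t or the actual wc-1 logic, only a simplified picture of one field whose meaning depends on two others.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlendedEntry:
    """Hypothetical stand-in for a wc-1 style entry (not the real svn_wc_entry_t)."""
    name: str
    schedule: str = "normal"        # assumed values: "normal", "add", "delete"
    copied: bool = False
    checksum: Optional[str] = None


def checksum_describes(entry: BlendedEntry) -> str:
    """Say which file the checksum is about, under a simplified, assumed rule."""
    if entry.checksum is None:
        return "nothing"
    # The same field means different things depending on two other flags.
    if entry.schedule == "add" and entry.copied:
        return "locally-copied file"
    return "checked-out file"


checked_out = BlendedEntry("foo.c", checksum="abc123")
local_copy = BlendedEntry("bar.c", schedule="add", copied=True, checksum="abc123")
print(checksum_describes(checked_out))
print(checksum_describes(local_copy))
```

Both entries carry the same checksum value, yet it describes different things. Any caller that reads `checksum` without also checking `schedule` and `copied` silently gets the wrong answer, which is the kind of brittleness described above.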
For WC-NG, we have built a new data storage system with an API designed around this three-tree model. This has isolated our storage mechanism behind a solid encapsulation (wc-1 code had too much knowledge of the old "entries" storage model). Operations are now understandable: "copy nodes in the restructuring tree" instead of "set entry->schedule".
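As a rough illustration of what an API shaped around the three trees might look like, here is a small Python sketch. Everything in it (the `ThreeTreeStore` class, its method names, the flat path-to-content dictionaries) is an assumption made for this example and not the WC-NG storage API. It ignores directories and properties entirely.

```python
from typing import Dict, Optional

_DELETED = object()  # marker for a node removed in the restructuring tree


class ThreeTreeStore:
    """Hypothetical layered store: base tree, restructuring tree, working tree."""

    def __init__(self, base: Dict[str, str]) -> None:
        self._base = dict(base)                  # tree 1: what was checked out
        self._structure: Dict[str, object] = {}  # tree 2 overlay: add/delete/copy
        self._edits: Dict[str, str] = {}         # tree 3 overlay: content edits

    def read_base(self, path: str) -> Optional[str]:
        return self._base.get(path)

    def read_structural(self, path: str) -> Optional[str]:
        if path in self._structure:
            node = self._structure[path]
            return None if node is _DELETED else node  # type: ignore[return-value]
        return self._base.get(path)

    def read_working(self, path: str) -> Optional[str]:
        structural = self.read_structural(path)
        if structural is None:
            return None
        return self._edits.get(path, structural)

    def copy(self, src: str, dst: str) -> None:
        """Copy a node in the restructuring tree (simplified: copies content
        as seen in that tree, ignoring any pending edits on src)."""
        content = self.read_structural(src)
        if content is None:
            raise KeyError(src)
        self._structure[dst] = content

    def delete(self, path: str) -> None:
        """Delete a node in the restructuring tree and drop its edits."""
        if self.read_structural(path) is None:
            raise KeyError(path)
        self._structure[path] = _DELETED
        self._edits.pop(path, None)

    def edit(self, path: str, content: str) -> None:
        """Record a content change; the base and restructuring trees are untouched."""
        if self.read_structural(path) is None:
            raise KeyError(path)
        self._edits[path] = content


store = ThreeTreeStore({"a.txt": "one"})
store.copy("a.txt", "b.txt")
store.edit("b.txt", "two")
store.delete("a.txt")
```

Each operation names the tree it acts on, so a caller never has to infer the target from a combination of flags. After the calls above, the base tree still holds only `a.txt`, the restructuring tree holds only `b.txt` with the copied content, and the working tree holds `b.txt` with the edited content.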
This new storage subsystem could fill an entire post on its own. It is radically different from the prior model: it uses a single .svn subdir at the root of the working copy and SQLite-based storage. This is causing huge challenges in upgrades/migrations to the new format, and in backwards compatibility for our classic APIs.
Another radical change was our move to using absolute paths to refer to items. The prior model used an "access baton" which implied a relative directory, along with a path relative to that baton. These relative batons and paths caused enormous problems because they led to the question, "relative to what?" In most cases, the answer was "the operating system's current working directory," which is a terrible basis for a deterministic API. Switching to absolute paths rendered the access batons obsolete. Since they were a core part of the public API for libsvn_wc (not to mention the widespread internal changes!), this has had a huge impact on the API and its users (such as Subversion's libsvn_client library and its command-line tools).
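The "relative to what?" problem is easy to reproduce outside of Subversion. The Python sketch below uses only the standard library; `require_absolute` and `demo` are hypothetical helpers written for this example and have nothing to do with the libsvn_wc API. It shows the same relative path resolving to two different files as the process's working directory changes, and a guard that rejects relative paths up front.

```python
import os
import tempfile
from typing import Tuple


def require_absolute(path: str) -> str:
    """Hypothetical guard: accept only absolute paths, return them normalized."""
    if not os.path.isabs(path):
        raise ValueError(f"expected an absolute path, got {path!r}")
    return os.path.normpath(path)


def demo() -> Tuple[str, str]:
    """Resolve the same relative path from two different working directories."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as dir1, tempfile.TemporaryDirectory() as dir2:
        try:
            os.chdir(dir1)
            first = os.path.abspath("foo.c")   # depends on the current directory
            os.chdir(dir2)
            second = os.path.abspath("foo.c")  # same input, different result
        finally:
            os.chdir(original)                 # restore before the dirs are removed
    return first, second


first, second = demo()
print(first != second)
```

The relative path means whatever the process-wide working directory happens to be at the moment of the call. An API that insists on absolute paths, as the guard does, removes that hidden input.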
These two items (data model and absolute paths) are the core changes in WC-NG. The ripple effect from just these two items is immense. We will need to rewrite almost every one of the 40,000 lines of code in the library. And given our incremental approach, many of those will be changed multiple times. We're a solid year into this (although we saw downtime last fall due to our move to the Apache Software Foundation), and we probably have another several months of basic grunt work ahead of us. Stabilization and testing will put our 1.7 release into late summer or possibly this fall.
I could really go on and on about this stuff, but I hope this post provides some basic background on the WC-NG efforts. Please feel free to post any questions (I have no idea what aspects you may want to hear more about!), and I'll work on answering them.