Now with 17% more caffeine
Distributed Distribution Development, and Why Git and / or Funtoo is Not It
Gentoo is slowly shuffling towards switching from CVS to Git. This is a good thing, because CVS stinks. Using Git will reduce the amount of time developers need to waste to get something committed, make it easier to apply patches from third parties and make tree-wide changes merely a lot of work rather than practically impossible. What it will not do is make Gentoo in any way more ‘distributed’, ‘decentralised’ or ‘democratic’.
Some of the Git work has already been done, in a reduced manner (no history and no mirroring), by Daniel Robbins’ Funtoo, which is purported to be more distributed than Gentoo. The problem is, there’s nothing there to back up the distributed claim.
Distributed development, in the sense for which Git was designed (and ignoring the intervening BitKeeper stage), meant moving away from having a single central repository off of which everyone worked to having everyone work off their own, publishable repositories and providing easy ways of merging changes from one to another. ‘Good’ changes would tend to find their way from the authors up the food chain to the main repository whence official releases are made. Users requiring things that hadn’t made their way to the top would maintain their own repository, and merge in changes from elsewhere that they needed.
For a conventional codebase, this model works. But it’s not particularly nice, and it’s driven by necessity. You’ll note the big red dots in the diagrams. These represent places where people (assisted to some highly variable degree by Git) have to do merges. I chose big red dots rather than soft fluffy clouds because merges can be a lot of work (and because drawing clouds takes effort).
If you’ve got a conventional codebase, you have to do merges to make use of things from multiple sources — the compiler takes a single codebase and produces a program from it. You can do the same thing with a distribution. Funtoo, for example has had the Sunrise repository merged in to the main repository. Such a change would likely not be possible with Gentoo’s current CVS architecture.
It’s not entirely clear whether Funtoo intends to have users who want to use other overlays merge those overlays into their own tree. Doing so would be more Gitish.
But why bother? There’s no need to have a single codebase — there’s no compiler that has to take every input at once and turn it into a single monolithic product. Those big red dots are unnecessary.
A lot of fashionable programs are moving away from the big monolithic binary model and towards a plugin-assisted architecture. If you want Firefox to do a few things it doesn’t, you don’t hunt around for people who have already written them and then try to merge their source trees together. You install plugins. Only for more severe changes do you have to dive into the source, and the severity of change requiring a trip to the source is gradually increasing.
There’s a reason for this — whilst the merge model is a lot better than a single authoritative codebase and a bunch of patches, it’s a lot more work than providing limited composable extensibility at a higher level.
What, then, would a plugin-based model look like for a Gentoo-like distribution?
Presumably, one would have a centralised ‘main’ codebase. One could then add additional small extras to that main codebase to obtain new functionality (packages, in this case); these extras would rely upon parts of the main codebase and wouldn’t be able to operate on their own. Sound familiar? Yup, overlays are plugins.
This whole “merging overlays into the main tree” thing is starting to look like a step in the wrong direction. What would be some steps in a better direction?
One thing that comes instantly to mind is improving overlay handling. Portage’s overlay handling currently (at least in stable) looks like this:
Portage takes the main Gentoo repository, and then merges it with each overlay in turn, creating one ‘final’ overlay that ends up being used. I’ve used an orange dot here rather than a red one because it’s a different kind of merge. Rather than doing a source-level merge, the orange dot merge more or less (sort of) works like this:
- If there’s a package with the same name and version in the origin and the overlay we’re merging in, take the overlay version.
- If there’s an eclass with the same name and version in the origin and the overlay we’re merging in, sometimes take the overlay version.
- Do some horrid hackery to merge together any colliding profile things in an uncontrolled manner that doesn’t work for more than one merge.
- Pass everything else through.
Now, to be fair, the orange dot merge usually works. Most overlays don’t try to override eclasses, don’t have eclasses that conflict with each other and don’t mess with profiles. For colliding versions, you end up being stuck with a single selected version, which isn’t always so good.
Unfortunately, some overlays do try to override eclasses and profiles, and the result isn’t pretty. You’re ok so long as you only use a single overlay that does this, and so long as any eclass changes aren’t incompatible, but anything beyond that and weird stuff happens.
A less dangerous model would be to make the package manager support multiple repositories. Presumably most overlays wouldn’t want to have to reimplement all the profile and eclass things in the Gentoo repository, so the model would look like this:
Here, repositories, rather than the user, have control over which implementation of eclasses and so on gets used. Paludis uses this model for Gentoo overlays unless told not to.
Unstable Portage, meanwhile, is starting to support controlled masters for eclass merging, but not version handling, which will eventually give:
A multiple repository model is clearly safer than the Portage model, and does away with the manual merges required by the Funtoo model. This gives us:
|Model||Multiple Repositories?||Manual Merges?||Unsafe Automatic Merges?|
I consider the multiple repository model to be better for users even ignoring the merge or conflict issues. Here’s why:
- Users can make selective, rather than all or nothing, use of a repository. It becomes possible to mask the
foo-1.2provided by the
dodgyoverlay, and use the one in the main tree or a different overlay.
- Similarly, users can choose not to use anything from a particular overlay except things they explicitly request.
- It paves the way for handling repositories of different formats.
There aren’t any downsides, either — so long as repositories have user-orderable importance, there’s no loss of functionality.
Finally, I’d like to debunk the myth that the Git model is somehow ‘democratic’. There’s nothing in the least bit democratic about everyone having their own repository. At best, it could be said to be a way of allowing everyone to have their own dictatorship that anyone else can be free to visit — all very well, but when tin pot dictators fall back on old habits it does little to encourage collaboration. A democratic distribution would more likely make use of a special repository which lets people vote on unwritten packages and version bumps — clearly a recipe for disaster, since most people think “I haven’t noticed any bugs” means “stable it instantly”…
The only thing switching Gentoo to Git will solve is the pain of having to use CVS. This alone is enough to make the move worthwhile, but it will do little to nothing to fix Gentoo’s monolithic design and inherently centralised model. Nor does Funtoo’s merge approach solve the problem — on the contrary, it replaces a model where the package manager automatically does unnecessary merging (and sometimes gets things wrong) with a model where people do unnecessary merging (which is a lot of work, and they will still sometimes get things wrong). The future is (or at least should be) in a multi-repository model with good support from the package manager that removes the costs of decentralisation.