Ciaran McCreesh’s Blag

Now with 17% more caffeine

Archive for March, 2009

Feeding ERB Useful Variables: A Horrible Hack Involving Bindings

Posted by Ciaran McCreesh on March 31, 2009

I’ve been playing around with Ruby to create Summer, a simple web packages thing for Exherbo. Originally I was hand-creating HTML output simply because it’s easy, but that started getting very very messy. Mike convinced me to give ERB a shot.

The problem with template engines with inline code is that they look suspiciously like the braindead PHP model. Content and logic end up getting munged together in a horrid, unmaintainable mess, and the only people who’re prepared to work with it are the kind of people who think PHP isn’t more horrible than an aborted Jacqui Smith clone foetus boiled with rotten lutefisk and served over a bed of raw sewage with a garnish of Dan Brown and Patricia Cornwell novels. So does ERB let us combine easy page layouts with proper separation of code?

Well, sort of. ERB lets you pass it a binding to use for evaluating any code it encounters. On the surface of it, this lets you select between the top level binding, which can only see global symbols, or the caller’s binding, which sees everything in scope at the time. Not ideal; what we want is to provide only a carefully controlled set of symbols.

There are three ways of getting a binding in Ruby: a global TOPLEVEL_BINDING constant, which we clearly don’t want, the Kernel#binding method which returns a binding for the point of call, and the Proc#binding method which returns a binding for the context of a given Proc.
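A minimal sketch of those three sources may make the difference concrete. The method and variable names are invented for the example, and Binding#local_variable_get and Binding#local_variable_defined? assume Ruby 2.1 or later:

```ruby
# Hypothetical helper: Kernel#binding captures the scope at the point of call,
# so the Binding it returns can see this method's locals.
def binding_from_call
    inside_method = "visible via Kernel#binding"
    binding
end

# Hypothetical helper: Proc#binding gives the scope the Proc was created in.
def binding_from_proc
    captured = "visible via Proc#binding"
    lambda { }.binding
end

puts binding_from_call.local_variable_get(:inside_method)
puts binding_from_proc.local_variable_get(:captured)

# TOPLEVEL_BINDING is the binding of the main script scope; it knows nothing
# about locals that live inside either method.
puts TOPLEVEL_BINDING.local_variable_defined?(:inside_method)
```

None of the three offers a way to say "only these names", which is the gap the rest of this post works around.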

At first glance, the third of these looks most promising. What if we define the names we want to pass through in a lambda, and give it that?

require 'erb'

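# Ruby 1.9 and later need an actual Binding here (1.8 also accepted the
# Proc itself), so ask the lambda for its binding explicitly.
puts ERB.new("foo <%= bar %>").result(lambda do
    bar = "bar"
end.binding)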

Mmm, no, that won’t work:

(erb):1: undefined local variable or method `bar' for main:Object (NameError)

Because the lambda’s symbols aren’t visible to the outside world. What we want is a lambda that has those symbols already defined in its binding:

require 'erb'

puts ERB.new("foo <%= bar %>").result(lambda do
    bar = "bar"
    lambda { }.binding   # explicit Binding; only Ruby 1.8 accepted a bare Proc
end.call)

Which is all well and good, but it lets symbols leak through from the outside world, which we’d rather avoid. If we don’t explicitly say “make foo available to ERB”, we don’t want to use the foo that our calling class happens to have defined. We also can’t pass functions through in this way, except by abusing lambdas — and we don’t want to make the ERB code use make_pretty.call(item) rather than make_pretty(item). Back to the drawing board.
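Both complaints can be seen in a small sketch. The method names and the secret variable are made up for illustration, and .binding is written out explicitly so that it runs on current Ruby:

```ruby
require 'erb'

# Hypothetical caller with a local we never meant to expose to the template.
def render_with_leak
    secret = "leaked"
    template_binding = lambda do
        bar = "bar"
        lambda { }.binding
    end.call
    # The inner lambda is a closure, so the template sees bar *and* secret.
    ERB.new("<%= bar %> and <%= secret %>").result(template_binding)
end

# Hypothetical caller passing a "function" through: it has to be a lambda,
# so the template is stuck with make_pretty.call(...).
def render_with_lambda_helper
    make_pretty = lambda { |item| "*#{item}*" }
    template_binding = lambda { lambda { }.binding }.call
    ERB.new("<%= make_pretty.call('item') %>").result(template_binding)
end

puts render_with_leak
puts render_with_lambda_helper
```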

There is something that lets us define a (mostly) closed set of names, including functions: a Module. It sounds like we want to pass through a binding saying “execute in the context of this Module” somehow, but there’s no Module#binding_for_stuff_in_us. Looks like we’re screwed.

Except we’re not, because we can make one:

require 'erb'

module ThingsForERB
    def self.bar
        "bar"
    end
end

puts ERB.new("foo <%= bar %>").result(ThingsForERB.instance_eval { binding })

Now all that remains is to provide a way to dynamically construct a Module on the fly with methods that map onto (possibly differently-named) methods in the calling context, which is relatively straightforward. Then we can do this in our templates:

<% if summary %>
    <p><%=h summary %>.</p>
<% end %>

<h2>Metadata</h2>

<table class="metadata">
    <% metadata_keys.each do | key | %>
        <tr>
            <th><%=h key.human_name %></th>
            <td><%=key_value key %></td>
        </tr>
    <% end %>
</table>

<h2>Packages</h2>

<table class="packages">
    <% package_names.each do | package_name | %>
        <tr>
            <th><a href="<%=h package_href(package_name) %>"><%=h package_name %></a></th>
            <td><%=h package_summary(package_name) %></td>
        </tr>
    <% end %>
</table>

Which gives us a good clean layout that’s easy to maintain, but lets us keep all the non-trivial code in the controlling class.
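The "construct a Module on the fly" step mentioned earlier might look something like the following. This is only a sketch under assumptions, not the code Summer actually uses: binding_for_erb, PackagePage and its methods are all hypothetical names.

```ruby
require 'erb'

# Hypothetical helper: build an anonymous Module whose singleton methods
# forward to (possibly differently-named) methods on target, then return a
# binding whose self is that Module.
def binding_for_erb(target, mapping)
    mod = Module.new
    mapping.each do |template_name, target_name|
        mod.define_singleton_method(template_name) do |*args, &block|
            target.send(target_name, *args, &block)
        end
    end
    mod.instance_eval { binding }
end

# Hypothetical controlling class that keeps the non-trivial code to itself.
class PackagePage
    def initialize(summary)
        @summary = summary
    end

    def render
        names = { :h => :escape_html, :summary => :package_summary }
        ERB.new("<p><%=h summary %>.</p>").result(binding_for_erb(self, names))
    end

    def escape_html(text)
        ERB::Util.html_escape(text)
    end

    def package_summary
        @summary
    end
end

puts PackagePage.new("Ruby & ERB").render
```

The set of names is still only mostly closed: the helper's own locals (target, mapping and mod) stay visible to the template through the block's closure, along with anything defined on Module or Object.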

Posted in summer

EAPI 3: A Preview

Posted by Ciaran McCreesh on March 27, 2009

Gentoo is shuffling its way towards EAPI 3. The details haven’t been worked out yet, but there’s a provisional list of things likely to show up that’s mostly been agreed upon. This post will provide a summary; when EAPI 3’s finalised, I’ll do a series of posts with full descriptions as I did for EAPI 2. PMS will remain the definitive definition; I’ve put together a draft branch (I’ll be rebasing this, so don’t base work off it if you don’t know how to deal with that).

Everything on this list is subject to removal, arbitrary change or nuking from orbit. We’re looking for a finalisation reasonably soon, so if it turns out Portage is unable to support any of these, they’ll be dropped rather than holding the EAPI up.

EAPI 3 will be defined in terms of differences to EAPI 2. These differences may include:

  • pkg_pretend support. This will let ebuilds signal a lot more errors at pretend-time, rather than midway through an install of a hundred packages that you’ve left running overnight. This feature is already in exheres-0.
  • Slot operator dependencies. This will let ebuilds specify what to do when they depend upon a package that has multiple slots available — using :* deps will mean “I can use any slot, and it can change at runtime”, whilst := means “I need the best slot that was there at compile time”. This feature is already in exheres-0 and kdebuild-1.
  • Use dependency defaults. With EAPI 2 use dependencies, it’s illegal to reference a flag in another package unless that package has that flag in IUSE. With use dependency defaults, you’ll be able to use foo/bar[flag(+)] and foo/bar[flag(-)] to mean “pretend it’s enabled (disabled) if it’s not present”. This feature is already in exheres-0.
  • DEFINED_PHASES and PROPERTIES will become mandatory (they’re currently optional). This won’t have any effect for users (although without the former, pkg_pretend would be slooooow).
  • There’s going to be a default src_install of some kind. Details are yet to be entirely worked out.
  • Ebuilds will be able to tell the package manager that it’s ok or not ok to compress certain documentation things using the new docompress function.
  • dodoc will have a -r, for recursively installing directories.
  • doins will support symlinks properly.
  • || ( use? ( ... ) ) will be banned.
  • dohard and dosed will be banned. (Maybe. This one’s still under discussion.)
  • New doexample and doinclude functions. (Again, maybe. Quite a few people think these’re icky and unnecessary.)
  • unpack will support a few new extensions, probably xz, tar.xz and maybe xpi.
  • econf will pass --disable-dependency-tracking --enable-fast-install. This is already done for exheres-0.
  • pkg_info will be usable on uninstalled packages too. This is already in exheres-0 and kdebuild-1.
  • USE and friends will no longer contain arbitrary extra values. (Possibly. Not sure Portage will have this one done in time.)
  • AA and KV will be removed.
  • New REPLACED_BY_VERSION and REPLACING_VERSIONS variables, to let packages work out whether they’re upgrading / downgrading / reinstalling. exheres-0 has a more sophisticated version.
  • The automatic S to WORKDIR fallback will no longer happen under certain conditions. exheres-0 already has this.
  • unpack will consider unrecognised suffixes an error unless --if-compressed is specified, and the default src_unpack will pass this. exheres-0 already has this. (Maybe. Not everyone’s seen the light on this one yet.)
  • The automagic RDEPEND=DEPEND ick will be gone.
  • Utilities will die on failure unless prefixed by nonfatal. exheres-0 already has this.

Unless, of course, something completely different happens.
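For the use dependency defaults item in the list above, the rule can be modelled as a toy. This is only an illustration of the semantics as described there, not how any package manager implements it, and use_dep_met? and its arguments are invented names:

```ruby
# Toy model only: decide whether a dependency on foo/bar[flag], foo/bar[flag(+)]
# or foo/bar[flag(-)] is met. default is "+", "-" or nil (no default given).
def use_dep_met?(flag, default, iuse, enabled)
    # The flag is in the target's IUSE, so its real state is what counts.
    return enabled.include?(flag) if iuse.include?(flag)

    case default
    when "+" then true      # not in IUSE: pretend it is enabled
    when "-" then false     # not in IUSE: pretend it is disabled
    else raise ArgumentError, "#{flag} is not in IUSE and has no default"
    end
end

iuse    = ["ssl", "doc"]
enabled = ["ssl"]

puts use_dep_met?("ssl", nil, iuse, enabled)    # flag exists and is on
puts use_dep_met?("doc", "+", iuse, enabled)    # flag exists, so the default is ignored
puts use_dep_met?("gtk", "+", iuse, enabled)    # missing flag, treated as enabled
puts use_dep_met?("gtk", "-", iuse, enabled)    # missing flag, treated as disabled
```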

Posted in eapi 3

Exherbo Over Twice as Stable as Gentoo: A Totally Objective Study

Posted by Ciaran McCreesh on March 11, 2009

Potential users often ask whether Exherbo is stable. To test this, I decided to reinstall everything on my Gentoo desktop and my Exherbo laptop. The results are as follows:

For Exherbo:

Summary of failures:

* net-misc/neon-0.28.3:0::arbor: failure
* dev-perl/IO-Socket-SSL-1.17:0::arbor: failure
* sys-apps/upstart-0.3.9:0::arbor: failure

Total: 390 packages, 387 successes, 0 skipped, 3 failures, 0 unreached

For Gentoo:

Summary of failures:

* sys-devel/flex-2.5.35:0::gentoo: failure
* sys-apps/coreutils-6.10-r2:0::gentoo: failure
* sys-libs/glibc-2.9_p20081201-r2:2.2::gentoo: failure
* dev-util/dejagnu-1.4.4-r1:0::gentoo: failure
* sys-devel/automake-1.9.6-r2:1.9::gentoo: failure
* dev-python/numeric-24.2-r6:0::gentoo: failure
* app-misc/g15daemon-1.9.5.3-r2:0::gentoo: failure
* app-misc/g15mpd-1.0.0:0::gentoo: skipped (dependency '>=app-misc/g15daemon-1.9.0' unsatisfied)
* dev-libs/boost-1.35.0-r2:0::gentoo: failure
* dev-libs/xerces-c-3.0.1:0::gentoo: failure
* dev-libs/xqilla-2.2.0:0::gentoo: skipped (dependency '>=dev-libs/xerces-c-3.0.1' unsatisfied)
* dev-util/bzr-1.12:0::gentoo: failure
* dev-util/mercurial-1.0.2:0::gentoo: failure
* media-video/mplayer-20090226.28734:0::gentoo: failure
* www-servers/lighttpd-1.4.20:0::gentoo: failure
* x11-wm/compiz-0.7.8-r2:0::gentoo: failure

Total: 833 packages, 817 successes, 2 skipped, 14 failures, 0 unreached

From this highly objective and totally fair study, we can conclude that Exherbo ~arch is 99.2% stable, whereas Gentoo ‘stable’ is merely 98.1% stable. This, alongside Exherbo’s worryingly disappearing lack of documentation, is an unfortunate trend that must be corrected before things start to get out of hand. I look forward to breaking everything at the earliest available opportunity.
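For anyone wishing to audit this rigorous methodology, the percentages come straight from the two "Total:" lines. In the sketch below, reading "twice as stable" as a ratio of non-success rates is an assumption, and the method names are made up:

```ruby
# Figures copied from the two "Total:" lines above.
results = {
    "Exherbo" => { :packages => 390, :successes => 387 },
    "Gentoo"  => { :packages => 833, :successes => 817 },
}

# Successes as a percentage of all packages, to one decimal place.
def stability_percent(counts)
    (100.0 * counts[:successes] / counts[:packages]).round(1)
end

# Fraction of packages that did not succeed (failures plus skipped).
def non_success_rate(counts)
    1.0 - counts[:successes].to_f / counts[:packages]
end

results.each do |distro, counts|
    puts "#{distro}: #{stability_percent(counts)}% stable"
end

ratio = non_success_rate(results["Gentoo"]) / non_success_rate(results["Exherbo"])
puts "Gentoo's non-success rate is #{ratio.round(2)} times Exherbo's"
```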

Posted in exherbo, gentoo

Intuitive Packaging is Doing It Wrong

Posted by Ciaran McCreesh on March 10, 2009

Donnie has taken time out of his busy schedule of managing Gentoo to comment on some possible design issues for EAPI 3. He believes that adding support for exheres-0 style DEFAULT_ parameters to ebuilds would result in a less intuitive packaging system, which he considers bad.

Unfortunately, both the term ‘intuitive’ and the conclusion are nonsense. Ebuilds are not intuitive, intuitiveness would not be a useful property for them to have, and allowing parametrisation of default_ functions would not alter any of this.

The only truly intuitive interface is the nipple.
– Jay Vollmer

Let’s look at what intuitive means:

in⋅tu⋅i⋅tive /ɪnˈtuɪtɪv, -ˈtyu-/ [in-too-i-tiv, -tyoo-]
-adjective

  1. perceiving by intuition, as a person or the mind.
  2. perceived by, resulting from, or involving intuition: intuitive knowledge.
  3. having or possessing intuition: an intuitive person.
  4. capable of being perceived or known by intuition.

Ok, let’s look at intuition:

in⋅tu⋅i⋅tion /ˌɪntuˈɪʃən, -tyu-/ [in-too-ish-uhn, -tyoo-]
–noun

  1. direct perception of truth, fact, etc., independent of any reasoning process; immediate apprehension.
  2. a fact, truth, etc., perceived in this way.
  3. a keen and quick insight.
  4. the quality or ability of having such direct perception or quick insight.

So apparently Donnie wants people to be able to write ebuilds without requiring rational thought. Whilst that would go some way towards explaining the state of the tree, it’s evident that ebuilds are not currently intuitive and should not be made intuitive.

What qualities, then, should ebuild design aspire to? Let’s start with these:

  1. Ebuilds should be as obvious as reasonably possible, given the complications of the underlying packaging system and the overall design requirements, to a person with an appropriate level of skill and access to the documentation.
  2. Ebuilds should work to reduce the amount of boilerplate and cut-and-paste duplication required.
  3. Ebuilds should take steps to catch and prevent common errors.

Looking at the first point, one may think it is too weak a requirement — why not “ebuilds should be accessible to your average user”? But then, why should it be?

If you think the average user should have to write ebuilds to be able to get their package manager to track a package they can build by hand — why? Why not simply improve the package manager to be able to track hand-built packages without ebuilds?

If you think the average user should be able to modify ebuilds to add in patches — why? Why not simply improve the package manager to make it easy for the user to add in patches to existing packages?

If you think it will help solve the developer shortage problem — why? There’s no shortage of badly written ebuilds sitting around in bugzilla; making it easier to create more badly written ebuilds won’t fix this. The problem Gentoo faces is how to get more high quality ebuilds, and doing that requires skilled developers who have read and understood the documentation.

Introducing DEFAULT_ parameters has no major effect either way on the first point.

The second and third points are where DEFAULT_ parameters kick in. The reason the default src_configure does something as opposed to nothing is that the something it does is enough for many ebuilds. If instead it were a no-op, a typical simple ebuild would be considerably longer.

Except, these days a lot of ebuilds have a few simple configure options controlled by use flags, so the default src_configure in EAPI 2 (or src_compile in EAPIs 0 and 1) is no good for them. DEFAULT_ parameters bring the proportion of ebuilds that have to write their own way down.

This brings us to why the default src_install is a no-op. For most packages, something along the lines of “if there’s a Makefile, make DESTDIR="${D}" install” is not enough. For a good proportion of packages, though, that plus an ebuild-supplied list of doc files would suffice.

Donnie claims that specifying things in variables this way is a major change in how ebuilds work. But there are already plenty of examples of things done in this style:

  • The S variable is a declarative parameter to the package manager’s “before we run a phase” functions.
  • Lots of eclasses make use of a DOCS variable.
  • Indeed, nearly all parameterisation of eclasses is done through variables. It could just as easily be done by callback or overridable functions, but developers haven’t opted to do so.

A perfect example of that last point: Donnie’s own x-modular eclass has a variable named PATCHES, which ebuilds set in global scope. If x-modular were using EAPI 3 with a src_prepare and exheres-0 style declarative patches lists, the package manager would already be providing exactly what he’s gone out of his way to implement.

So what gives, Donnie? Do you think your use of PATCHES was a design mistake that you will be correcting? And do you think all those other developers who have been doing the same kind of thing for years are fundamentally wrong?

Posted in eapi 3

YAML Sucks. Gems Sucks. Syck Sucks.

Posted by Ciaran McCreesh on March 1, 2009

YAML, like XML (but don’t say that around YAML fans, because they will insist that YAML is nothing like XML), is a faddish structured text format that, by virtue of its generality and abstractness, ends up being harder to work with even with a parser already written than an appropriately designed one-off flat text format.

Gems is Ruby’s way of dealing with distributions with lousy package managers and operating systems where there are no package managers, at the expense of sanity for the minority who could handle it better themselves. To be fair, they are open to this getting fixed; unfortunately, thanks to the immense suckiness of YAML, this is not straightforward.

The metadata for every Gem hosted on RubyForge is available in a really huge YAML file, whose format, rather surprisingly, is properly documented. There’s a standard tool for hosting things this way too, so presumably other repositories could easily do the same thing. This isn’t particularly nice, but it’s a huge improvement over CPAN’s lack of anything consistently useful…

Well, it would be a huge improvement, except the YAML file isn’t YAML.

Let’s run it through libyaml’s example-reformatter:

# Snip zillions of lines of output
  extlib-0.9.3: !ruby/object:Gem::Specification
    name: extlib
    version: !ruby/object:Gem::Version
Scanner error: while scanning for the next token at line 343989, column 18
found character that cannot start any token at line 343989, column 18

Oops. Oh well. Let’s run it through yaml-cpp, nominee for the “biggest screw-up of a build system, even taking into account the cmake handicap” award:

Error at line 343989, col 18: unknown token

Mmm. So what’s going on there?

343974   extra-1.0: !ruby/object:Gem::Specification
343975     name: extra
343976     version: !ruby/object:Gem::Version
343977       version: "1.0"
343978     platform: ruby
343979     authors:
343980     - Matthew Harris
343981     autorequire: extra
343982     bindir: bin
343983     cert_chain: []
343984 
343985     date: 2006-05-10 11:00:00 -04:00
343986     default_executable:
343987     dependencies: []
343988 
343989     description: `ruby-extra' is a package full of simple/fun/useful methods that are added to the
             core classes and modules of Ruby. It is quite similar to Facets but is still minimal.
343990     email: shugotenshi@gmail.com
343991     executables: []
343992 
343993     extensions: []
343994 
343995     extra_rdoc_files: []
343996 
343997     files: []
343998 
343999     has_rdoc: true
344000     homepage: http://ruby-extra.rubyforge.org
344001     post_install_message:
344002     rdoc_options: []
344003 
344004     require_paths:
344005     - lib
344006     required_ruby_version: !ruby/object:Gem::Requirement
344007       requirements:
344008       - - ">"
344009         - !ruby/object:Gem::Version
344010           version: 0.0.0
344011       version:
344012     required_rubygems_version:
344013     requirements: []
344014 
344015     rubyforge_project: ruby-extra
344016     rubygems_version: 1.3.1
344017     signing_key:
344018     specification_version: 1
344019     summary: Adds useful methods to built-in/core Ruby classes and modules.
344020     test_files: []

Looks like that backtick might be causing problems. According to the specification:

The “@” (#40, at) and “`” (#60, grave accent) are reserved for future use.

So libyaml and yaml-cpp are quite correct in barfing. Wonderful.
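The same rejection can be reproduced from Ruby itself, assuming a Ruby whose yaml library is Psych (which wraps libyaml) rather than the Syck discussed in this post. The parses? helper is a made-up name:

```ruby
require 'yaml'

# The offending line, cut down: an unquoted value that starts with a backtick.
bad  = "description: `ruby-extra' is a package\n"
# The same value in double quotes, which a strict parser has no trouble with.
good = "description: \"`ruby-extra' is a package\"\n"

# Hypothetical helper: true if the text loads, false if the parser rejects it.
def parses?(text)
    YAML.load(text)
    true
rescue Psych::SyntaxError => e
    puts "rejected: #{e.message}"
    false
end

puts parses?(bad)
puts parses?(good)

# Dumping and reloading through the same library keeps the string intact.
value = "`ruby-extra' is a package"
puts YAML.load(YAML.dump(value)) == value
```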

Next step: what generates that file, and how can we get it fixed?

Turns out the file is created by serialising a load of Gem::Specification objects. The serialisation is done by a library called Syck. Syck correctly escapes characters outside the safely printable range, but doesn’t care about @ or `. Patch time!

But things are never that simple. Syck’s most recent release was in 2005. Even getting at the source is somewhat tricky. According to Syck’s homepage, the source is in CVS. I’ve deliberately purged all knowledge of CVS from my brain, but it looks like the most recent commit was in September 2005.

According to this news item from November 2005, Syck is now in SVN instead. The link given 404s. Hunting around finds this, and from there this. After waiting for three and a half weeks for a git fetch (because Github is slooooooooooow), it seems that there’s at least some recent activity here.

But this isn’t the Syck used by Ruby. Ruby SVN includes its own copy of Syck. A quick look at the svn log shows that:

  • Various fixes in the Syck on Github aren’t in Ruby’s copy.
  • Various fixes in Ruby’s copy, including at least one with security implications, aren’t in the Syck on Github.
  • Neither has fixed the problem we care about.

So at this point I’m more or less giving up. The Gems YAML file isn’t YAML at all, and can only be parsed by Syck, which is at best badly maintained in at least two different places, with no coordination between the two. Brilliant.

Posted in programming, ruby