Ciaran McCreesh’s Blag

Now with 17% more caffeine

Tag Archives: ruby

There’s something horribly wrong with Gentoo

After a couple of months of it being switched off, the PSU in my Gentoo box decided to let its magic smoke escape. I made the mistake of replacing it, and now I see this:

* virtual/rubygems
    ::gentoo                  1 {:ruby18} 2 {:jruby} 3* {:ree18} (4)KR {:ruby19} (5)K {:rbx}
    virtual/rubygems-3:ree18::gentoo
    Description               Virtual ebuild for rubygems
    Homepage                  
    Herds                     ruby
    Use flags                 
        ruby_targets          
            (ree18)           Build with Ruby Enterprise Edition 1.8.x

That someone thought that that was in any way a good idea pretty much sums up everything that’s wrong with Gentoo. I’m seriously considering just abandoning Gentoo support now — there comes a point when the accumulated bad design decisions would cost more to fix than the total worth of the product, and this plus the Python eclass are suggesting to me that Gentoo has passed that point.

Feeding ERB Useful Variables: A Horrible Hack Involving Bindings

I’ve been playing around with Ruby to create Summer, a simple web packages thing for Exherbo. Originally I was hand-creating HTML output simply because it’s easy, but that started getting very very messy. Mike convinced me to give ERB a shot.

The problem with template engines with inline code is that they look suspiciously like the braindead PHP model. Content and logic end up getting munged together in a horrid, unmaintainable mess, and the only people who’re prepared to work with it are the kind of people who think PHP isn’t more horrible than an aborted Jacqui Smith clone foetus boiled with rotten lutefisk and served over a bed of raw sewage with a garnish of Dan Brown and Patricia Cornwell novels. So does ERB let us combine easy page layouts with proper separation of code?

Well, sort of. ERB lets you pass it a binding to use for evaluating any code it encounters. On the surface of it, this lets you select between the top level binding, which can only see global symbols, or the caller’s binding, which sees everything in scope at the time. Not ideal; what we want is to provide only a carefully controlled set of symbols.

There are three ways of getting a binding in Ruby: a global TOPLEVEL_BINDING constant, which we clearly don’t want, the Kernel#binding method which returns a binding for the point of call, and the Proc#binding method which returns a binding for the context of a given Proc.
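For reference, here is a minimal sketch of the three side by side. The method names `binding_from_method` and `binding_from_proc` are made up for illustration, and it assumes a Ruby recent enough that `eval` wants a real Binding as its second argument.

```ruby
# Kernel#binding: captures whatever locals are in scope at the point of call.
def binding_from_method
  greeting = "hello from a method"
  binding
end

# Proc#binding: the context that the Proc closed over when it was created.
def binding_from_proc
  greeting = "hello from a proc's context"
  lambda { }.binding
end

# TOPLEVEL_BINDING: self is the top-level "main" object, and none of the
# locals above are visible from it.
top_self = eval("self", TOPLEVEL_BINDING)

puts eval("greeting", binding_from_method)  # hello from a method
puts eval("greeting", binding_from_proc)    # hello from a proc's context
puts top_self.inspect                       # main
```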

At first glance, the third of these looks most promising. What if we define the names we want to pass through in a lambda, and give it that?

require 'erb'

puts ERB.new("foo <%= bar %>").result(lambda do
    bar = "bar"
end)

Mmm, no, that won’t work:

(erb):1: undefined local variable or method `bar' for main:Object (NameError)

Because the lambda’s symbols aren’t visible to the outside world. What we want is a lambda that has those symbols already defined in its binding:

require 'erb'

puts ERB.new("foo <%= bar %>").result(lambda do
    bar = "bar"
    lambda { }
end.call)

Which is all well and good, but it lets symbols leak through from the outside world, which we’d rather avoid. If we don’t explicitly say “make foo available to ERB”, we don’t want to use the foo that our calling class happens to have defined. We also can’t pass functions through in this way, except by abusing lambdas — and we don’t want to make the ERB code use make_pretty.call(item) rather than make_pretty(item). Back to the drawing board.
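To make the leak concrete, here is a small sketch. The `render_with_leak` method and the `leaked` variable are invented for the demonstration. It also assumes a newer Ruby than the snippets above were written for, where ERB has to be handed an actual Binding, hence the explicit `.binding` call on the inner lambda.

```ruby
require 'erb'

def render_with_leak
  # We never asked for this to be visible to the template...
  leaked = "from the caller"

  inner = lambda do
    bar = "bar"
    lambda { }   # closes over bar, but also over everything outside
  end.call

  # ...and yet the template can see it, because the inner lambda's binding
  # chains all the way out to this method's locals.
  ERB.new("foo <%= bar %> / <%= leaked %>").result(inner.binding)
end

puts render_with_leak   # foo bar / from the caller
```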

There is something that lets us define a (mostly) closed set of names, including functions: a Module. It sounds like we want to pass through a binding saying “execute in the context of this Module” somehow, but there’s no Module#binding_for_stuff_in_us. Looks like we’re screwed.

Except we’re not, because we can make one:

require 'erb'

module ThingsForERB
    def self.bar
        "bar"
    end
end

puts ERB.new("foo <%= bar %>").result(ThingsForERB.instance_eval { binding })

Now all that remains is to provide a way to dynamically construct a Module on the fly with methods that map onto (possibly differently-named) methods in the calling context, which is relatively straightforward. Then we can do this in our templates:

<% if summary %>
    <p><%=h summary %>.</p>
<% end %>

<h2>Metadata</h2>

<table class="metadata">
    <% metadata_keys.each do | key | %>
        <tr>
            <th><%=h key.human_name %></th>
            <td><%=key_value key %></td>
        </tr>
    <% end %>
</table>

<h2>Packages</h2>

<table class="packages">
    <% package_names.each do | package_name | %>
        <tr>
            <th><a href="<%=h package_href(package_name) %>"><%=h package_name %></a></th>
            <td><%=h package_summary(package_name) %></td>
        </tr>
    <% end %>
</table>

Which gives us a good clean layout that’s easy to maintain, but lets us keep all the non-trivial code in the controlling class.
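For the curious, the dynamically constructed Module mentioned earlier might look something like the sketch below. This is not Summer's actual code: `binding_exposing`, `PackagePage` and its methods are all hypothetical names, and it relies on `define_singleton_method`, so it assumes Ruby 1.9 or later.

```ruby
require 'erb'

# Hypothetical helper: build a throwaway Module whose singleton methods
# forward to methods on target. mapping is { template_name => real_name }.
def binding_exposing(target, mapping)
  mod = Module.new
  mapping.each do |exposed_name, real_name|
    mod.define_singleton_method(exposed_name) do |*args, &block|
      target.send(real_name, *args, &block)
    end
  end
  # Same trick as above: a binding whose self is the Module. Note that this
  # helper's own locals (target, mapping, mod) are still visible through it,
  # so the set of names is only mostly closed.
  mod.instance_eval { binding }
end

# Hypothetical controlling class that keeps the non-trivial code to itself.
class PackagePage
  def render
    template = ERB.new("<p><%= summary %> (<%= shout('ok') %>)</p>")
    template.result(binding_exposing(self,
                                     :summary => :fetch_summary,
                                     :shout   => :upcase_it))
  end

  private

  def fetch_summary
    "A simple package"
  end

  def upcase_it(text)
    text.upcase
  end
end

puts PackagePage.new.render   # <p>A simple package (OK)</p>
```

The template sees `summary` and `shout` as plain function calls, and nothing else from `PackagePage` gets through unless it is listed in the mapping.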

YAML Sucks. Gems Sucks. Syck Sucks.

YAML, like XML (but don’t say that around YAML fans, because they will insist that YAML is nothing like XML), is a faddish structured text format that, by virtue of its generality and abstractness, ends up being harder to work with even with a parser already written than an appropriately designed one-off flat text format.

Gems is Ruby’s way of dealing with distributions with lousy package managers and operating systems where there are no package managers, at the expense of sanity for the minority who could handle it better themselves. To be fair, they are open to this getting fixed; unfortunately, thanks to the immense suckiness of YAML, this is not straightforward.

The metadata for every Gem hosted on RubyForge is available in a really huge YAML file, whose format, rather surprisingly, is properly documented. There’s a standard tool for hosting things this way too, so presumably other repositories could easily do the same thing. This isn’t particularly nice, but it’s a huge improvement over CPAN’s lack of anything consistently useful…

Well, it would be a huge improvement, except the YAML file isn’t YAML.

Let’s run it through libyaml’s example-reformatter:

# Snip zillions of lines of output
  extlib-0.9.3: !ruby/object:Gem::Specification
    name: extlib
    version: !ruby/object:Gem::Version
Scanner error: while scanning for the next token at line 343989, column 18
found character that cannot start any token at line 343989, column 18

Oops. Oh well. Let’s run it through yaml-cpp, nominee for the “biggest screw-up of a build system, even taking into account the cmake handicap” award:

Error at line 343989, col 18: unknown token

Mmm. So what’s going on there?

343974   extra-1.0: !ruby/object:Gem::Specification
343975     name: extra
343976     version: !ruby/object:Gem::Version
343977       version: "1.0"
343978     platform: ruby
343979     authors:
343980     - Matthew Harris
343981     autorequire: extra
343982     bindir: bin
343983     cert_chain: []
343984 
343985     date: 2006-05-10 11:00:00 -04:00
343986     default_executable:
343987     dependencies: []
343988 
343989     description: `ruby-extra' is a package full of simple/fun/useful methods that are added to the
             core classes and modules of Ruby. It is quite similar to Facets but is still minimal.
343990     email: shugotenshi@gmail.com
343991     executables: []
343992 
343993     extensions: []
343994 
343995     extra_rdoc_files: []
343996 
343997     files: []
343998 
343999     has_rdoc: true
344000     homepage: http://ruby-extra.rubyforge.org
344001     post_install_message:
344002     rdoc_options: []
344003 
344004     require_paths:
344005     - lib
344006     required_ruby_version: !ruby/object:Gem::Requirement
344007       requirements:
344008       - - ">"
344009         - !ruby/object:Gem::Version
344010           version: 0.0.0
344011       version:
344012     required_rubygems_version:
344013     requirements: []
344014 
344015     rubyforge_project: ruby-extra
344016     rubygems_version: 1.3.1
344017     signing_key:
344018     specification_version: 1
344019     summary: Adds useful methods to built-in/core Ruby classes and modules.
344020     test_files: []

Looks like that backtick might be causing problems. According to the specification:

The “@” (#x40, at) and “`” (#x60, grave accent) are reserved for future use.

So libyaml and yaml-cpp are quite correct in barfing. Wonderful.
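The same behaviour is easy to reproduce from Ruby itself, assuming a Ruby new enough that the yaml library is backed by Psych (which wraps libyaml) rather than Syck. The `parses?` helper and the shortened description strings are made up for this sketch.

```ruby
require 'yaml'

# Hypothetical helper: does this text survive a libyaml-backed parser?
def parses?(text)
  YAML.load(text)
  true
rescue Psych::SyntaxError
  false
end

# A plain (unquoted) scalar may not start with a backtick...
unquoted = "description: `ruby-extra' is a package full of methods\n"
# ...but the same text inside double quotes is fine.
quoted   = "description: \"`ruby-extra' is a package full of methods\"\n"

puts parses?(unquoted)   # false
puts parses?(quoted)     # true

# A conforming emitter has to quote such strings so that they round-trip.
original   = { "description" => "`ruby-extra' is a package" }
round_trip = YAML.load(YAML.dump(original))
puts round_trip == original   # true
```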

Next step: what generates that file, and how can we get it fixed?

Turns out the file is created by serialising a load of Gem::Specification objects. The serialisation is done by a library called Syck. Syck correctly escapes characters outside the safely printable range, but doesn’t care about @ or `. Patch time!

But things are never that simple. Syck’s most recent release was in 2005. Even getting at the source is somewhat tricky. According to Syck’s homepage, the source is in CVS. I’ve deliberately purged all knowledge of CVS from my brain, but it looks like the most recent commit was in September 2005.

According to this news item from November 2005, Syck is now in SVN instead. The link given 404s. Hunting around finds this, and from there this. After waiting for three and a half weeks for a git fetch (because GitHub is slooooooooooow), it seems that there’s at least some recent activity here.

But this isn’t the Syck used by Ruby. Ruby SVN includes its own copy of Syck. A quick look at the svn log shows that:

  • Various fixes in the Syck on GitHub aren’t in Ruby’s copy.
  • Various fixes in Ruby’s copy, including at least one with security implications, aren’t in the Syck on GitHub.
  • Neither has fixed the problem we care about.

So at this point I’m more or less giving up. The Gems YAML file isn’t YAML at all, and can only be parsed by Syck, which is at best badly maintained in at least two different places, with no coordination between the two. Brilliant.

Stopping Pythons from eating your Rams

As various people have found out the hard way, and much to my annoyance because my laptop is memory starved, building Paludis can sometimes take lots and lots of RAM.

Originally, we didn’t do anything about this. But unfortunately lots of users have silly things like MAKEOPTS="-j9", which can result in the build process wanting somewhere in the region of eight gigs of RAM, which in turn leads to users whining about gcc internal errors or random processes being OOMed. So we stuck a nasty hack in the ebuild that would reduce MAKEOPTS based upon how much free memory you had — all very well, but it screws over distcc users and isn’t even necessary for many combinations of USE flags and CXXFLAGS.

It’s worth working out exactly what makes the compiler memory usage so high. It’s fairly easy to figure out that it’s only an issue when building the Python bindings. We use Boost.Python for these, and unfortunately Boost will quite happily use horrible preprocessor hacks that result in huge generated source files and all sorts of nasty workarounds just to get code to work on ancient unsupported Microsoft compilers. It’s enough of a problem for enough people that there’s a tutorial section on reducing memory consumption for Boost.Python, but we already do those things.

There’s something else interesting, though. With debugging turned on (which autotools does by default), we need something like 800MBytes to compile one particular Python binding file. With it turned off, we only need 300MBytes, which is much less likely to be a problem (and more to the point it won’t make my laptop start swapping). It turns out that building the Python bindings with -g isn’t even useful — gdb doesn’t give particularly useful backtraces on the Python interpreter, and there are better ways of tracking down problems there.

So it looks like it makes sense to add -g0 (after checking that the compiler supports it) to CXXFLAGS for the Python bindings. Easy enough, right?

Wrong. As with everything else involving autotools, we have to jump through all sorts of convoluted hoops to do this. CXXFLAGS is a user variable (so we aren’t supposed to change it), and it takes precedence over AM_CXXFLAGS (which we can change). There’s no ‘more important than the user variable’ variable, and we can’t sensibly override CXXCOMPILE, so this gets messy. We have to abuse configure.ac to remove the debugging options from the user’s CXXFLAGS and move them into something that ends up in AM_CXXFLAGS, which we can then override. Horrid.

The next Paludis release will include this voodoo, which should improve things considerably and let us avoid the nasty MAKEOPTS mangling. But it’s still not ideal.

Most Gentoo users have USE="python" set, either from profiles or explicitly. Most of these users do not want to build the Python bindings. Some of these users don’t think to look at the dependencies before moaning that Paludis requires Boost, so they don’t even realise it’s only because they’re using a USE flag they probably don’t want enabled. So what can we do about this?

We can’t use IUSE defaults, since we don’t really want to use anything later than EAPI 0 for package manager ebuilds. We could turn off python selectively in package.use, but lots of users still have USE="python" set explicitly. So we use a different USE flag name. We’ve gone for python-bindings, along with a use description that makes it clear that thinking “well I have some things that use Python so I probably want this flag on” is wrong.

For consistency, we’ve also renamed ruby to ruby-bindings. These are a lot more useful than the Python bindings (playman is written in Ruby), and a lot faster to build thanks to Ruby having a reasonably sane API, so we might end up having to mess with profiles to turn them on by default.

I might end up reverting all of this if it turns out it does more harm than good. We’ll see.

Recursive Lambdas in Ruby using Object#tap

Paludis represents an ebuild’s homepage as a dependency-style hierarchy, since PMS allows use-conditional blocks like:

HOMEPAGE="http://example.org/foo gtk? ( http://example.org/foo-gtk )"

Given this, we want a quick way of extracting the URLs using the Ruby bindings. One could of course use a function:

def extract_homepage_recursively(spec)
    case spec
    when AllDepSpec, ConditionalDepSpec
        spec.each { | child | extract_homepage_recursively(child) }
    when SimpleURIDepSpec
        puts spec
    end
end

extract_homepage_recursively(id.homepage_key.value) if id.homepage_key

But that’s rather crude. It would be much nicer to use a lambda, since we don’t need a new name for something we’re only using once.

Unfortunately, recursive lambdas are sometimes rather pesky. There are various solutions, most of which involve passing the lambda as a parameter to itself. In the general case, there’s the infamous Y combinator, but since Ruby has language-level recognition for recursion there’s no need to resort to that kind of silliness. We could just use a variable:

recurse = lambda do | recurse, spec |
    case spec
    when AllDepSpec, ConditionalDepSpec
        spec.each { | child | recurse.call(recurse, child) }
    when SimpleURIDepSpec
        puts spec
    end
end

recurse.call(recurse, id.homepage_key.value) if id.homepage_key

But that’s still a pointless waste of a name. We can do better than that.

Ruby 1.9 adds an Object#tap method, which is rather nifty. Ruby 1.8 doesn’t have it, but we can provide it easily:

unless Object.method_defined? :tap
    class Object
        def tap
            yield self
            self
        end
    end
end

Then, we don’t need a variable or a horrid untyped lambda calculus construct at all:

lambda do | recurse, spec |
    case spec
    when AllDepSpec, ConditionalDepSpec
        spec.each { | child | recurse.call(recurse, child) }
    when SimpleURIDepSpec
        puts spec
    end
end.tap { | r | r.call(r, id.homepage_key.value) } if id.homepage_key
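If you want to try this without Paludis installed, here is a self-contained sketch of the same trick. The stand-ins are assumptions made purely for the demonstration: nested Arrays play the part of AllDepSpec and ConditionalDepSpec, Strings play the part of SimpleURIDepSpec, and the URLs are collected into an array rather than printed directly. It needs Ruby 1.9 or later, or the Object#tap shim above.

```ruby
# Stand-in for id.homepage_key.value: a nested structure of URLs.
homepage = ["http://example.org/foo", ["http://example.org/foo-gtk"]]
urls = []

lambda do |recurse, spec|
  case spec
  when Array     # plays the role of AllDepSpec / ConditionalDepSpec
    spec.each { |child| recurse.call(recurse, child) }
  when String    # plays the role of SimpleURIDepSpec
    urls << spec
  end
end.tap { |r| r.call(r, homepage) }

puts urls
```

The lambda never gets a name in the enclosing scope; tap hands it to the block as r, and the block passes it to itself.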

Paludis Ruby Bindings and Template Classes

A PackageID in Paludis supports various actions. An Action is represented by a subclass instance, such as InstallAction or ConfigAction, some of which carry member data (for example, an InstallAction carries information about the target repository for the install).

To perform an action, the PackageID::perform_action method is used. But not all IDs support all actions — you can’t, for example, uninstall a package that isn’t installed, and not all EAPIs support the ‘pretend’ action. So there’s a second method, PackageID::supports_action, that returns a bool saying whether an action is supported.

That’s all very well, but it would require constructing an Action subclass instance just for querying purposes. There’s not much point in this. So we make PackageID::supports_action take a SupportsActionTest<SomeAction> parameter rather than an Action. Unlike the Action subclasses, the SupportsActionTest<T_> template class carries no member data, so it doesn’t need fancy construction. This lets us do this:

if (my_package_id->supports_action(SupportsActionTest<FetchAction>()))
{
    FetchAction fetch_action(_imp->fetch_options);
    my_package_id->perform_action(fetch_action);
}

This pattern crops up in two other places. To speed up certain queries, we can ask a Repository whether some of its IDs might support a particular action. The Repository::some_ids_might_support_action method will always return true if any of its IDs support a particular action, and might return false if it’s known for sure that none of them will (this weasel wording is necessary because we might, for example, have a repository full of ebuilds with unsupported EAPIs, and unsupported EAPIs means no actions are possible).

Similarly, the new filter system has filter::SupportsAction<ActionClass>. The implementation of this filter uses the Repository and PackageID methods to be as lazy as possible.

Which is all well and good, in C++, but with bindings things get a bit icky since templates don’t translate naturally. In Ruby, we used to have a bunch of classes. SupportsInstallActionTest.new() would be like SupportsActionTest<InstallAction>(), SupportsConfigActionTest.new() would be like SupportsActionTest<ConfigAction> and so on. This isn’t particularly nice.

It occurred to me that SupportsActionTest.new(InstallAction) is legal syntactically in Ruby. It passes the value InstallAction, which is an object of class Class, as the parameter. Then we have to screw around a bit in the bindings code, remembering that SomeClass <= OtherClass in Ruby means “SomeClass is or is a subclass of OtherClass”:

/*
 * Document-method: SupportsActionTest.new
 *
 * call-seq:
 *     SupportsActionTest.new(ActionClass) -> SupportsActionTest
 *
 * Create new SupportsActionTest object. The ActionClass should be, e.g. InstallAction.
 */
static VALUE
supports_action_test_new(VALUE self, VALUE action_class)
{
    std::tr1::shared_ptr<const SupportsActionTestBase> * ptr(0);

    try
    {
        if (Qtrue == rb_funcall2(action_class, rb_intern("<="), 1, install_action_value_ptr()))
            ptr = new std::tr1::shared_ptr<const SupportsActionTestBase>(make_shared_ptr(new SupportsActionTest<InstallAction>()));
        else if (Qtrue == rb_funcall2(action_class, rb_intern("<="), 1, installed_action_value_ptr()))
            ptr = new std::tr1::shared_ptr<const SupportsActionTestBase>(make_shared_ptr(new SupportsActionTest<InstalledAction>()));
        else if (Qtrue == rb_funcall2(action_class, rb_intern("<="), 1, uninstall_action_value_ptr()))
            ptr = new std::tr1::shared_ptr<const SupportsActionTestBase>(make_shared_ptr(new SupportsActionTest<UninstallAction>()));
        else if (Qtrue == rb_funcall2(action_class, rb_intern("<="), 1, pretend_action_value_ptr()))
            ptr = new std::tr1::shared_ptr<const SupportsActionTestBase>(make_shared_ptr(new SupportsActionTest<PretendAction>()));
        else if (Qtrue == rb_funcall2(action_class, rb_intern("<="), 1, config_action_value_ptr()))
            ptr = new std::tr1::shared_ptr<const SupportsActionTestBase>(make_shared_ptr(new SupportsActionTest<ConfigAction>()));
        else if (Qtrue == rb_funcall2(action_class, rb_intern("<="), 1, fetch_action_value_ptr()))
            ptr = new std::tr1::shared_ptr<const SupportsActionTestBase>(make_shared_ptr(new SupportsActionTest<FetchAction>()));
        else if (Qtrue == rb_funcall2(action_class, rb_intern("<="), 1, info_action_value_ptr()))
            ptr = new std::tr1::shared_ptr<const SupportsActionTestBase>(make_shared_ptr(new SupportsActionTest<InfoAction>()));
        else if (Qtrue == rb_funcall2(action_class, rb_intern("<="), 1, pretend_fetch_action_value_ptr()))
            ptr = new std::tr1::shared_ptr<const SupportsActionTestBase>(make_shared_ptr(new SupportsActionTest<PretendFetchAction>()));
        else
            rb_raise(rb_eTypeError, "Can't convert %s into an Action subclass", rb_obj_classname(action_class));

        VALUE tdata(Data_Wrap_Struct(self, 0, &Common<std::tr1::shared_ptr<const SupportsActionTestBase> >::free, ptr));
        rb_obj_call_init(tdata, 0, &self);
        return tdata;
    }
    catch (const std::exception & e)
    {
        delete ptr;
        exception_to_ruby_exception(e);
    }
}

/*
 * Document-class: Paludis::SupportsActionTest
 *
 * Tests whether a Paludis::PackageID supports a particular action.
 */
c_supports_action_test = rb_define_class_under(paludis_module(), "SupportsActionTest", rb_cObject);
rb_define_singleton_method(c_supports_action_test, "new", RUBY_FUNC_CAST(&supports_action_test_new), 1);

Not exactly pretty, for now. Hopefully when C++0x comes along we’ll be able to invent some obscene hack involving std::initializer_list, lambdas and std::find_if which will make the whole thing somewhat more elegant.
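For comparison, the same dispatch written as a table lookup in pure Ruby is rather shorter. This is only a sketch of the idea, not the real bindings: the classes below are empty stand-ins for the Paludis ones, and `FancyInstallAction`, `KNOWN_ACTIONS` and `action_class` are invented for the example.

```ruby
# Empty stand-ins for the real Paludis action classes.
class Action; end
class InstallAction < Action; end
class FetchAction < Action; end
class FancyInstallAction < InstallAction; end   # hypothetical user subclass

class SupportsActionTest
  # The chain of else-ifs in the C++ version becomes a table.
  KNOWN_ACTIONS = [InstallAction, FetchAction]

  attr_reader :action_class

  def initialize(action_class)
    # Module#<= gives true for "is or is a subclass of", and false or nil
    # otherwise, so find skips anything unrelated.
    @action_class = KNOWN_ACTIONS.find do |known|
      action_class.is_a?(Class) && action_class <= known
    end
    unless @action_class
      raise TypeError, "Can't convert #{action_class.inspect} into an Action subclass"
    end
  end
end

puts SupportsActionTest.new(InstallAction).action_class        # InstallAction
puts SupportsActionTest.new(FancyInstallAction).action_class   # InstallAction

begin
  SupportsActionTest.new(String)
rescue TypeError => e
  puts e.message
end
```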

Unfortunately, the Python bindings are still stuck using package_id.supports_action(SupportsFetchActionTest()) etc. So far as I can see there’s no nice way to get the same effect whilst using boost.python, even though Python lets you pass classes around in a similar manner to Ruby.
