He's not dead, he's resting
YAML Sucks. Gems Sucks. Syck Sucks.
YAML, like XML (but don’t say that around YAML fans, because they will insist that YAML is nothing like XML), is a faddish structured text format that, by virtue of its generality and abstractness, ends up being harder to work with even with a parser already written than an appropriately designed one-off flat text format.
Gems is Ruby’s way of dealing with distributions with lousy package managers and operating systems where there are no package managers, at the expense of sanity for the minority who could handle it better themselves. To be fair, they are open to this getting fixed; unfortunately, thanks to the immense suckiness of YAML, this is not straightforward.
The metadata for every Gem hosted on RubyForge is available in a really huge YAML file, whose format, rather surprisingly, is properly documented. There’s a standard tool for hosting things this way too, so presumably other repositories could easily do the same thing. This isn’t particularly nice, but it’s a huge improvement over CPAN’s lack of anything consistently useful…
Well, it would be a huge improvement, except the YAML file isn’t YAML.
Let’s run it through libyaml:
# Snip zillions of lines of output
extlib-0.9.3: !ruby/object:Gem::Specification
name: extlib
version: !ruby/object:Gem::Version
Scanner error: while scanning for the next token at line 343989, column 18
found character that cannot start any token at line 343989, column 18
Oops. Oh well. Let’s run it through yaml-cpp, nominee for the “biggest screw-up of a build system, even taking into account the cmake handicap” award:
Error at line 343989, col 18: unknown token
Mmm. So what’s going on there?
343974   extra-1.0: !ruby/object:Gem::Specification
343975     name: extra
343976     version: !ruby/object:Gem::Version
343977       version: "1.0"
343978     platform: ruby
343979     authors:
343980     - Matthew Harris
343981     autorequire: extra
343982     bindir: bin
343983     cert_chain:
343984
343985     date: 2006-05-10 11:00:00 -04:00
343986     default_executable:
343987     dependencies:
343988
343989     description: `ruby-extra' is a package full of simple/fun/useful methods that are added to the core classes and modules of Ruby. It is quite similar to Facets but is still minimal.
343990     email: firstname.lastname@example.org
343991     executables:
343992
343993     extensions:
343994
343995     extra_rdoc_files:
343996
343997     files:
343998
343999     has_rdoc: true
344000     homepage: http://ruby-extra.rubyforge.org
344001     post_install_message:
344002     rdoc_options:
344003
344004     require_paths:
344005     - lib
344006     required_ruby_version: !ruby/object:Gem::Requirement
344007       requirements:
344008       - - ">"
344009         - !ruby/object:Gem::Version
344010           version: 0.0.0
344011       version:
344012     required_rubygems_version:
344013     requirements:
344014
344015     rubyforge_project: ruby-extra
344016     rubygems_version: 1.3.1
344017     signing_key:
344018     specification_version: 1
344019     summary: Adds useful methods to built-in/core Ruby classes and modules.
344020     test_files:
Looks like that backtick might be causing problems. According to the specification:
The “@” (#40, at) and “`” (#60, grave accent) are reserved for future use.
So libyaml and yaml-cpp are quite correct in barfing. Wonderful.
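To make the rule concrete, here is a rough Python sketch that flags lines whose plain value starts with one of the reserved indicators. It is a line-based heuristic, not a YAML parser, and `find_reserved_starts` is a name made up for illustration; nothing here comes from libyaml or yaml-cpp. The sample assumes the `description:` key sits at a four-space indent in the real file, which is an assumption, though one that matches the reported column.

```python
# Indicators the YAML spec reserves for future use; a plain (unquoted)
# scalar is not allowed to begin with either of them.
RESERVED = ("@", "`")


def find_reserved_starts(text):
    """Return (line, column) pairs, both 1-based, for every "key: value"
    line whose value begins with a reserved indicator.

    Hypothetical helper: a naive line scan that ignores block scalars,
    flow collections, tags and everything else a real parser handles.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        _key, sep, value = line.partition(": ")
        if not sep:
            continue  # no "key: value" shape on this line
        stripped = value.lstrip(" ")
        if stripped.startswith(RESERVED):
            column = len(line) - len(stripped) + 1
            hits.append((lineno, column))
    return hits


# Assumed four-space indent, as described in the lead-in.
sample = (
    "    name: extra\n"
    "    description: `ruby-extra' is a package full of methods\n"
    "    email: someone@example.org\n"
)
print(find_reserved_starts(sample))
```

Note that the `@` in the email line is not flagged: the indicators are only a problem at the start of a plain scalar, not in the middle of one.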
Next step: what generates that file, and how can we get it fixed?
Turns out the file is created by serialising a load of Gem::Specification objects. The serialisation is done by a library called Syck. Syck correctly escapes characters outside the safely printable range, but doesn’t care about `. Patch time!
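For what it's worth, the fix on the emitting side is conceptually tiny: any scalar that starts with a reserved indicator has to be quoted rather than written plain. A minimal Python sketch of that idea follows. Syck itself is C, so this is not its code, and `quote_if_reserved` is a hypothetical helper that only covers the escaping needed for this one case.

```python
# Indicators the YAML spec reserves for future use.
RESERVED = ("@", "`")


def quote_if_reserved(value):
    """Return `value` as a double-quoted YAML scalar if it starts with a
    reserved indicator, otherwise return it unchanged.

    Hypothetical helper: a real emitter has many more reasons to quote
    (leading "-", "#", ": " sequences, control characters and so on).
    """
    if value.startswith(RESERVED):
        # Inside double quotes, backslash and double quote need escaping.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return value


print("description: " + quote_if_reserved("`ruby-extra' is a package"))
print("name: " + quote_if_reserved("extra"))
```

The actual patch would have to live wherever Syck decides between plain and quoted styles, which this sketch makes no attempt to locate.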
But things are never that simple. Syck’s most recent release was in 2005. Even getting at the source is somewhat tricky. According to Syck’s homepage, the source is in CVS. I’ve deliberately purged all knowledge of CVS from my brain, but it looks like the most recent commit was in September 2005.
According to this news item from November 2005, Syck is now in SVN instead. The link given 404s. Hunting around finds this, and from there this. After waiting for three and a half weeks for a git fetch (because GitHub is slooooooooooow), it seems that there’s at least some recent activity here.
But this isn’t the Syck used by Ruby. Ruby SVN includes its own copy of Syck. A quick look at the svn log shows that:
- Various fixes in the Syck on GitHub aren’t in Ruby’s copy.
- Various fixes in Ruby’s copy, including at least one with security implications, aren’t in the Syck on GitHub.
- Neither has fixed the problem we care about.
So at this point I’m more or less giving up. The Gems YAML file isn’t YAML at all, and can only be parsed by Syck, which is at best badly maintained in at least two different places, with no coordination between the two. Brilliant.