Ciaran McCreesh’s Blag

Now with 17% more caffeine

YAML Sucks. Gems Sucks. Syck Sucks.

YAML, like XML (but don’t say that around YAML fans, because they will insist that YAML is nothing like XML), is a faddish structured text format that, by virtue of its generality and abstractness, ends up being harder to work with, even with a parser already written, than an appropriately designed one-off flat text format.

Gems is Ruby’s way of coping with distributions that have lousy package managers and operating systems that have no package manager at all, at the expense of sanity for the minority who could handle things better themselves. To be fair, the Gems developers are open to getting this fixed; unfortunately, thanks to the immense suckiness of YAML, this is not straightforward.

The metadata for every Gem hosted on RubyForge is available in a really huge YAML file, whose format, rather surprisingly, is properly documented. There’s a standard tool for hosting things this way too, so presumably other repositories could easily do the same thing. This isn’t particularly nice, but it’s a huge improvement over CPAN’s lack of anything consistently useful…

Well, it would be a huge improvement, except the YAML file isn’t YAML.

Let’s run it through libyaml’s example-reformatter:

# Snip zillions of lines of output
  extlib-0.9.3: !ruby/object:Gem::Specification
    name: extlib
    version: !ruby/object:Gem::Version
Scanner error: while scanning for the next token at line 343989, column 18
found character that cannot start any token at line 343989, column 18

Oops. Oh well. Let’s run it through yaml-cpp, nominee for the “biggest screw-up of a build system, even taking into account the cmake handicap” award:

Error at line 343989, col 18: unknown token

Mmm. So what’s going on there?

343974   extra-1.0: !ruby/object:Gem::Specification
343975     name: extra
343976     version: !ruby/object:Gem::Version
343977       version: "1.0"
343978     platform: ruby
343979     authors:
343980     - Matthew Harris
343981     autorequire: extra
343982     bindir: bin
343983     cert_chain: []
343984 
343985     date: 2006-05-10 11:00:00 -04:00
343986     default_executable:
343987     dependencies: []
343988 
343989     description: `ruby-extra' is a package full of simple/fun/useful methods that are added to the
             core classes and modules of Ruby. It is quite similar to Facets but is still minimal.
343990     email: shugotenshi@gmail.com
343991     executables: []
343992 
343993     extensions: []
343994 
343995     extra_rdoc_files: []
343996 
343997     files: []
343998 
343999     has_rdoc: true
344000     homepage: http://ruby-extra.rubyforge.org
344001     post_install_message:
344002     rdoc_options: []
344003 
344004     require_paths:
344005     - lib
344006     required_ruby_version: !ruby/object:Gem::Requirement
344007       requirements:
344008       - - ">"
344009         - !ruby/object:Gem::Version
344010           version: 0.0.0
344011       version:
344012     required_rubygems_version:
344013     requirements: []
344014 
344015     rubyforge_project: ruby-extra
344016     rubygems_version: 1.3.1
344017     signing_key:
344018     specification_version: 1
344019     summary: Adds useful methods to built-in/core Ruby classes and modules.
344020     test_files: []

Looks like that backtick might be causing problems. According to the YAML specification:

The “@” (#40, at) and “`” (#60, grave accent) are reserved for future use.

So libyaml and yaml-cpp are quite correct in barfing: an unquoted value isn’t allowed to start with a reserved character. Wonderful.
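A strict parser stops at the first offending line, so finding every bad entry in the file means scanning for values that open with a reserved indicator. Here is a minimal Python sketch of such a scan. It is not part of libyaml, yaml-cpp or Gems; the name `find_reserved_starts` is made up, and it assumes simple one-line `key: value` entries like the ones in the excerpt (it ignores sequences, flow collections and multi-line scalars). For a `description:` line indented by four spaces, it reports column 18, the same position the parsers complain about.

```python
import re
import sys

# A "key: value" line in block-mapping style. Deliberately simple: it does
# not understand sequences, flow collections or multi-line scalars.
MAPPING_LINE = re.compile(r"^\s*([^\s:#][^:]*):[ \t]+(\S.*)$")

# The YAML spec reserves these two indicators, so no plain scalar may start with them.
RESERVED = ("@", "`")


def find_reserved_starts(lines):
    """Yield (line_number, column, key) for values that begin with @ or `.

    Line numbers and columns are 1-based, like the parsers' error messages.
    """
    for number, line in enumerate(lines, start=1):
        match = MAPPING_LINE.match(line)
        if match and match.group(2).startswith(RESERVED):
            yield number, match.start(2) + 1, match.group(1)


if __name__ == "__main__":
    # Usage: python find_reserved.py METADATA.yaml  (file name is hypothetical)
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as metadata:
            for number, column, key in find_reserved_starts(metadata):
                print(f"line {number}, column {column}: {key}")
```

Being line-based, it keeps going past the first error, which is the point: it gives a list of every entry that needs quoting rather than one scanner error per run.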

Next step: what generates that file, and how can we get it fixed?

Turns out the file is created by serialising a load of Gem::Specification objects, and the serialisation is done by a library called Syck. Syck correctly escapes characters outside the safely printable range, but doesn’t know that a string starting with @ or ` has to be quoted. Patch time!
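The patch amounts to making the emitter quote any string that a strict parser wouldn’t accept as a plain scalar. Below is a minimal Python sketch of that rule; it assumes nothing about Syck’s actual internals (Syck is written in C), and the names `needs_quoting` and `emit_scalar` are hypothetical. It errs on the side of quoting too much, which is always legal, and it ignores strings such as `true` or `1.0`, which are valid plain scalars but would load as a different type.

```python
# YAML indicator characters. A plain (unquoted) scalar may not start with
# most of these ('-', '?' and ':' are allowed when followed by a non-space),
# and '@' and '`' are reserved outright. Quoting whenever one comes first is
# the simple, conservative choice.
INDICATORS = frozenset("-?:,[]{}#&*!|>'\"%@`")


def needs_quoting(value: str) -> bool:
    """Conservatively decide whether value is unsafe as a plain scalar."""
    if value == "" or value != value.strip():
        return True  # empty or padded strings would be lost or trimmed
    if value[0] in INDICATORS:
        return True  # e.g. "`ruby-extra' is a package..."
    # ": " or a trailing ':' reads as a mapping key, " #" as a comment.
    return ": " in value or value.endswith(":") or " #" in value or "\n" in value


def emit_scalar(value: str) -> str:
    """Return value as a plain scalar, or double-quoted with escapes if needed."""
    if not needs_quoting(value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


print("description: " + emit_scalar("`ruby-extra' is a package full of methods"))
# prints: description: "`ruby-extra' is a package full of methods"
```

Only backslashes, double quotes and newlines are escaped here; a real emitter would also need to handle other control characters.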

But things are never that simple. Syck’s most recent release was in 2005. Even getting at the source is somewhat tricky. According to Syck’s homepage, the source is in CVS. I’ve deliberately purged all knowledge of CVS from my brain, but it looks like the most recent commit was in September 2005.

According to this news item from November 2005, Syck is now in SVN instead. The link given 404s. Hunting around finds this, and from there this. After waiting for three and a half weeks for a git fetch (because Github is slooooooooooow), it seems that there’s at least some recent activity here.

But this isn’t the Syck used by Ruby. Ruby SVN includes its own copy of Syck. A quick look at the svn log shows that:

  • Various fixes in the Syck on Github aren’t in Ruby’s copy.
  • Various fixes in Ruby’s copy, including at least one with security implications, aren’t fixed in the Syck on Github.
  • Neither has fixed the problem we care about.

So at this point I’m more or less giving up. The Gems YAML file isn’t YAML at all, and can only be parsed by Syck, which is at best badly maintained in at least two different places, with no coordination between the two. Brilliant.


7 responses to “YAML Sucks. Gems Sucks. Syck Sucks.”

  1. Zeno Davatz February 12, 2010 at 1:17 pm

    Thanks for this post. So would you not use Yaml at all or would you use a different parser, like http://code.google.com/p/yaml-cpp/ and a writer like http://github.com/cesare/ruby-libc-libyaml?

    Thank you for your Feedback.

    Best
    Zeno

    • Ciaran McCreesh February 13, 2010 at 11:04 am

      I’d just stay clear of Yaml. All the cool kids are using something else now anyway (JSON seems to be the flavour of the month, and comes with its own set of problems), and Yaml never got widely used enough that silly issues like this ever got sorted out.

      I’ve yet to see anything any of these languages do better than simple text files…

      • Earthly February 25, 2010 at 12:26 pm

        “I’ve yet to see anything any of these languages do better than simple text files…”

        They take away the headache of coming up with a format and parser each time you want to use it.
        But I’m sure that isn’t of much concern to a g++-slinging opensores geek.

        • Ciaran McCreesh February 25, 2010 at 12:32 pm

          No, they make you come up with a new format on top of the existing format every time you want to use it, and you have to do more work to write a handler on top of your yaml parser of choice than it takes to write a simple plain text parser from scratch. All yaml does is add in yet another messy and unnecessary layer of complexity.

          • spoot April 22, 2012 at 4:31 pm

            I avoid YAML because the syntax is too powerful and expressive. YAML is not necessarily simple. When you’re reading it, you may find yourself doing a little extra research into the YAML format. There are too many ways of doing things and the syntax is not always intuitive.

            I like some of the features of YAML but the reality is that more often than not YAGNI. JSON or INI will suffice.

  2. Fred Alger May 24, 2010 at 8:19 pm

    Compound that with the fact that the author of Syck disappeared off the face of the planet a few months ago… GOOD TIMES! As fun as it is to play with ruby, it’s crap like this that makes me glad to have moved on in life. Ruby, honey, you’re a hot number, but damn have you got some deep issues! Sorry babe.

  3. terrencecox January 25, 2012 at 4:16 pm

    I’m sure glad I’m not the only one that feels this way.

    Is it bad that I love Ruby but hate Rails and Yaml? Should I be in therapy? LOL
