Blag
He's not dead, he's resting
YAML Sucks. Gems Sucks. Syck Sucks.
March 1, 2009
YAML, like XML (but don’t say that around YAML fans, because they will insist that YAML is nothing like XML), is a faddish structured text format that, by virtue of its generality and abstractness, ends up being harder to work with, even when a parser has already been written, than an appropriately designed one-off flat text format.
Gems is Ruby’s way of dealing with distributions that have lousy package managers and operating systems that have none at all, at the expense of sanity for the minority who could handle things better themselves. To be fair, the Gems developers are open to this being fixed; unfortunately, thanks to the immense suckiness of YAML, this is not straightforward.
The metadata for every Gem hosted on RubyForge is available in a really huge YAML file, whose format, rather surprisingly, is properly documented. There’s a standard tool for hosting things this way too, so presumably other repositories could easily do the same thing. This isn’t particularly nice, but it’s a huge improvement over CPAN’s lack of anything consistently useful…
Well, it would be a huge improvement, except the YAML file isn’t YAML.
Let’s run it through libyaml’s example-reformatter:
# Snip zillions of lines of output
  extlib-0.9.3: !ruby/object:Gem::Specification
    name: extlib
    version: !ruby/object:Gem::Version
Scanner error: while scanning for the next token at line 343989, column 18
found character that cannot start any token at line 343989, column 18
Oops. Oh well. Let’s run it through yaml-cpp, nominee for the “biggest screw-up of a build system, even taking into account the cmake handicap” award:
Error at line 343989, col 18: unknown token
Mmm. So what’s going on there?
343974   extra-1.0: !ruby/object:Gem::Specification
343975     name: extra
343976     version: !ruby/object:Gem::Version
343977       version: "1.0"
343978     platform: ruby
343979     authors:
343980     - Matthew Harris
343981     autorequire: extra
343982     bindir: bin
343983     cert_chain: []
343984
343985     date: 2006-05-10 11:00:00 -04:00
343986     default_executable:
343987     dependencies: []
343988
343989     description: `ruby-extra' is a package full of simple/fun/useful methods that are added to the core classes and modules of Ruby. It is quite similar to Facets but is still minimal.
343990     email: shugotenshi@gmail.com
343991     executables: []
343992
343993     extensions: []
343994
343995     extra_rdoc_files: []
343996
343997     files: []
343998
343999     has_rdoc: true
344000     homepage: http://ruby-extra.rubyforge.org
344001     post_install_message:
344002     rdoc_options: []
344003
344004     require_paths:
344005     - lib
344006     required_ruby_version: !ruby/object:Gem::Requirement
344007       requirements:
344008       - - ">"
344009         - !ruby/object:Gem::Version
344010           version: 0.0.0
344011       version:
344012     required_rubygems_version:
344013       requirements: []
344014
344015     rubyforge_project: ruby-extra
344016     rubygems_version: 1.3.1
344017     signing_key:
344018     specification_version: 1
344019     summary: Adds useful methods to built-in/core Ruby classes and modules.
344020     test_files: []
Looks like that backtick might be causing problems. According to the specification:
The “@” (#40, at) and “`” (#60, grave accent) are reserved for future use.
So libyaml and yaml-cpp are quite correct in barfing. Wonderful.
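A strict parser stops at the first offending line, so it says nothing about how many other entries have the same problem. A line-by-line scan is enough to list them all. What follows is a rough Python sketch, not a YAML parser: it assumes block-style "key: value" and "- item" lines like the ones above, and the function name find_reserved_starts is made up for illustration. It reports 1-based line and column numbers, the same convention libyaml's error messages use.

```python
import re

# The YAML spec reserves "@" and "`"; an unquoted (plain) scalar may not start with them.
RESERVED_INDICATORS = ("@", "`")

# Heuristic for one-line block-style entries: "key: value" or "- value".
# This is NOT a YAML parser; it only flags the pattern that trips libyaml and yaml-cpp.
LINE_PATTERN = re.compile(
    r"^(?P<prefix>\s*(?:-\s+)?[^\s:#][^:#]*:\s+|\s*-\s+)(?P<value>\S.*)$"
)


def find_reserved_starts(lines):
    """Yield (line_number, column, line) for plain values starting with @ or `.

    Quoted values start with a quote character and are therefore never flagged.
    Line numbers and columns are 1-based.
    """
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        match = LINE_PATTERN.match(text)
        if match and match.group("value").startswith(RESERVED_INDICATORS):
            yield lineno, match.start("value") + 1, text
```

Passing it an open file handle (with open(path, encoding="utf-8") as f: list(find_reserved_starts(f))) lists every offending line at once. For a description line indented by four spaces, as in the entry above, it reports column 18.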
Next step: what generates that file, and how can we get it fixed?
It turns out the file is created by serialising a load of Gem::Specification objects. The serialisation is done by a library called Syck. Syck correctly escapes characters outside the safely printable range, but doesn’t care about @ or `, even though neither may start an unquoted scalar. Patch time!
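The fix belongs in the emitter: before writing a string as a plain scalar, check whether plain style can represent it, and fall back to double quotes if it can't. Syck itself is written in C and its real emitter code isn't reproduced here; this is a hypothetical Python sketch of the rule, with made-up names (needs_quoting, quote, emit_scalar). It is deliberately conservative, quoting any string that starts with an indicator character (including harmless ones such as "-foo"), and it ignores the separate problem of strings like "true" or "1.0" that would be read back as another type.

```python
# Indicator characters from the YAML spec. "@" and "`" are reserved for future use,
# and the rest have meanings of their own, so none of them is allowed to start
# a plain scalar here.
INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def needs_quoting(value: str) -> bool:
    """Conservative check: can this string be written as a plain scalar?"""
    if value == "" or value != value.strip():
        return True  # empty strings and surrounding whitespace don't survive plain style
    if value[0] in INDICATORS:
        return True  # the case missed in the Gems file: a leading "@" or "`"
    if ": " in value or " #" in value:
        return True  # would be read as a mapping separator or the start of a comment
    return not value.isprintable()  # control characters need escapes


def quote(value: str) -> str:
    """Write a double-quoted scalar, escaping quotes, backslashes and unprintables."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch in ('"', "\\"):
            out.append("\\" + ch)
        elif ch.isprintable():
            out.append(ch)
        elif code <= 0xFF:
            out.append(f"\\x{code:02x}")
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04x}")
        else:
            out.append(f"\\U{code:08x}")
    return '"' + "".join(out) + '"'


def emit_scalar(value: str) -> str:
    """Plain style when it is safe, double-quoted style otherwise."""
    return quote(value) if needs_quoting(value) else value
```

Under this rule the extra-1.0 description would have been written inside double quotes, where a leading backtick is just an ordinary character, while values like shugotenshi@gmail.com stay unquoted because the @ isn't at the start.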
But things are never that simple. Syck’s most recent release was in 2005. Even getting at the source is somewhat tricky. According to Syck’s homepage, the source is in CVS. I’ve deliberately purged all knowledge of CVS from my brain, but it looks like the most recent commit was in September 2005.
According to this news item from November 2005, Syck is now in SVN instead. The link given 404s. Hunting around finds this, and from there this. After waiting for three and a half weeks for a git fetch (because Github is slooooooooooow), it seems that there’s at least some recent activity here.
But this isn’t the Syck used by Ruby. Ruby SVN includes its own copy of Syck. A quick look at the svn log shows that:
- Various fixes in the Syck on Github aren’t in Ruby’s copy.
- Various fixes in Ruby’s copy, including at least one with security implications, aren’t fixed in the Syck on Github.
- Neither has fixed the problem we care about.
So at this point I’m more or less giving up. The Gems YAML file isn’t YAML at all, and can only be parsed by Syck, which is at best badly maintained in at least two different places, with no coordination between the two. Brilliant.
Thanks for this post. So would you not use Yaml at all or would you use a different parser, like http://code.google.com/p/yaml-cpp/ and a writer like http://github.com/cesare/ruby-libc-libyaml?
Thank you for your feedback.
Best
Zeno
I’d just stay clear of Yaml. All the cool kids are using something else now anyway (JSON seems to be the flavour of the month, and comes with its own set of problems), and Yaml never got widely used enough that silly issues like this ever got sorted out.
I’ve yet to see anything any of these languages do better than simple text files…
“I’ve yet to see anything any of these languages do better than simple text files…”
They take away the headache of coming up with a format and parser each time you want to use it.
But I’m sure that isn’t of much concern to a g++-slinging opensores geek.
No, they make you come up with a new format on top of the existing format every time you want to use it, and you have to do more work to write a handler on top of your yaml parser of choice than it takes to write a simple plain text parser from scratch. All yaml does is add in yet another messy and unnecessary layer of complexity.
I avoid YAML because the syntax is too powerful and expressive. YAML is not necessarily simple. When you’re reading it, you may find yourself doing a little extra research into the YAML format. There are too many ways of doing things and the syntax is not always intuitive.
I like some of the features of YAML but the reality is that more often than not YAGNI. JSON or INI will suffice.
Compound that with the fact that the author of Syck disappeared off the face of the planet a few months ago… GOOD TIMES! As fun as it is to play with ruby, it’s crap like this that makes me glad to have moved on in life. Ruby, honey, you’re a hot number, but damn have you got some deep issues! Sorry babe.
I’m sure glad I’m not the only one that feels this way.
Is it bad that I love Ruby but hate Rails and Yaml? Should I be in therapy? LOL