Blag

He's not dead, he's resting

UIs for Parallelism

Getting the UI right for parallel execution is tricky.

Make, for example, doesn’t even try. It just displays output from all the children all mixed together, and relies upon commands prefixing their output with a filename or similar identifier. This generally works for what make needs to do, but it’s a bit too crude for some things.

For the secret “not supposed to use it” parallelism in Paludis, we’re currently using something slightly more sophisticated: each line of output from each child job is prefixed by the job’s name. So you’ll see things like:

sync unavailable> wget -T 30 -t 1 -O /var/tmp/paludis-tarsync-cvh9gX/exherbo_repositories.tar.bz2 http://git.exherbo.org/exherbo_repositories.tar.bz2
sync unavailable-unofficial> wget -T 30 -t 1 -O /var/tmp/paludis-tarsync-kvEzRZ/exherbo_unofficial_repositories.tar.bz2
sync unavailable-unofficial> --14:24:06--  http://git.exherbo.org/exherbo_unofficial_repositories.tar.bz2
sync unavailable-unofficial>            => `/var/tmp/paludis-tarsync-kvEzRZ/exherbo_unofficial_repositories.tar.bz2'
sync unavailable> --14:24:06--  http://git.exherbo.org/exherbo_repositories.tar.bz2
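
As a rough illustration of that per-line prefixing (a minimal sketch in Python, not what Paludis actually does internally), something like the following would do. The names `prefix_line` and `run_prefixed` are made up for this example, and merging stderr into stdout is an assumption about what you’d want prefixed.

```python
import subprocess
import sys
import threading

# Shared lock so lines from different jobs never interleave mid-line.
_emit_lock = threading.Lock()


def prefix_line(job_name, line):
    """Tag one line of child output with the name of the job that produced it."""
    # rstrip() drops the trailing newline (and any other trailing whitespace).
    return f"{job_name}> {line.rstrip()}"


def run_prefixed(job_name, argv, emit=print):
    """Run argv and pass each line of its output to emit, prefixed by job_name.

    Returns the child's exit status. Both names are hypothetical helpers.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error output is prefixed too
        text=True,
    )
    with proc.stdout:
        for line in proc.stdout:
            with _emit_lock:
                emit(prefix_line(job_name, line))
    return proc.wait()
```

Running one `threading.Thread(target=run_prefixed, args=(name, argv))` per job gives output in the style shown above: lines from different jobs still arrive interleaved, but each one says where it came from.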

Something like this is necessary because the commands involved don’t do their own prefixing. But whilst it’s an improvement, the output’s rather cluttered and it’s hard to work out what’s going on. I’ve been experimenting with something more like this (slowdown induced artificially to make the point):

Starting sync

Repository                    Status                Pending Active  Done
-> alsa                       starting              16      1       0
-> ciaranm                    starting              15      2       0
-> ferdy                      starting              14      3       0
-> compnerd                   starting              13      4       0
-> arbor                      starting              12      5       0
-> compnerd                   success               12      4       1
-> gnome                      starting              11      5       1
-> alsa                       failed                11      4       2
    ... fatal: The remote end hung up unexpectedly
-> hardware                   starting              10      5       2
-> ferdy                      success               10      4       3
-> media                      starting              9       5       3
-> gnome                      success               9       4       4
-> mozilla                    starting              8       5       4
-> hardware                   success               8       4       5
-> pioto-exheres              starting              7       5       5
-> media                      success               7       4       6
-> python                     starting              6       5       6
-> pioto-exheres              success               6       4       7
-> rbrown                     starting              5       5       7
-> mozilla                    success               5       4       8
-> texlive                    starting              4       5       8
-> ciaranm                    active                4       5       8
-> python                     success               4       4       9
-> unavailable                starting              3       5       9
-> texlive                    success               3       4       10
-> unavailable-unofficial     starting              2       5       10
-> rbrown                     success               2       4       11
-> unwritten                  starting              1       5       11
-> arbor                      active                1       5       11
-> unwritten                  success               1       4       12
-> x11                        starting              0       5       12
-> ciaranm                    active                0       5       12
    ... Initialized empty Git repository in /var/db/paludis/repositories/ciaranm/.git/
-> unavailable                active                0       5       12
    ... Literal data: 0 bytes
    ... Matched data: 0 bytes
    ... File list size: 749
    ... File list generation time: 0.001 seconds
    ... File list transfer time: 0.000 seconds
    ... Total bytes sent: 771
    ... Total bytes received: 26
    ...
    ... sent 771 bytes  received 26 bytes  1594.00 bytes/sec
    ... total size is 57557  speedup is 72.22
-> unavailable-unofficial     active                0       5       12
    ... Literal data: 0 bytes
    ... Matched data: 0 bytes
    ... File list size: 110
    ... File list generation time: 0.001 seconds
    ... File list transfer time: 0.000 seconds
    ... Total bytes sent: 132
    ... Total bytes received: 26
    ...
    ... sent 132 bytes  received 26 bytes  316.00 bytes/sec
    ... total size is 4013  speedup is 25.40
-> arbor                      active                0       5       12
-> unavailable                success               0       4       13
-> unavailable-unofficial     success               0       3       14
-> x11                        active                0       3       14
-> ciaranm                    active                0       3       14
-> ciaranm                    success               0       2       15
-> x11                        success               0       1       16
-> arbor                      success               0       0       17

 * Cleaning write cache for ebuild format repositories...
 * Done cleaning write cache for ebuild format repositories

 * No unread news items found

Sync results

* alsa:                       sync of '/var/db/paludis/repositories/alsa' from 'git://git.exherbo.org/demonstrate/failure/alsa.git' failed (paludis::SyncFailedError)
    Log file:                 /var/log/sync-alsa.1225118654.log
* arbor:                      success
* ciaranm:                    success
* compnerd:                   success
* ferdy:                      success
* gnome:                      success
* hardware:                   success
* media:                      success
* mozilla:                    success
* pioto-exheres:              success
* python:                     success
* rbrown:                     success
* texlive:                    success
* unavailable:                success
* unavailable-unofficial:     success
* unwritten:                  success
* x11:                        success

In particular:

  • Output is automatically logged to a file, rather than dumping everything to the screen. That file can be automatically removed if the job succeeds.
  • A summary of what’s going on is displayed every time a job starts or finishes.
  • Every ten seconds (for some value of ten), if a job hasn’t finished, we automatically display the last ten (for some other value of ten) lines of its output that we haven’t already displayed, along with a note that it’s still active. We also tail the log if a job fails.
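
The “last few lines, minus anything already shown” behaviour can be sketched roughly as below. This is a guess at one reasonable way to do it in Python, not the real implementation: `LogTailer`, `status_row` and `render_update` are hypothetical names, the column widths are only approximated from the sample output above, and lines skipped because they fell outside the tail are assumed to count as already seen.

```python
from dataclasses import dataclass


@dataclass
class LogTailer:
    """Tracks how much of one job's log file has already been considered."""
    path: str
    max_lines: int = 10  # "some other value of ten"
    seen: int = 0        # number of log lines consumed by earlier calls

    def unseen_tail(self):
        """Return at most max_lines of the newest lines not returned before."""
        if self.max_lines <= 0:
            return []
        with open(self.path, errors="replace") as f:
            lines = f.read().splitlines()
        fresh = lines[self.seen:]
        # Assumption: everything read so far counts as seen, even lines
        # that were too old to make it into this tail.
        self.seen = len(lines)
        return fresh[-self.max_lines:]


def status_row(name, status, pending, active, done):
    """Format one '-> name status pending active done' summary row."""
    return f"-> {name:<27}{status:<22}{pending:<8}{active:<8}{done}"


def render_update(name, status, counts, tail_lines):
    """Summary row followed by the tail, each tail line marked with '...'."""
    rows = [status_row(name, status, *counts)]
    # rstrip() so that an empty log line renders as a bare '    ...'.
    rows += [f"    ... {line}".rstrip() for line in tail_lines]
    return rows
```

A timer firing every “ten” seconds would then call `unseen_tail()` for each job that is still running and print `render_update(name, "active", counts, tail)`. The same call on a failed job gives the tail that gets dumped on errors.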

This seems to be a comfortable balance between not showing anything except job statuses (which for jobs that can take an hour to run might well leave you wondering whether it’s hung) and just dumping everything to the screen. Now it’s just a case of finding appropriate values of ten…

3 responses to “UIs for Parallelism”

  1. D.J. Capelis October 27, 2008 at 5:38 pm

    I would think that in addition to having the log files you would want to actually dump them to screen on errors.

  2. Ciaran McCreesh October 27, 2008 at 5:42 pm

    The tail gets dumped on errors. The whole thing’s usually way too much though…

  3. Fernando J. Pereda October 28, 2008 at 12:24 pm

    I guess paludis could just give you the path of the log files of those processes that failed. Sometimes you want to see the full log to diagnose a problem.
