Ciaran McCreesh’s Blag

Now with 17% more caffeine

This Week in Python Stupidity: os.stat, os.utime and Sub-Second Timestamps

The primary design principle behind the Python programming language is to take everything that’s horrible and wrong with Perl and get it horrible and wrong in a completely different and even more hideous way. Today, however, we shall be looking at a particularly egregious case of stupidity the likes of which not even PHP has managed to replicate.

On Unix, timestamps have traditionally been held as an integer number of seconds since the epoch. The modification time for a file is one place such a timestamp has been used. Two groups of system calls are of interest to us here.

First, stat (and its fstat and lstat variants). The stat system call places information about a file into a struct also named stat (which is possible thanks to a lesser case of brain damage in C’s design). To get the mtime of a file, historically we would have used the st_mtime field, which is of type time_t, which is an integer of some kind:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char * argv[])
{
    struct stat s;
    if (-1 == stat("timmy", &s))
        return EXIT_FAILURE;

    printf("stat.st_mtime for timmy is %ld\n", (long) s.st_mtime);
    return EXIT_SUCCESS;
}

Sometimes we might want to modify a file, but not affect its mtime. Thus, we need a way to set a file’s mtime to a given value, and to do this we would historically have used a function from the utime family:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
#include <utime.h>
#include <fcntl.h>

int main(int argc, char * argv[])
{
    struct stat s;
    if (-1 == stat("timmy", &s))
        return EXIT_FAILURE;

    int fd;
    fd = open("timmy", O_WRONLY | O_TRUNC | O_CREAT, 0644);
    if (-1 == fd)
        return EXIT_FAILURE;
    if (0 != close(fd))
        return EXIT_FAILURE;

    struct utimbuf times = { .actime = s.st_atime, .modtime = s.st_mtime };
    if (-1 == utime("timmy", &times))
        return EXIT_FAILURE;

    return EXIT_SUCCESS;
}

(Sidenote: the above almost certainly should be using fstat and futimes instead to avoid race conditions, but this is irrelevant for our examples.)

But all of this operates only on a second-precision basis. For many applications this is no longer sufficient. Fortunately, some kernels and filesystems now support nanosecond-resolution timestamps.

First, for the stat family: rather than using st_mtime, we now use st_mtim, which is a struct timespec:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char * argv[])
{
    struct stat s;
    if (-1 == stat("timmy", &s))
        return EXIT_FAILURE;

    printf("stat.st_mtim for timmy is %lds %ldns\n",
            (long) s.st_mtim.tv_sec, s.st_mtim.tv_nsec);
    return EXIT_SUCCESS;
}

And if our filesystem supports it, we get something like:

$ ./mtimens 
stat.st_mtim for timmy is 1258321672s 173919603ns

Running our old utime-using code on such a file preserves the seconds but not the nanoseconds:

$ touch timmy 
$ ./mtimens 
stat.st_mtim for timmy is 1258321978s 62671870ns
$ ./utime
$ ./mtimens 
stat.st_mtim for timmy is 1258321978s 0ns

To modify a file while preserving the nanoseconds of its mtime, we use either utimensat or futimens:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
#include <utime.h>
#include <fcntl.h>

int main(int argc, char * argv[])
{
    struct stat s;
    if (-1 == stat("timmy", &s))
        return EXIT_FAILURE;

    int fd;
    fd = open("timmy", O_WRONLY | O_TRUNC | O_CREAT, 0644);
    if (-1 == fd)
        return EXIT_FAILURE;
    if (0 != close(fd))
        return EXIT_FAILURE;

    struct timespec times[2] = { s.st_atim, s.st_mtim };
    if (-1 == utimensat(AT_FDCWD, "timmy", times, 0))
        return EXIT_FAILURE;

    return EXIT_SUCCESS;
}

And now it works as expected:

$ touch timmy
$ ./mtimens 
stat.st_mtim for timmy is 1258322326s 852774523ns
$ ./utimens 
$ ./mtimens 
stat.st_mtim for timmy is 1258322326s 852774523ns

Incidentally, POSIX.1-2008 considers the non-nanosecond-resolution functions and members to be deprecated, although since the nanosecond-resolution functions aren’t universally available yet, a certain amount of autovoodoo is generally required…

Now we shall look at some Python. First, the old way:

import os

s = os.stat("timmy")

f = open("timmy", "w+")
f.close()

os.utime("timmy", (s.st_atime, s.st_mtime))

Now, to see if we can guess how the new way works:

>>> import os
>>> os.stat("timmy").st_mtim
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'posix.stat_result' object has no attribute 'st_mtim'

Mmm, nope. Time to consult the documentation. Nothing under stat, but there’s something interesting called stat_float_times:

stat_float_times([newvalue])

Determine whether stat_result represents time stamps as float objects. If newvalue is True, future calls to stat() return floats, if it is False, future calls return ints. If newvalue is omitted, return the current setting.

Uh oh. This can’t be good. Let’s look more closely at what happens when we run our code that uses stat.st_mtime and os.utime:

$ touch timmy 
$ ./mtimens 
stat.st_mtim for timmy is 1258324320s 762942258ns
$ python utime.py 
$ ./mtimens 
stat.st_mtim for timmy is 1258324320s 762942000ns
$ ./utimens 12345678901 111111111
$ ./mtimens 
stat.st_mtim for timmy is 12345678901s 111111111ns
$ python utime.py 
$ ./mtimens 
stat.st_mtim for timmy is 12345678901s 111110000ns

What’s that, Lassie? Timmy has lost several significant digits of its sub-second mtime? Oh noes!

Yup, that’s right: Python’s underlying type for floats is an IEEE 754 double, which is only good for about sixteen significant decimal digits. With ten digits before the decimal point, that leaves six for the sub-second part, which is three short of the nine needed to preserve POSIX nanosecond-resolution timestamps. For dates after the year 2300 or so, by which point the seconds count has grown to eleven digits, that leaves only five accurate digits, which isn’t even enough to deal with microseconds correctly. Brilliant.
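
The digit counting can be checked with a few lines of arithmetic. What follows is a sketch rather than anything from the post's era: float_spacing_ns, through_float and rewrite_keeping_times are names made up for illustration, and the two timestamps are the ones from the transcript above. The last helper assumes Python 3.3 or later, where os.stat results gained integer st_atime_ns and st_mtime_ns fields and os.utime gained an ns= keyword; neither existed in the Python discussed here.

```python
import math
import os
import tempfile
from fractions import Fraction

NS_PER_SEC = 10**9


def float_spacing_ns(seconds):
    """Gap between adjacent IEEE 754 doubles near `seconds`, in nanoseconds."""
    _, exponent = math.frexp(float(seconds))
    # A double carries 53 significant bits, so neighbours differ by
    # 2**(exponent - 53) seconds.
    return math.ldexp(1.0, exponent - 53) * NS_PER_SEC


def through_float(sec, nsec):
    """Store a (sec, nsec) timestamp in a float, read it back as integer ns."""
    as_float = float(Fraction(sec * NS_PER_SEC + nsec, NS_PER_SEC))
    # Fraction(as_float) is the exact value the float holds, so any difference
    # from the input comes from the float itself, not from this conversion.
    return round(Fraction(as_float) * NS_PER_SEC)


def rewrite_keeping_times(path):
    """Hypothetical helper: truncate `path`, then restore its atime and mtime."""
    before = os.stat(path)
    with open(path, "w"):
        pass
    # Integer nanoseconds in, integer nanoseconds out: no float involved.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))


if __name__ == "__main__":
    for sec, nsec in [(1258324320, 762942258), (12345678901, 111111111)]:
        exact = sec * NS_PER_SEC + nsec
        error = through_float(sec, nsec) - exact
        print(f"{sec}: doubles are {float_spacing_ns(sec):.0f} ns apart, "
              f"round trip is off by {error} ns")

    with tempfile.NamedTemporaryFile(delete=False) as handle:
        path = handle.name
    try:
        os.utime(path, ns=(1258324320762942258, 1258324320762942258))
        stored = os.stat(path).st_mtime_ns  # may be coarser on some filesystems
        rewrite_keeping_times(path)
        print("mtime preserved:", os.stat(path).st_mtime_ns == stored)
    finally:
        os.remove(path)
```

The spacing works out to roughly 238 ns for the 2009 timestamp and roughly 1.9 µs for the 12345678901 one, so a float cannot hold either of them to the nanosecond, whereas the integer round trip loses nothing that the filesystem actually stored.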

6 responses to “This Week in Python Stupidity: os.stat, os.utime and Sub-Second Timestamps”

  1. Till November 16, 2009 at 1:28 pm

    The larger issue seems to be that binary floats in general are broken and/or evil for many situations.

    Python does have a half-assed fixed point type. Not sure why they don’t use it more.

  2. Dave Hughes November 16, 2009 at 11:20 pm

    For this functionality I’m rather surprised they didn’t reach for the datetime module (IEEE floats being, as demonstrated, a terrible solution for handling timestamps). Still, even the datetime type wouldn’t be ideal: it’s precise, but unfortunately it only goes as far as microsecond precision – still short of the required nanoseconds.

    Re: Till’s comment on the fixed point type. Assuming you mean the decimal module, I’m not sure why you describe it as half-assed? My impression (and admittedly I haven’t verified this – I’ve just used it in conjunction with some database work) was that it was an implementation of Mike Cowlishaw’s General Decimal Arithmetic specification which always struck me as pretty good.

    Given the aforementioned limitations of Python’s datetime type, I’d have been more tempted to go with straightforward integers for this (i.e. have a couple of values holding seconds and nanoseconds – just like in the timespec struct). I’m guessing one could still go this route via ctypes, although that’s pretty ridiculous for something the standard library ought to just Get Right.

  3. Zeth November 17, 2009 at 1:20 am

    Sssh Ciaran, don’t let the cat out of the bag. This is all a trick to give us Python programmers millennium-bug type-work leading up to the year 2300. It is expensive, you know, living in glass tubes: http://en.wikipedia.org/wiki/File:Milleniumcouncle.JPG

  4. Pingback: EAPI 3 to Specify “Auto Space like Word 95” « Ciaran McCreesh’s Blag

  5. Guy January 10, 2011 at 4:21 pm

    Why not:

    os.stat("timmy").st_mtime

    instead of:

    os.stat("timmy").st_mtim

    ??

  6. estani December 3, 2012 at 1:14 pm

    Indeed…
    >>> os.path.getmtime('/tmp/test.nc')
    1354539923.0
    >>> os.stat('/tmp/test.nc').st_mtime
    1354539923.0
