A clever compiler is indistinguishable from malice

One disadvantage of working for my current employers is that pretty well everything I do is technically commercially sekrit, no matter how dull it might be. But this little fragment isn’t, and its fun. Perhaps it would make a good interview question.

What does the following C fragment compile into…

     {
        extern bool other_thing(void);
        int a;

        if (other_thing())
            a = 1;

        report_thing(a);
     }

…when compiled with gcc, version 4.2, for C89, and certain optimisations that I won’t go into? Specifically, what value does “report_thing” get passed? If you’re thinking “an undefined value, unless other_thing() returns true” then you get some points. Though that would be boring, so its the wrong answer. But assume that other_thing() is external to this compilation unit (so the compiler can’t see it, know anything about its internals, or work out whether its going to return true or false), and assume furthermore (just for the same of simplicity) that it always returns false.

Give up? Try reading this article then have another guess.

OK, here’s the assembler:

                    : ;;         if (other_thing());
                    : ;;             a = 1;
                      bsr        $other_thing

                    : ;;         report_thing(a);
                      ld         AL,#H'0001
                      bsr        $report_thing

Actually that’s not quite the vanilla assembler, but its essentially that (for [[XAP]]). What has happened is that the compiler has put in the call to other_thing(), but ignored the result, and simply called report_thing as though “a” were always 1(the first argument of a function is put in the AL register, in this convention). But why?

The answer of course is nasal demons, or more prosaically optimisation in the face of violation of the language spec. Reading from an uninitialised location in this way is undefined behaviour and allows the compiler to do anything it feels like. A malicious compiler (aren’t they all) would do something odd for reasons of pure evil, but – much as it might seem like in this case – that’s not why its done it. What its done is attempt to apply optimisation (or so I think – I can’t read its mind and don’t know how to get it to tell me what its thinking). Its trying to prune away irrelevant conditionals in order to speed up the code. “Aha”, it thinks, “if other_thing() returns false, then ‘a’ isn’t initialised, and that’s against the spec. Therefore, other_thing() must return true, even though I can’t see it. So, I’ll assume it does.” And since its now deduced, to its own satisfaction, that other_thing() always returns true, it knows that ‘a’ will always be set to 1. Note, BTW, that it still has to put in the call to other_thing(), because it doesn’t know whether it has side effects or not.

[Update: thanks to MW, this is a known bug: Bug 55873 – Missed trivial uninitialized use warning. That leads me to https://gcc.gnu.org/ml/gcc/2004-12/msg00681.html which contains an analysis of what gcc is thinking in a similar case.]

Advertisements

13 thoughts on “A clever compiler is indistinguishable from malice”

  1. Needless to say, such optimizations violate “principle of least astonishment” are especially dumb, given that:

    a) A simple C compiler, like the original and pcc, would just generate straightforward code that would pass the value of a, either 1 or what was on the stack.

    b) Compilers that did serious flow analysis, good enough to optimize code by knowing that some code path could leave a undefined, ought to be good enough to flag such. For C, optimizers that did such flow analysis existed by 1984 (we had one at MIPS, although I don’t recall this case ever coming up, so I don’t know what it would have done.)

    It’s easy for this sort of bug to happen, since programmers sometimes unconsciously think stack variables are initialized to 0.

    C is of course tougher than FORTRAN … and long ago, WATFOR/WATFIV initialized uninitialized values to recognizable patterns and generated code to check.
    Optimizing compilers can do better, warn of some problems a compile time.

    [The compiler is certainly capable of generating such warnings (indeed, our other compiler does). And in other, similar, situations (where it doesn’t do the optimisation) it does. I’m moderately sure that the reason it doesn’t warn in this instance is that by the time its come to the “shall I generate warnings for this code?” step its on the far side of the optimisation step; and of course by then, there is no use of an undefined variable -W]

    Like

  2. 1) It’s a strange flow analysis that knows enough to delete this, but loses the error, but then optimizing compilers have long done odd things.

    2) Just for fun, try adding volatile to the declaration of a.

    Like

  3. Wow, I can see why you gave up your previous mundane life of scientific discovery for the thrills of commercial coding 🙂

    [That’s why this would make a good interview question. If you don’t find this great fun, then you probably shouldn’t come and work for us -W]

    Like

  4. I tell my students the old adage that a C implementation can produce any semantics for an invalid program, and that generating semantics equivalent to this would be standard compliant:

    other_thing() ? report_thing(1) : system(“rm -rf /”);

    But this would also be standards compliant:

    other_thing() ? report_thing(1) : fprintf(stderr, “f.c:27: value of uninitialized variable a usedn”), abort()

    The address sanitizer extension for gcc & clang is nice example of providing more interesting semantics for invalid Cprograms.

    Like

  5. That is why in C, you crank up your warning level, run lint, etc.

    Also use {} religiously, even for one line if statements.

    If you want to get into a functional mindset (ala Scala) then use const / final in more places.

    Like

  6. Compared to old-timers like Fortran, C is a youngster at 40+.
    Successful; languages last a long time and get used far outside their original audience, which in this case were programmers like Ken Thompson, Dennis Ritchie, Brian Kernighan, etc. They tended to write pretty good code, a good thing, since in the early days, all that existed was:
    a) A 20-page memo by Dennis
    b) the source code for UNIX kernel, commands, libraries.

    For some history and maybe some future, try Languages, Levels, Libraries, and Longevity or a copy here, in PDF form.

    For no obvious reason, about half of the 20K downloads of a 10-year-old article happened in the last year.

    Like

  7. @James Annan

    Yes, this confused me – according to the best(sic.) septic blogs, all you have to do is publish a couple of papers saying that ‘Global warming is da troof’, possibly with the odd nice graph, and some guys from the government show up with a wheelbarrow full of cash for you. Is this not the case?

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s